Archive-Team/zfs

Commit Graph

Author	SHA1	Message	Date
Ryan Moeller	7ca0ef36e1	FreeBSD: Fix scope of deadman tunables A few deadman tunables ended up in the wrong sysctl node. Move them to vfs.zfs.deadman.* Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11715	2021-06-23 13:22:14 -07:00
Christian Schwarz	abb485a34a	zvol: call zil_replaying() during replay zil_replaying(zil, tx) has the side-effect of informing the ZIL that an entry has been replayed in the (still open) tx. The ZIL uses that information to record the replay progress in the ZIL header when that tx's txg syncs. ZPL log entries are not idempotent and logically dependent and thus calling zil_replaying() is necessary for correctness. For ZVOLs the question of correctness is more nuanced: ZVOL logs only TX_WRITE and TX_TRUNCATE, both of which are idempotent. Logical dependencies between two records exist only if the write or discard request had sync semantics or if the ranges affected by the records overlap. Thus, at first glance, it would be correct to restart replay from the beginning if we crash before replay completes. But this does not address the following scenario: Assume one log record per LWB. The chain on disk is HDR -> 1:W(1, "A") -> 2:W(1, "B") -> 3:W(2, "X") -> 4:W(3, "Z") where N:W(O, C) represents log entry number N which is a TX_WRITE of C to offset O. We replay 1, 2 and 3 in one txg, sync that txg, then crash. Bit flips corrupt 2, 3, and 4. We come up again and restart replay from the beginning because we did not call zil_replaying() during replay. We replay 1 again, then interpret 2's invalid checksum as the end of the ZIL chain and call replay done. The replayed zvol content is "AX". If we had called zil_replaying() the HDR would have pointed to 3 and our resumed replay would not have replayed anything because 3 was corrupted, resulting in zvol content "BX". If 3 logically depends on 2 then the replay corrupted the ZVOL_OBJ's contents. This patch adds the zil_replaying() calls to the replay functions. Since the callbacks in the replay function need the zilog_t* pointer so that they can call zil_replaying() we open the ZIL while replaying in zvol_create_minor(). We also verify that replay has been done when on-demand-opening the ZIL on the first modifying bio. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Christian Schwarz <me@cschwarz.com> Closes #11667	2021-06-23 13:22:14 -07:00
nssrikanth	26cb87d22d	Cancel TRIM / initialize on FAULTED non-writeable vdevs When a device which is actively trimming or initializing becomes FAULTED, and therefore no longer writable, cancel the active TRIM or initialization. When the device is merely taken offline with `zpool offline` then stop the operation but do not cancel it. When the device is brought back online the operation will be resumed if possible. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov> Co-authored-by: Vipin Kumar Verma <vipin.verma@hpe.com> Signed-off-by: Srikanth N S <srikanth.nagasubbaraoseetharaman@hpe.com> Closes #11588	2021-06-23 13:22:14 -07:00
Brian Behlendorf	395583e38e	Fix overly broad locking in spa_vdev_config_exit() Calling vdev_free() only requires that we acquire the spa config SCL_STATE_ALL locks, not the SCL_ALL locks. In particular, we need to avoid taking the SCL_CONFIG lock (included in SCL_ALL) as a writer since this can lead to a deadlock. The txg_sync_thread() may block in spa_txg_history_init_io() when taking the SCL_CONFIG lock as a reader when it detects there's a pending writer. Reviewed-by: Igor Kozhukhov <igor@dilos.org> Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #11585	2021-06-23 13:22:14 -07:00
Ryan Moeller	818bc70c32	Wrap bare EINVAL returns with SET_ERROR Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11636	2021-06-23 13:22:14 -07:00
Ryan Moeller	cda6fdd500	Intentionally allow ZFS_READONLY in zfs_write ZFS_READONLY represents the "DOS R/O" attribute. When that flag is set, we should behave as if write access were not granted by anything in the ACL. In particular: We _must_ allow writes after opening the file r/w, then setting the DOS R/O attribute, and writing some more. (Similar to how you can write after fchmod(fd, 0444).) Restore these semantics which were lost on FreeBSD when refactoring zfs_write. To my knowledge Linux does not actually expose this flag, but we'll need it to eventually so I've added the supporting checks. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11693	2021-03-08 09:07:29 -08:00
Brian Behlendorf	e219935f10	Initialize ZIL buffers When populating a ZIL destination buffer ensure it is always zeroed before its contents are constructed. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Tom Caputi <caputit1@tcnj.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #11687	2021-03-08 09:07:21 -08:00
Jake Howard	e93203e004	Add "zstd-fast" to help options for "compression" property This value does work as expected, and is documented in the manpage. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Jake Howard <git@theorangeone.net> Closes #11670	2021-03-05 12:58:47 -08:00
Andriy Gapon	ccb453acd0	Fix assert in FreeBSD-specific dmu_read_pages The function has three similar pieces of code: for read-behind pages, requested pages and read-ahead pages. All three pieces had an assert to ensure that the page is not mapped. Later the assert was relaxed to require that the page is not mapped for writing. But that was done in two places out of three. This change fixes the third piece, read-ahead. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Andriy Gapon <avg@FreeBSD.org> Closes #11654	2021-03-05 12:58:08 -08:00
Coleman Kane	a0eb5a77a0	Linux 5.12 compat: bio->bi_disk member moved The struct bio member bi_disk was moved underneath a new member named bi_bdev. So all attempts to reference bio->bi_disk need to now become bio->bi_bdev->bd_disk. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Coleman Kane <ckane@colemankane.org> Closes #11639	2021-03-05 12:57:46 -08:00
Brian Behlendorf	fe77c48320	Linux: increase max nvlist_src size On Linux increase the maximum allowed size of the src nvlist which can be passed to the /dev/zfs ioctl. Originally, this was set to a maximum of KMALLOC_MAX_SIZE (4M) because it was kmalloc'd. Since that time it's been converted to a vmalloc so that's no longer a hard limit, and it's desirable for `zfs send/recv` to allow larger nvlists so more snapshots can be sent at once. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #6572 Closes #11638	2021-03-05 12:54:57 -08:00
fbynite	4d4dd76f0f	vdev_ops: don't try to call vdev_op_hold or vdev_op_rele when NULL This prevents a panic after a SLOG add/removal on the root pool followed by a zpool scrub. When a SLOG is removed, a hole takes its place - the vdev_ops for a hole is vdev_hole_ops, which defines the handler functions of vdev_op_hold and vdev_op_rele as NULL. This bug has been reported in illumos and FreeBSD, a different trigger in the FreeBSD report though. Credit for this patch goes to Patrick Mooney <pmooney@pfmooney.com> Obtained from: illumos-gate commit: c65bd18728f34725 External-issue: https://www.illumos.org/issues/12981 External-issue: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=252396 Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Wing <rob.fx907@gmail.com> Closes #11623	2021-03-05 12:53:50 -08:00
Ryan Moeller	403703d57a	Restore FreeBSD resource usage accounting Add zfs_racct_* interfaces for platform-dependent read/write accounting. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11613	2021-03-05 12:50:32 -08:00
Mark Johnston	f17c843eff	FreeBSD: disable the use of hardware crypto offload drivers for now First, the crypto request completion handler contains a bug in that it fails to reset fs_done correctly after the request is completed. This is only a problem for asynchronous drivers. Second, some hardware drivers have input constraints which ZFS does not satisfy. For instance, ccp(4) apparently requires the AAD length for AES-GCM to be a multiple of the cipher block size, and with qat(4) the AES-GCM AAD length may not be longer than 240 bytes. FreeBSD's generic crypto framework doesn't have a mechanism to automatically fall back to a software implementation if a hardware driver cannot process a request, and ZFS does not tolerate such errors. The plan is to implement such a fallback mechanism, but with FreeBSD 13.0 approaching we should simply disable the use of hardware drivers for now. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Mark Johnston <markj@FreeBSD.org> Closes #11612	2021-03-05 12:49:41 -08:00
Antonio Russo	8829ba19b7	Set file mode during zfs_write `3d40b65` refactored zfs_vnops.c, which shared much code verbatim between Linux and BSD. After a successful write, the suid/sgid bits are reset, and the mode to be written is stored in newmode. On Linux, this was propagated to both the in-memory inode and znode, which is then updated with sa_update. `3d40b65` accidentally removed the initialization of newmode, which happened to occur on the same line as the inode update (which has been moved out of the function). The uninitialized newmode can be saved to disk, leading to a crash on stat() of that file, in addition to a merely incorrect file mode. Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Antonio Russo <aerusso@aerusso.net> Closes #11474 Closes #11576	2021-02-08 09:20:38 -08:00
George Amanakis	a1a1386965	Avoid updating the L2ARC device header unnecessarily If we do not write any buffers to the cache device and the evict hand has not advanced do not update the cache device header. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: George Amanakis <gamanakis@gmail.com> Closes #11522 Closes #11537	2021-01-28 11:39:24 -08:00
Paul Dagnelie	43eaef6de8	Fix zrele race in zrele_async that can cause hang There is a race condition in zfs_zrele_async when we are checking if we would be the one to evict an inode. This can lead to a txg sync deadlock. Instead of calling into iput directly, we attempt to perform the atomic decrement ourselves, unless that would set the i_count value to zero. In that case, we dispatch a call to iput to run later, to prevent a deadlock from occurring. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Paul Dagnelie <pcd@delphix.com> Closes #11527 Closes #11530	2021-01-28 11:39:13 -08:00
Colm	0f3b928e85	Fix two minor lint errors (cppcheck) Fix two minor errors reported by cppcheck: In module/zfs/abd.c (abd_get_offset_impl), add non-NULL assertion to prevent NULL dereference warning. In module/zfs/arc.c (l2arc_write_buffers), change 'try' variable to 'pass' to avoid C++ reserved word. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Colm Buckley <colm@tuatha.org> Closes #11507	2021-01-24 16:06:33 -08:00
Alexander Motin	12fec4a147	Relax special_small_blocks assertion. Follow up for commit `624222a`, value asserted <= SPA_OLD_MAXBLOCKSIZE instead of SPA_MAXBLOCKSIZE as it should be after the previous change. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Closes #11501	2021-01-24 16:06:16 -08:00
Ryan Moeller	7930a5ee65	FreeBSD: upstream changes to VFS interface Set VIRF_MOUNTPOINT flag on snapshot mountpoint. Authored-by: Mateusz Guzik <mjg@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11458	2021-01-24 16:06:02 -08:00
Matthew Ahrens	4e1d1b4b92	assertion failed in arc_wait_for_eviction() If the system is very low on memory (specifically, `arc_free_memory() < arc_sys_free/2`, i.e. less than 1/16th of RAM free), `arc_evict_state_impl()` will defer wakeups. In this case, the arc_evict_waiter_t's remain on the list, even though `arc_evict_count` has been incremented past their `aew_count`. The problem is that `arc_wait_for_eviction()` assumes that if there are waiters on the list, the count they are waiting for has not yet been reached. However, the deferred wakeups may violate this, causing `ASSERT(last->aew_count > arc_evict_count)` to fail. This commit resolves the issue by having new waiters use the greater of `arc_evict_count` and the last `aew_count`. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Wilson <gwilson@delphix.com> Reviewed-by: George Amanakis <gamanakis@gmail.com> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes #11285 Closes #11397	2021-01-23 15:47:06 -08:00
Brian Behlendorf	dd487640b2	Linux 5.10 compat: restore custom uio_prefaultpages() As part of commit `1c2358c1` the custom uio_prefaultpages() code was removed in favor of using the generic kernel provided iov_iter_fault_in_readable() interface. Unfortunately, it turns out that up until the Linux 4.7 kernel the function would only ever fault in the first iovec of the iov_iter. The result being uiomove_iov() may hang waiting for the page. This commit effectively restores the custom uio_prefaultpages() code for Linux 4.9 and earlier kernels which contain the troublesome version of iov_iter_fault_in_readable(). Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #11463 Closes #11484	2021-01-22 09:58:49 -08:00
Konstantin Khorenko	59570a05d8	VZ 7 kernel compat: introduce ITER-enabled .direct_IO() via IOVECs Virtuozzo 7 kernels starting 3.10.0-1127.18.2.vz7.163.46 have the following configuration: * no HAVE_VFS_RW_ITERATE * HAVE_VFS_DIRECT_IO_ITER_RW_OFFSET => let's add implementation of zpl_direct_IO() via zpl_aio_{read,write}() in this case. https://bugs.openvz.org/browse/OVZ-7243 Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com> Closes #11410 Closes #11411	2021-01-05 10:33:41 -08:00
Toomas Soome	921ec61b77	implicit conversion from 'boolean_t' to 'ds_hold_flags_t' Build error on illumos with gcc 10 did reveal: In function 'dmu_objset_refresh_ownership': ../../common/fs/zfs/dmu_objset.c:857:25: error: implicit conversion from 'boolean_t' to 'ds_hold_flags_t' {aka 'enum ds_hold_flags'} [-Werror=enum-conversion] 857 \| dsl_dataset_disown(ds, decrypt, tag); \| ^~~~~~~ cc1: all warnings being treated as errors libzfs_input_check.c: In function 'zfs_ioc_input_tests': libzfs_input_check.c:754:28: error: implicit conversion from 'enum dmu_objset_type' to 'enum lzc_dataset_type' [-Werror=enum-conversion] 754 \| err = lzc_create(dataset, DMU_OST_ZFS, NULL, NULL, 0); \| ^~~~~~~~~~~ cc1: all warnings being treated as errors The same issue is present in openzfs, and also the same issue about ds_hold_flags_t, which currently defines exactly one valid value. Reviewed-by: Igor Kozhukhov <igor@dilos.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Toomas Soome <tsoome@me.com> Closes #11406	2021-01-05 10:30:19 -08:00
Brian Behlendorf	fcd9966ed9	Linux 5.11 compat: blk_{un}register_region() As of 5.11 the blk_register_region() and blk_unregister_region() functions have been retired. This isn't a problem since add_disk() has implicitly allocated minor numbers for a very long time. Reviewed-by: Rafael Kitover <rkitover@gmail.com> Reviewed-by: Coleman Kane <ckane@colemankane.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #11387 Closes #11390	2021-01-05 10:26:39 -08:00
Brian Behlendorf	a2621753b2	Linux 5.11 compat: revalidate_disk_size() Both revalidate_disk_size() and revalidate_disk() have been removed. Functionally this isn't a problem because we only relied on these functions to call zvol_revalidate_disk() for us and to perform any additional handling which might be needed for that kernel version. When neither are available we know there's no additional handling needed and we can directly call zvol_revalidate_disk(). Reviewed-by: Rafael Kitover <rkitover@gmail.com> Reviewed-by: Coleman Kane <ckane@colemankane.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #11387 Closes #11390	2021-01-05 10:26:32 -08:00
Brian Behlendorf	e888f28988	Linux 5.11 compat: bdev_whole() The bd_contains member was removed from the block_device structure. Callers needing to determine if a vdev is a whole block device should use the new bdev_whole() wrapper. For older kernels we provide our own bdev_whole() wrapper which relies on bd_contains for compatibility. Reviewed-by: Rafael Kitover <rkitover@gmail.com> Reviewed-by: Coleman Kane <ckane@colemankane.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #11387 Closes #11390	2021-01-05 10:26:25 -08:00
Brian Behlendorf	305510fd33	Linux 5.11 compat: bio_start_io_acct() / bio_end_io_acct() The generic IO accounting functions have been removed in favor of the bio_start_io_acct() and bio_end_io_acct() functions which provide a better interface. These new functions were introduced in the 5.8 kernels but it wasn't until the 5.11 kernel that the previous generic IO accounting interfaces were removed. This commit updates the blk_generic_*_io_acct() wrappers to provide an interface similar to the updated kernel interface. It's slightly different because for older kernels we need to pass the request queue as well as the bio. Reviewed-by: Rafael Kitover <rkitover@gmail.com> Reviewed-by: Coleman Kane <ckane@colemankane.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #11387 Closes #11390	2021-01-05 10:26:18 -08:00
Brian Behlendorf	67cff6e4c1	Linux 5.11 compat: lookup_bdev() The lookup_bdev() function has been updated to require a dev_t be passed as the second argument. This is actually pretty nice since the major number stored in the dev_t was the only part we were interested in. This allows us to avoid handling the bdev entirely. The vdev_lookup_bdev() wrapper was updated to emulate the behavior of the new lookup_bdev() for all supported kernels. Reviewed-by: Rafael Kitover <rkitover@gmail.com> Reviewed-by: Coleman Kane <ckane@colemankane.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #11387 Closes #11390	2021-01-05 10:26:09 -08:00
Andy Fiddaman	cee725c9bd	Dangling reference from dmu_objset_upgrade After porting the fix for https://github.com/openzfs/zfs/issues/5295 over to illumos, we started hitting an assertion failure when running the testsuite: assertion failed: rc->rc_count == number, file: .../refcount.c and the unexpected hold has this stack: dsl_dataset_long_hold+0x59 dmu_objset_upgrade+0x73 dmu_objset_id_quota_upgrade+0x15 dmu_objset_own+0x14f The simplest reproducer for this in illumos is zpool create -f -O version=1 testpool c3t0d0; zpool destroy testpool which is run as part of the zpool_create_tempname test, but I can't get this to trigger on FreeBSD. This appears to be because the call to txg_wait_synced() in dmu_objset_upgrade_stop() (which was missing in illumos) slows down dmu_objset_disown() enough to avoid the condition. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Andy Fiddaman <andy@omnios.org> Closes #11368	2020-12-23 14:35:47 -08:00
Michael D Labriola	7a7e101437	Linux 5.10 compat: also zvol_revalidate_disk() Commit `59b68723` added a configure check for 5.10, which removed revalidate_disk(), and conditionally replaced its usage with a call to the new revalidate_disk_size() function. However, the old function also invoked the device's registered callback, in our case zvol_revalidate_disk(). This commit adds a call to zvol_revalidate_disk() in zvol_update_volsize() to make sure the code path stays the same. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Michael D Labriola <michael.d.labriola@gmail.com> Closes #11358	2020-12-23 14:35:47 -08:00
Brian Behlendorf	401ba57ccd	Fix maybe uninitialized variable warning Commit `1c2358c12` restructured this code and introduced a warning about the variable maybe not being initialized. This cannot happen with the updated code but we should initialize the variable anyway to silence the warning. zpl_file.c: In function ‘zpl_iter_write’: zpl_file.c:324:9: warning: ‘count’ may be used uninitialized in this function [-Wmaybe-uninitialized] Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #11373	2020-12-23 14:35:47 -08:00
Brian Behlendorf	188950df9e	Remove iov_iter_advance() from iter_read There's no need to call iov_iter_advance() in zpl_iter_read(). This was preserved from the previous code where it wasn't needed but also didn't cause any problems. Now that the iter functions also handle pipes that's no longer the case. When fully reading a pipe buffer iov_iter_advance() may result in the pipe buf release function being called which will not be registered resulting in a NULL dereference. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #11375 Closes #11378	2020-12-23 14:35:47 -08:00
Brian Behlendorf	58bc86c5cb	Linux 5.10 compat: use iov_iter in uio structure As of the 5.10 kernel the generic splice compatibility code has been removed. All filesystems are now responsible for registering a ->splice_read and ->splice_write callback to support this operation. The good news is the VFS provided generic_file_splice_read() and iter_file_splice_write() callbacks can be used provided the ->iter_read and ->iter_write callback support pipes. However, this is currently not the case and only iovecs and bvecs (not pipes) are ever attached to the uio structure. This commit changes that by allowing full iov_iter structures to be attached to uios. Ever since the 4.9 kernel the iov_iter structure has supported iovecs, kvecs, bvecs, and pipes so it's desirable to pass the entire thing when possible. In conjunction with this the uio helper functions (i.e., uiomove(), uiocopy(), etc) have been updated to understand the new UIO_ITER type. Note that using the kernel provided uio_iter interfaces allowed the existing Linux specific uio handling code to be simplified. When there's no longer a need to support kernels older than 4.9, then it will be possible to remove the iovec and bvec members from the uio structure and always use a uio_iter. Until then we need to maintain all of the existing types for older kernels. Some additional refactoring and cleanup was included in this change: - Added checks to configure to detect available iov_iter interfaces. Some are available all the way back to the 3.10 kernel and are used when available. In particular, uio_prefaultpages() now always uses iov_iter_fault_in_readable() which is available for all supported kernels. - The unused UIO_USERISPACE type has been removed. It is no longer needed now that the uio_seg enum is platform specific. - Moved zfs_uio.c from the zcommon.ko module to the Linux specific platform code for the zfs.ko module. This gets it out of libzfs where it was never needed and keeps this Linux specific code out of the common sources. - Removed unnecessary O_APPEND handling from zfs_iter_write(), this is redundant and O_APPEND is already handled in zfs_write(); NOTE: Cleanly applying this kernel compatibility change required applying the following commits. This makes the change larger than it absolutely needs to be, but the resulting code matches what's in the branch branch. This is both more tested and makes it easier to apply any future backports in this area. `7cf4cd824` Remove incorrect assertion `783be694f` Reduce confusion in zfs_write `af5626ac2` Return EFAULT at the end of zfs_write() when set `cc1f85be8` Simplify offset and length limit in zfs_write `9585538d0` Const some unchanging variables in zfs_write `86e74dc16` Remove redundant oid parameter to update_pages `b3d723fb0` Factor uid, gid, and projid out of loop in zfs_write `3d40b6554` Share zfs_fsync, zfs_read, zfs_write, et al between Linux and FreeBSD Reviewed-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #11351	2020-12-23 14:35:39 -08:00
Brian Behlendorf	7cf4cd8246	Remove incorrect assertion Commit `85703f6` added a new ASSERT to zfs_write() as part of the cleanup which isn't correct in the case where multiple processes are concurrently extending a file. The `zp->z_size` is updated atomically while holding a range lock on only a portion of the file. Therefore, it's possible for the file size to increase after the same check is performed earlier in the loop causing this ASSERT to fail. The code itself handles this case correctly so only the invalid ASSERT needs to be removed. Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #11235	2020-12-23 14:35:00 -08:00
Ryan Moeller	783be694f1	Reduce confusion in zfs_write Is this block when abuf != NULL ever reached? Yes, it is. Add asserts and comments to prove that when we get here, we have a full block write at an aligned offset extending past EOF. Simplify by removing the check that tx_bytes == max_blksz, since we can assert that it is always true. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11191	2020-12-23 14:35:00 -08:00
Ryan Moeller	af5626ac27	Return EFAULT at the end of zfs_write() when set FreeBSD's VFS expects EFAULT from zfs_write() if we didn't complete the full write so it can retry the operation. Add some missing SET_ERRORs in zfs_write(). Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11193	2020-12-23 14:35:00 -08:00
Ryan Moeller	cc1f85be8b	Simplify offset and length limit in zfs_write Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matt Macy <mmacy@FreeBSD.org> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11176	2020-12-23 14:34:59 -08:00
Ryan Moeller	9585538d0e	Const some unchanging variables in zfs_write Show that these values will not be changing later. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matt Macy <mmacy@FreeBSD.org> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11176	2020-12-23 14:34:59 -08:00
Ryan Moeller	86e74dc162	Remove redundant oid parameter to update_pages The oid comes from the znode we are already passing. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matt Macy <mmacy@FreeBSD.org> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11176	2020-12-23 14:34:59 -08:00
Ryan Moeller	b3d723fb0e	Factor uid, gid, and projid out of loop in zfs_write Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matt Macy <mmacy@FreeBSD.org> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11176	2020-12-23 14:34:59 -08:00
Matthew Macy	3d40b65540	Share zfs_fsync, zfs_read, zfs_write, et al between Linux and FreeBSD The zfs_fsync, zfs_read, and zfs_write functions are almost identical between Linux and FreeBSD. With a little refactoring they can be moved to the common code which is what is done by this commit. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #11078	2020-12-23 14:34:59 -08:00
Matthew Ahrens	a103ae446e	special device removal space accounting fixes The space in special devices is not included in spa_dspace (or dsl_pool_adjustedsize(), or the zfs `available` property). Therefore there is always at least as much free space in the normal class, as there is allocated in the special class(es). And therefore, there is always enough free space to remove a special device. However, the checks for free space when removing special devices did not take this into account. This commit corrects that. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Don Brady <don.brady@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes #11329	2020-12-23 14:34:59 -08:00
Ryan Libby	ee49d9e02b	lua: avoid gcc -Wreturn-local-addr bug Avoid a bug with gcc's -Wreturn-local-addr warning with some obfuscation. In buggy versions of gcc, if a return value is an expression that involves the address of a local variable, and even if that address is legally converted to a non-pointer type, a warning may be emitted and the value of the address may be replaced with zero. However, buggy versions don't emit the warning or replace the value when simply returning a local variable of non-pointer type. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90737 Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Libby <rlibby@FreeBSD.org> Closes #11337	2020-12-23 14:34:59 -08:00
George Amanakis	900480bd96	Fix reporting of CKSUM errors in indirect vdevs When removing and subsequently reattaching a vdev, CKSUM errors may occur as vdev_indirect_read_all() reads from all children of a mirror in case of a resilver. Fix this by checking whether a child is missing the data and setting a flag (ic_error) which is then checked in vdev_indirect_repair() and suppresses incrementing the checksum counter. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: George Amanakis <gamanakis@gmail.com> Closes #11277	2020-12-23 14:34:59 -08:00
Ryan Moeller	8847b06bf6	FreeBSD: Implement sysctl for fletcher4 impl There is a tunable to select the fletcher 4 checksum implementation on Linux but it was not present in FreeBSD. Implement the sysctl handler for FreeBSD and use ZFS_MODULE_PARAM_CALL to provide the tunable on both platforms. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11270	2020-12-23 14:34:59 -08:00
Paul Dagnelie	21adfb031c	Fix kernel panic induced by redacted send In the redaction list traversal code, there is a bug in the binary search logic when looking for the resume point. Maxbufid can be decremented to -1, causing us to read the last possible block of the object instead of the one we wanted. This can cause incorrect resume behavior, or possibly even a hang in some cases. In addition, when examining non-last blocks, we can treat the block as being the same size as the last block, causing us to miss entries in the redaction list when determining where to resume. Finally, we were ignoring the case where the resume point was found in the buffer being searched, and resuming from minbufid. All these issues have been corrected, and the code has been significantly simplified to make future issues less likely. Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Paul Dagnelie <pcd@delphix.com> Closes #11297	2020-12-23 14:34:59 -08:00
Ryan Moeller	ae2cfdf8a7	FreeBSD: Fix format of vfs.zfs.arc_no_grow_shift vfs.zfs.arc_no_grow_shift has an invalid type (15) and this causes py-sysctl to format it as a bytearray when it should be an integer. "U" is not a valid format, it should be "I" and the type should match the variable type, int. We can return EINVAL if the value is set below zero. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11318	2020-12-23 14:34:59 -08:00
Brian Behlendorf	f217a2b902	Fix possibly uninitialized 'root_inode' variable warning Resolve an uninitialized variable warning when compiling. In function ‘zfs_domount’: warning: ‘root_inode’ may be used uninitialized in this function [-Wmaybe-uninitialized] sb->s_root = d_make_root(root_inode); Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #11306	2020-12-23 14:34:59 -08:00
Ryan Moeller	fb3ad5d24e	FreeBSD: Do zcommon_init sooner to avoid FPU panic There has been a panic affecting some system configurations where the thread FPU context is disturbed during the fletcher 4 benchmarks, leading to a panic at boot. module_init() registers zcommon_init to run in the last subsystem (SI_SUB_LAST). Running it as soon as interrupts have been configured (SI_SUB_INT_CONFIG_HOOKS) makes sure we have finished the benchmarks before we start doing other things. While it's not clear how the FPU context was being disturbed, this does seem to avoid it. Add a module_init_early() macro to run zcommon_init() at this earlier point on FreeBSD. On Linux this is defined as module_init(). Authored by: Konstantin Belousov <kib@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11302	2020-12-23 14:34:59 -08:00
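
The replay scenario described in commit abb485a34a can be walked through with a small user-space model. This is only a sketch under stated assumptions: `rec_t`, `replay()` and `run_scenario()` are hypothetical names invented for illustration, a `valid` flag stands in for the record checksum, and a plain integer stands in for the replay progress that the ZIL header would record. None of this is the actual ZFS code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NREC 4

/* Hypothetical model of one log record N:W(O, C). */
typedef struct {
	int  seq;    /* log entry number N */
	int  offset; /* offset O written by the record */
	char data;   /* content C */
	bool valid;  /* false models an invalid checksum */
} rec_t;

/*
 * Walk the chain from its head. An invalid record is treated as the end
 * of the chain. Records at or below replayed_seq are skipped; the rest
 * are applied to vol. Returns the highest sequence number replayed.
 */
int replay(const rec_t *log, size_t n, int replayed_seq, char *vol)
{
	for (size_t i = 0; i < n; i++) {
		if (!log[i].valid)
			break;
		if (log[i].seq <= replayed_seq)
			continue;
		vol[log[i].offset] = log[i].data;
		replayed_seq = log[i].seq;
	}
	return replayed_seq;
}

/*
 * Run the scenario from the commit message. record_progress selects
 * whether replay progress survives the crash. The contents of offsets
 * 1 and 2 are written to out as a NUL-terminated string.
 */
void run_scenario(bool record_progress, char out[3])
{
	rec_t log[NREC] = {
		{1, 1, 'A', true}, {2, 1, 'B', true},
		{3, 2, 'X', true}, {4, 3, 'Z', true},
	};
	char vol[4];
	int hdr_seq = 0;

	memset(vol, 0, sizeof (vol));

	/* First boot: records 1, 2 and 3 are replayed and synced, then crash. */
	int done = replay(log, 3, 0, vol);
	if (record_progress)
		hdr_seq = done;

	/* Bit flips corrupt records 2, 3 and 4. */
	log[1].valid = log[2].valid = log[3].valid = false;

	/* Second boot: replay resumes from whatever progress was recorded. */
	replay(log, NREC, hdr_seq, vol);

	out[0] = vol[1];
	out[1] = vol[2];
	out[2] = '\0';
}
```

In this model, `run_scenario(false, out)` reproduces the "AX" result from the commit message, while `run_scenario(true, out)` leaves the already replayed "BX" contents in place.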
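
The approach described in commit 43eaef6de8 (decrement the count directly unless that would drop it to zero, otherwise defer the final release) can be sketched with C11 atomics. This is a user-space illustration only: `ref_t` and `ref_put_unless_last()` are hypothetical names, and the real change operates on the inode's `i_count` and dispatches `iput` rather than returning a flag.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for an object with a reference count. */
typedef struct {
	atomic_int count;
} ref_t;

/*
 * Try to drop one reference without ever being the caller that takes
 * the count to zero. Returns true if the decrement happened here.
 * Returns false if this is the last reference, in which case the
 * caller is expected to hand the final release to a deferred context.
 */
bool ref_put_unless_last(ref_t *r)
{
	int old = atomic_load(&r->count);

	while (old > 1) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&r->count, &old, old - 1))
			return true;
	}
	return false;
}
```

The compare-and-swap loop is what closes the race: the check ("am I the last holder?") and the decrement happen as one atomic step, so two concurrent callers cannot both conclude that someone else will perform the final release.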
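
The fix described in commit 4e1d1b4b92 comes down to how a new waiter picks its target count. A minimal sketch, assuming a hypothetical helper `new_waiter_count()` whose parameters mirror `arc_evict_count` and the last waiter's `aew_count` (the real code works on the waiter list directly):

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Compute the eviction count a new waiter should wait for. With
 * deferred wakeups, the last queued waiter's target may already be
 * behind evict_count, so the base is the greater of the two.
 */
uint64_t new_waiter_count(uint64_t evict_count, bool have_waiters,
    uint64_t last_aew_count, uint64_t amount)
{
	uint64_t base = evict_count;

	if (have_waiters && last_aew_count > base)
		base = last_aew_count;
	return base + amount;
}
```

With `evict_count` at 100 and a stale last waiter at 80, the new waiter targets 110 rather than 90, so it never waits for a count that has already been passed.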


3259 Commits