Archive-Team/zfs

Commit Graph

Author	SHA1	Message	Date
Paul Dagnelie	7bd292e59b	Fix cpu hotplug atomic sleep issue We move the spinlock unlock before the thread creation. This should be safe because the thread creation code doesn't actually manipulate any taskq data structures; that's done by the thread once it's created. We also remove the assertion that the maxthreads is the current threads plus one; that assertion could fail if multiple hotplug events come in quick succession, and the first new taskq thread hasn't had a chance to start processing yet. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com> Signed-off-by: Paul Dagnelie <pcd@delphix.com> Closes #12714	2022-02-15 21:52:45 -08:00
наб	94a4b7ec3d	module: zfs: fix unused, remove argsused Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes #12844	2022-02-16 17:58:56 -08:00
drowfx	bc99c809d5	Add dataset_kstats_update.. to mmap read/write paths This allows reads/writes caused by accesses to mmap files to be accounted correctly in the per-dataset kstats for both Linux and FreeBSD. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org> Signed-off-by: Matthias Blankertz <matthias@blankertz.org> Closes #12994 Closes #13044	2022-02-16 17:58:56 -08:00
George Amanakis	e257bd481b	Introduce a flag to skip comparing the local mac when raw sending Raw receiving a snapshot back to the originating dataset is currently impossible because of user accounting being present in the originating dataset. One solution would be resetting user accounting when raw receiving on the receiving dataset. However, to recalculate it we would have to dirty all dnodes, which may not be preferable on big datasets. Instead, we rely on the os_phys flag OBJSET_FLAG_USERACCOUNTING_COMPLETE to indicate that user accounting is incomplete when raw receiving. Thus, on the next mount of the receiving dataset the local mac protecting user accounting is zeroed out. The flag is then cleared when user accounting of the raw received snapshot is calculated. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: George Amanakis <gamanakis@gmail.com> Closes #12981 Closes #10523 Closes #11221 Closes #11294 Closes #12594 Issue #11300	2022-02-04 16:14:56 -08:00
Finix1979	1009e60992	Linux <4.8 compat: submit_bio() rw arg When using the two argument version of submit_bio() in kernels prior to 4.8 the first argument should be specified. It's used by block dump to report the bio direction. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Finix Yan <yancw@info2soft.com> Closes #13006	2022-02-04 08:33:52 -08:00
наб	4f6599416a	Linux 5.17 compat: PDE_DATA() renamed to pde_data() Upstream commit 359745d78351c6f5442435f81549f0207ece28aa ("proc: remove PDE_DATA() completely") Link: https://lore.kernel.org/all/20211124081956.87711-2-songmuchun@bytedance.com/T/#u Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes #13004 Closes #12989	2022-02-04 08:33:52 -08:00
наб	f42c126029	Linux 5.17 compat: dequeue_signal() takes a 4th argument Linux 5.17's dequeue_signal() takes an additional enum pid_type * output argument Upstream commit 5768d8906bc23d512b1a736c1e198aa833a6daa4 ("signal: Requeue signals in the appropriate queue") Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes #12989	2022-02-04 08:33:52 -08:00
наб	2ce06d93a8	Linux 5.17 compat: detect complete_and_exit() rename Linux 5.17 sees a rename from complete_and_exit() to kthread_complete_and_exit() Upstream commit cead18552660702a4a46f58e65188fe5f36e9dfe ("exit: Rename complete_and_exit to kthread_complete_and_exit") Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes #12989	2022-02-04 08:33:52 -08:00
Rich Ercolani	8ef01afbfc	Add support for FALLOC_FL_ZERO_RANGE For us, I think it's always just FALLOC_FL_PUNCH_HOLE with a fake mustache on. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Coleman Kane <ckane@colemankane.org> Signed-off-by: Rich Ercolani <rincebrain@gmail.com> Closes #12975	2022-02-04 08:33:52 -08:00
Rich Ercolani	c31c1146b6	Linux 5.16 compat: Added add_disk check for return add_disk went from void to must-check int return. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Coleman Kane <ckane@colemankane.org> Signed-off-by: Rich Ercolani <rincebrain@gmail.com> Closes #12975	2022-02-04 08:33:52 -08:00
Ryan Moeller	af1630c883	FreeBSD: Fix zvol_cdev_open locking First open locking changes were correctly applied to zvol_geom_open but incorrectly applied to zvol_cdev_open, causing spa_namespace_lock to be held indefinitely. Make the first open locking in zvol_cdev_open match zvol_geom_open. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #13016	2022-02-03 15:28:01 -08:00
Ryan Moeller	1828b68a0b	FreeBSD: Fix zvol_*_open() locking These are the changes for FreeBSD corresponding to the changes made for Linux in #12863, see that PR for details. Changes from #12863 are applied for zvol_geom_open and zvol_cdev_open on FreeBSD. This also adds a check for the zvol dying which we had in zvol_geom_open but was missing in zvol_cdev_open. The check causes the open to fail early with ENXIO when we are in the middle of changing volmode. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #12934	2022-02-03 15:28:01 -08:00
наб	36a91d6cef	FreeBSD: vfsops: use setgen for error case Fix from https://github.com/openzfs/zfs/pull/12844#discussion_r774179413 Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes #12905	2022-02-03 15:28:01 -08:00
chrisrd	1259dc6e6a	zfs_prune: reset sc.nr_to_scan sc.nr_to_scan is an input to super_cache_clean (via shrinker->scan_objects), used to set the number of objects to scan in the various caches. However super_cache_scan also modifies sc.nr_to_scan, so when used in a loop we need to reset sc.nr_to_scan back to our desired nr_to_scan for the next iteration. Issue discovered and solution suggested by Tenzin Lhakhang @tlhakhan. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chris Dunlop <chris@onthe.net.au> Issue #12433 Closes #12908	2022-02-03 15:28:01 -08:00
наб	5d8c081193	FreeBSD: fix unpropagated error When performing I/O on FreeBSD using a file based vdev ensure all errors encountered when reading/writing are propagated through the zio pipeline. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes #12904	2022-02-03 15:28:01 -08:00
Brian Behlendorf	9ec630ff2c	Fix zvol_open() lock inversion When restructuring the zvol_open() logic for the Linux 5.13 kernel a lock inversion was accidentally introduced. In the updated code the spa_namespace_lock is now taken before the zv_suspend_lock allowing the following scenario to occur: down_read <=== waiting for zv_suspend_lock zvol_open <=== holds spa_namespace_lock __blkdev_get blkdev_get_by_dev blkdev_open ... mutex_lock <== waiting for spa_namespace_lock spa_open_common spa_open dsl_pool_hold dmu_objset_hold_flags dmu_objset_hold dsl_prop_get dsl_prop_get_integer zvol_create_minor dmu_recv_end zfs_ioc_recv_impl <=== holds zv_suspend_lock via zvol_suspend() zfs_ioc_recv ... This commit resolves the issue by moving the acquisition of the spa_namespace_lock back to after the zv_suspend_lock which restores the original ordering. Additionally, as part of this change the error exit paths were simplified where possible. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Rich Ercolani <rincebrain@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #12863	2022-02-03 15:28:01 -08:00
Alan Somers	4b2bac5fe9	FreeBSD: Update argument types for VOP_READDIR A recent commit to FreeBSD changed the type of vop_readdir_args.a_cookies to a uint64_t**. There is no functional impact to ZFS because ZFS only uses 32-bit cookies, which will be zero-extended to 64-bits by the existing code. `b214fcceac` Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Alan Somers <asomers@gmail.com> Closes #12874	2022-02-03 15:28:01 -08:00
Ryan Moeller	913ae45218	FreeBSD: Provide correct file generation number va_seq was actually a thin veil over va_gen, so z_gen is a more appropriate value than z_seq to populate the field with. Drop the unnecessary compat obfuscation and provide the correct file generation number. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ryan Moeller <freqlabs@freebsd.org> Closes #12851	2022-02-03 15:28:01 -08:00
Ryan Moeller	def73c0735	FreeBSD: Add vop_standard_writecount_nomsync https://cgit.freebsd.org/src/commit?id=3ffcfa599e29686cf2b3c1a6087408c37acaed78 Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org> Closes #12828	2021-12-13 13:23:07 -08:00
Ryan Moeller	effe984148	FreeBSD: Catch up with more VFS changes Unused thread argument was removed from NDINIT* https://cgit.freebsd.org/src/commit?id=7e1d3eefd410ca0fbae5a217422821244c3eeee4 Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org> Closes #12828	2021-12-13 13:23:01 -08:00
Mark Johnston	19337332cc	Fix several bugs in the FreeBSD rename VOP implementation - To avoid a use-after-free, zfsvfs->z_log needs to be loaded after the teardown lock is acquired with ZFS_ENTER(). - Avoid leaking vnode locks in zfs_rename_relock() and zfs_rename_() when the ZFS_ENTER() macro forces an early return. Refactor the rename implementation so that ZFS_ENTER() can be used safely. As a bonus, this lets us use the ZFS_VERIFY_ZP() macro instead of open-coding its implementation. Reported-by: Peter Holm <pho@FreeBSD.org> Tested-by: Peter Holm <pho@FreeBSD.org> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com> Signed-off-by: Mark Johnston <markj@FreeBSD.org> Sponsored-by: The FreeBSD Foundation Closes #12717	2021-12-13 13:22:54 -08:00
Pawel Jakub Dawidek	b96737b83e	Remove (now unused) td argument from zfs_lookup() Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net> Closes #12748	2021-12-13 13:22:47 -08:00
Mark Johnston	4b7bfcf8a0	Exit the teardown section later in rename on FreeBSD We have to hold the teardown lock while dereferencing zfsvfs->z_os and, I believe, when committing to the ZIL. Note that when jumping to the "out" label, "error" is always non-zero. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Signed-off-by: Mark Johnston <markj@FreeBSD.org> Closes #12704	2021-12-13 13:22:41 -08:00
Mark Johnston	07165ce540	Fix potential use-after-frees in FreeBSD getpages and setattr VOPs The objset object is reallocated during certain dataset operations, such as rollbacks, so the objset pointer must be loaded after acquiring the teardown lock. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Signed-off-by: Mark Johnston <markj@FreeBSD.org> Closes #12704	2021-12-13 13:22:34 -08:00
Damian Szuberski	64e88992b6	Update `checkstyle` workflow env to ubuntu-20.04 - `checkstyle` workflow uses ubuntu-20.04 environment - improved `mancheck.sh` readability Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: szubersk <szuberskidamian@gmail.com> Closes #12713	2021-12-08 13:27:56 -08:00
Coleman Kane	bef7c02c81	Linux 5.16: The blk-cgroup.h header is where struct blkcg_gq is defined The definition of struct blkcg_gq was moved into blk-cgroup.h, which is a header that's been in Linux since 2015. This is used by vdev_blkg_tryget() in module/os/linux/zfs/vdev_disk.c. Since the kernel for CentOS 7 and similar-generation releases doesn't have this header, its inclusion is guarded by a configure test. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Coleman Kane <ckane@colemankane.org> Closes #12819	2021-12-07 13:14:23 -08:00
Coleman Kane	ea61e07413	Linux 5.16: bio_set_dev is no longer a helper macro This change adds a configure check to determine if bio_set_dev is a helper macro or not. If not, then the attempt to override its internal call to bio_associate_blkg(), with a macro definition to our own version, is no longer possible, as the compiler won't use it when compiling the new inline function replacement implemented in the header. This change also creates a new vdev_bio_set_dev() function that performs the same work, and also performs the work implemented in vdev_bio_associate_blkg(), as it is the only thing calling that function in our code. Our custom vdev_bio_associate_blkg() is now only compiled if the bio_set_dev() is a macro in the Linux headers. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Coleman Kane <ckane@colemankane.org> Closes #12819	2021-12-07 13:14:23 -08:00
Coleman Kane	9519fe1ff8	Linux 5.16: type member of iov_iter renamed iter_type The iov_iter->type member was renamed iov_iter->iter_type. However, while looking into this, realized that in 2018 a iov_iter_type(*iov) accessor function was introduced. So if that is present, use it, otherwise fall back to trying the existing behavior of directly accessing type from iov_iter. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Coleman Kane <ckane@colemankane.org> Closes #12819	2021-12-07 13:14:23 -08:00
Coleman Kane	0c40ff56f2	Linux 5.16: block_device_operations->submit_bio now returns void The return type for the submit_bio member of struct block_device_operations was changed to no longer return a value. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Coleman Kane <ckane@colemankane.org> Closes #12819	2021-12-07 13:14:23 -08:00
Brian Behlendorf	16da688f25	Linux 5.13 compat: retry zvol_open() when contended Due to a possible lock inversion the zvol open call path on Linux needs to be able to retry in the case where the spa_namespace_lock cannot be acquired. For Linux 5.12 and older kernels this was accomplished by returning -ERESTARTSYS from zvol_open() to request that blkdev_get() drop the bdev->bd_mutex lock, reacquire it, then call the open callback again. However, as of the 5.13 kernel this behavior was removed. Therefore, for 5.12 and older kernels we preserved the existing retry logic, but for 5.13 and newer kernels we retry internally in zvol_open(). This should always succeed except in the case where a pool's vdevs are laid out on zvols, in which case it may fail. To handle this case vdev_disk_open() has been updated to retry when opening a device when -ERESTARTSYS is returned. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #12301 Closes #12759	2021-12-06 12:22:57 -08:00
Coleman Kane	12d27e7134	Linux 5.16: wait_on_page_bit() no longer available to modules Instead, linux/pagemap.h offers a number of folio-specific functions to be called instead. In this case, module/os/linux/zfs/zfs_vnops_os.c wants to call wait_on_page_bit(pp, PG_writeback). This gets replaced with folio_wait_bit(page_folio(pp), PG_writeback). This change modifies the code to conditionally compile that if configure identifies the presence of the folio_wait_bit() function. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Coleman Kane <ckane@colemankane.org> Closes #12800	2021-12-06 12:22:38 -08:00
Brian Behlendorf	22b0891dbb	Linux 5.16 compat: submit_bio() The submit_bio() prototype has changed again. The version in 5.16 still only expects a single argument but the return type has changed to void. Since we never used the returned value before, update the configure check to detect both single arg versions. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Alexander Lobakin <alobakin@pm.me> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #12725	2021-11-05 07:51:21 -07:00
Ryan Moeller	27d9c6ae2b	FreeBSD: Catch up with recent VFS changes cn_thread is always curthread. https://cgit.freebsd.org/src/commit?id=b4a58fbf640409a1e507d9f7b411c83a3f83a2f3 https://cgit.freebsd.org/src/commit?id=2b68eb8e1dbbdaf6a0df1c83b26f5403ca52d4c3 Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Alan Somers <asomers@gmail.com> Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org> Closes #12668	2021-11-02 13:48:54 -07:00
Brian Behlendorf	143476ce8d	Use fallthrough macro As of the Linux 5.9 kernel a fallthrough macro has been added which should be used to annotate all intentional fallthrough paths. Once all of the kernel code paths have been updated to use fallthrough the -Wimplicit-fallthrough option will become the default. To avoid warnings in the OpenZFS code base when this happens apply the fallthrough macro. Additional reading: https://lwn.net/Articles/794944/ Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #12441	2021-11-02 09:50:30 -07:00
Brian Behlendorf	32512acbc0	Linux 5.15 compat: get_acl() Kernel commits 332f606b32b6 ovl: enable RCU'd ->get_acl() 0cad6246621b vfs: add rcu argument to ->get_acl() callback Added compatibility code to detect the new ->get_acl() interface and correctly handle the case where the new rcu argument is set. Reviewed-by: Coleman Kane <ckane@colemankane.org> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #12548	2021-09-14 15:42:59 -07:00
Ryan Moeller	81611683c8	FreeBSD: Don't remove SA xattr if not SA znode We attempt to remove an existing SA xattr when setting a dir xattr, but this only makes sense if the znode has been upgraded to the SA format. Otherwise, we will hit an assert in zfs_sa_get_xattr. Make sure this is an SA znode before attempting to remove the SA xattr. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #12514	2021-09-14 15:11:56 -07:00
Richard Yao	1655ce5619	Linux 4.11 compat: statx support Linux 4.11 added a new statx system call that allows us to expose crtime as btime. We do this by caching crtime in the znode to match how atime, ctime and mtime are cached in the inode. statx also introduced a new way of reporting whether the immutable, append and nodump bits have been set. It adds support for reporting compression and encryption, but the semantics on other filesystems is not just to report compression/encryption, but to allow it to be turned on/off at the file level. We do not support that. We could implement semantics where we refuse to allow user modification of the bit, but we would need to do a dnode_hold() in zfs_znode_alloc() to find out encryption/compression information. That would introduce locking that will have a minor (although unmeasured) performance cost. It also would be inferior to zdb, which reports far more detailed information. We therefore omit reporting of encryption/compression through statx in favor of recommending that users interested in such information use zdb. Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com> Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Richard Yao <ryao@gentoo.org> Closes #8507	2021-09-14 14:31:50 -07:00
Alexander Motin	a4862125b8	Remove b_pabd/b_rabd allocation from arc_hdr_alloc() When a header is allocated for full overwrite it is a waste of time to allocate b_pabd/b_rabd for it, since arc_write() will free them without ever being touched. If it is a read or a partial overwrite then arc_read() and arc_hdr_decrypt() allocate them explicitly. Reduced memory allocation in user threads also reduces ARC eviction throttling there, proportionally increasing it in ZIO threads, that is not good. To minimize or even avoid it introduce ARC allocation reserve, allowing certain arc_get_data_abd() callers to allocate a bit longer in situations where user threads will already throttle. Reviewed-by: George Wilson <gwilson@delphix.com> Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Closes #12398	2021-09-14 14:31:50 -07:00
Allan Jude	24e51e3749	Restore FreeBSD sysctl processing for arc.min and arc.max Before OpenZFS 2.0, trying to set the FreeBSD sysctl vfs.zfs.arc_max to a disallowed value would return an error. Since the switch, it instead only generates WARN_IF_TUNING_IGNORED. Keep the ability to set the sysctls specifically to 0, even though that is less than the minimum, because some tests depend on this. Also lost was the ability to set vfs.zfs.arc_max to a value less than the default vfs.zfs.arc_min at boot time. Restore this as well. Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com> Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Signed-off-by: Allan Jude <allan@klarasystems.com> Closes #12161	2021-09-14 14:31:01 -07:00
Ryan Moeller	744f3009fc	zfs: add missed dependency of zfs module on zlib Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Martin Matuska <mm@FreeBSD.org> Co-authored-by: Konstantin Belousov <kib@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> External-issue: https://reviews.freebsd.org/D31207 Closes #12442	2021-09-14 14:30:39 -07:00
Brian Behlendorf	32a971e749	Enable /proc/diskstats for zvols The /proc/diskstats accounting needs to be explicitly enabled for block devices which do not use multi-queue. Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #12440 Closes #12066	2021-09-14 14:30:13 -07:00
hedongzhang	ddb732e2c8	Modify checksum obtain method of QAT The CpaDcGeneratefooter function that obtains the checksum code does not support the CPA_DC_STATELESS mode. So we get the adler32 checksum of the end of the zlib from dc_results. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Chengfei Zhu <chengfeix.zhu@intel.com> Signed-off-by: hedong.zhang <h_d_zhang@163.com> Closes #12343	2021-09-14 14:29:46 -07:00
Mark Johnston	451d6da988	Allow disabling of unmapped I/O on FreeBSD We have a tunable which permits one to disable the use of unmapped I/O for the buffer cache. Respect it in ZFS as well. This is useful for KMSAN, which cannot easily maintain shadow state for unmapped pages. No functional change intended, as unmapped I/O is permitted by default and there's no real reason to disable it in practice except for debugging. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Mark Johnston <markj@FreeBSD.org> Closes #12446	2021-09-14 14:29:46 -07:00
Coleman Kane	4434baab11	Linux 5.14 compat: explicitly assign set_page_dirty Kernel 5.14 introduced a change where set_page_dirty of struct address_space_operations is no longer implicitly set to __set_page_dirty_buffers(), which ended up resulting in a NULL pointer deref in the kernel when it is attempted to be called. This change sets .set_page_dirty in the structure to __set_page_dirty_nobuffers(), which was introduced with the related patch set. The breaking change was introduced in commit 0af573780b0b13fceb7fabd49dc1b073cee9a507 to torvalds/linux.git. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Coleman Kane <ckane@colemankane.org> Closes #12427	2021-09-14 12:41:10 -07:00
Brian Behlendorf	2f073cc9c6	Linux 5.14 compat: blk_alloc_disk() In Linux 5.14, blk_alloc_queue is no longer exported, and its usage has been superseded by blk_alloc_disk, which returns a gendisk struct from which we can still retrieve the struct request_queue* that is needed in the one place where it is used. This also replaces the call to alloc_disk(minors), and minors is now set via struct member assignment. Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com> Reviewed-by: Olaf Faaland <faaland1@llnl.gov> Reviewed-by: Coleman Kane <ckane@colemankane.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #12362 Closes #12409	2021-09-14 12:40:45 -07:00
Alexander Motin	93e11e257b	FreeBSD: Ignore make_dev_s() errors Since errors returned by zvol_create_minor_impl() are ignored by the common code, it is more convenient to ignore make_dev_s() errors there. It allows, for example, to get device created for the zvol after later rename instead of having it further stuck in half-created state. zvol_rename_minor() already ignores those errors. While there, switch from MAXPHYS to maxphys in FreeBSD 13+. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Closes #12375	2021-09-14 12:40:45 -07:00
Alexander	4affa09f3e	A few fixes of callback typecasting (for the upcoming ClangCFI) * zio: avoid callback typecasting * zil: avoid zil_itxg_clean() callback typecasting * zpl: decouple zpl_readpage() into two separate callbacks * nvpair: explicitly declare callbacks for xdr_array() * linux/zfs_vnops: don't use external iput() as a callback * zcp_synctask: don't use fnvlist_free() as a callback * zvol: don't use ops->zv_free() as a callback for taskq_dispatch() Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Signed-off-by: Alexander Lobakin <alobakin@pm.me> Closes #12260	2021-09-14 12:39:48 -07:00
Alexander Motin	c2c4d05700	FreeBSD: Switch from MAXPHYS to maxphys on FreeBSD 13+ Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Closes #12378	2021-09-14 12:39:17 -07:00
Alexander Motin	45305a067f	Fix ARC ghost states eviction accounting arc_evict_hdr() returns number of evicted bytes in scope of specific state. For ghost states it does not mean the amount of really freed memory, but the logical buffer size. It is correct for the eviction process, but not for waking up threads waiting for ARC size reduction, as added in "Revise ARC shrinker algorithm" commit, causing premature wakeups while ARC is still overflowed, allowing even bigger overflow, plus processing overhead when next allocation will also get blocked, probably also for too short time. To fix that make arc_evict_hdr() also return the amount of really freed memory, which for the ghost states is only the header, and use it to update arc_evict_count instead. Originally I was thinking to not return it at all, since arc_get_data_impl() does not account for the headers, but decided that some slow allocation progress is better than long waits, reaching on my tests up to 100ms. To reduce negative latency effects of long time periods when reclaim thread can free little real memory, start reclamation process earlier, before we actually reached the overflow threshold, when we have to throttle new allocations. We can also do it without taking global arc_evict_lock, reducing the contention. Reviewed-by: George Wilson <gwilson@delphix.com> Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Closes #12279	2021-09-14 12:38:05 -07:00
George Wilson	8415c3c170	file reference counts can get corrupted Callers of zfs_file_get and zfs_file_put can corrupt the reference counts for the file structure resulting in a panic or a soft lockup. When zfs send/recv runs, it will add a reference count to the open file, and begin to send or recv the stream. If the file descriptor is closed, then when dmu_recv_stream() or dmu_send() return we will call zfs_file_put to remove the reference we placed on the file structure. Unfortunately, because zfs_file_put() uses the file descriptor to lookup the file structure, it may end up finding that the file descriptor table no longer contains the file struct, thus leaking the file structure. Or it might end up finding a file descriptor for a different file and blindly updating its reference counts. Other failure modes probably exist. This change reworks the zfs_file_[get|put] interface to not rely on the file descriptor but instead pass the zfs_file_t pointer around. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Co-authored-by: Allan Jude <allan@klarasystems.com> Signed-off-by: George Wilson <gwilson@delphix.com> External-issue: DLPX-76119 Closes #12299	2021-09-14 12:37:38 -07:00
Alexander Motin	c84670950a	FreeBSD: Use unmapped I/O for scattered/gang ABD buffers Many FreeBSD disk drivers support "unmapped" I/O mode, when data buffer represented not with a virtually contiguous KVA-mapped address range, but with a list of physical memory pages. Originally it was designed to do I/O from buffers without KVA mapping (unmapped). But moving virtual addresses out of equation allows us to operate even non-contiguous data buffers with one condition: all buffer discontinuities must be aligned to memory page borders. Doing I/O to capable GEOM device this patch traverses through non-linear ABD buffers, validating the chunks borders. If the condition is met, it supplies GEOM with the list of original physical memory pages instead of copying the data into temporary contiguous buffer. On capable hardware on pools with ashift=12 and default ABD chunk of 4KB it should handle all the I/O without additional memory copying. Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Closes #12320	2021-09-14 12:37:02 -07:00
Alexander Motin	49bb454120	FreeBSD: Hardcode abd_chunk_size to PAGE_SIZE It makes no sense to set it below PAGE_SIZE, since it increases all overheads and makes returning memory to OS problematic. It makes no sense to set it above PAGE_SIZE, since such allocations and especially frees are too expensive and cause KVA fragmentation to benefit from fewer chunks. After that it makes no sense to keep more complicated math here. What may have sense though is just a tunable border between linear and scatter ABDs, previously also controlled by this tunable. Retain that functionality by taking abd_scatter_min_size tunable from Linux, just with different default value. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Closes #12328	2021-09-14 12:36:44 -07:00
Jorgen Lundman	035219ee10	Fix abd leak, kmem_free correct size of abd_t Fix a leak of abd_t that manifested mostly when using raidzN with at least as many columns as N (e.g. a four-disk raidz2 but not a three-disk raidz2). Sufficiently heavy raidz use would eventually run a system out of memory. Additionally: * Switch abd_cache arena to FIRSTFIT, which empirically improves performance. * Make abd_chunk_cache more performant and debuggable. * Allocate the abd_zero_buf from abd_chunk_cache rather than the heap. * Don't try to reap non-existent qcaches in abd_cache arena. * KM_PUSHPAGE->KM_SLEEP when allocating chunks from their own arena Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Jorgen Lundman <lundman@lundman.net> Co-authored-by: Sean Doran <smd@use.net> Closes #12295	2021-09-14 12:22:28 -07:00
Alexander Motin	f3969ea78b	Optimize small random numbers generation In all places except two spa_get_random() is used for small values, and the consumers do not require well seeded high quality values. Switch those two exceptions directly to random_get_pseudo_bytes() and optimize spa_get_random(), renaming it to random_in_range(), since it is not related to SPA or ZFS in general. On FreeBSD directly map random_in_range() to new prng32_bounded() KPI added in FreeBSD 13. On Linux and in user-space just reduce the type used to uint32_t to avoid more expensive 64bit division. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored-By: iXsystems, Inc. Closes #12183	2021-09-14 12:10:17 -07:00
Ryan Moeller	6fe6192796	FreeBSD: Implement xattr=sa FreeBSD historically has not cared about the xattr property; it was always treated as xattr=on. With xattr=on, xattrs are stored as files in a hidden xattr directory. With xattr=sa, xattrs are stored as system attributes and get cached in nvlists during xattr operations. This makes SA xattrs simpler and more efficient to manipulate. FreeBSD needs to implement the SA xattr operations for feature parity with Linux and to ensure that SA xattrs are accessible when migrated or replicated from Linux. Following the example set by Linux, refactor our existing extattr vnops to split off the parts handling dir style xattrs, and add the corresponding SA handling parts. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11997	2021-09-14 12:09:35 -07:00
Ryan Moeller	1826068523	FreeBSD: Clean up ASSERT/VERIFY use in module Convert use of ASSERT() to ASSERT0(), ASSERT3U(), ASSERT3S(), ASSERT3P(), and likewise for VERIFY(). In some cases it ended up making more sense to change the code, such as VERIFY on nvlist operations that I have converted to use fnvlist instead. In one place I changed an internal struct member from int to boolean_t to match its use. Some asserts that combined multiple checks with && in a single assert have been split to separate asserts, to make it apparent which check fails. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11971	2021-09-14 12:02:23 -07:00
Attila Fülöp	088712793e	gcc 11 cleanup Compiling with gcc 11.1.0 produces three new warnings. Change the code slightly to avoid them. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Attila Fülöp <attila@fueloep.org> Closes #12130 Closes #12188 Closes #12237	2021-06-24 13:13:40 -07:00
Rich Ercolani	5e89181544	Annotated dprintf as printf-like ZFS loves using %llu for uint64_t, but that requires a cast to not be noisy - which is even done in many, though not all, places. Also a couple places used %u for uint64_t, which were promoted to %llu. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rich Ercolani <rincebrain@gmail.com> Closes #12233	2021-06-24 13:12:36 -07:00
Alexander Motin	6b239d1757	Use wmsum for arc, abd, dbuf and zfetch statistics. (#12172) wmsum was designed exactly for cases like these with many updates and rare reads. It allows to completely avoid atomic operations on congested global variables. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored-By: iXsystems, Inc. Closes #12172	2021-06-24 13:10:59 -07:00
Paul Zuchowski	bd83c1e0c6	Do not hash unlinked inodes In zfs_znode_alloc we always hash inodes. If the znode is unlinked, we do not need to hash it. This fixes the problem where zfs_suspend_fs is doing zrele (iput) in an async fashion, and zfs_resume_fs unlinked drain processing will try to hash an inode that could still be hashed, resulting in a panic. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alan Somers <asomers@gmail.com> Signed-off-by: Paul Zuchowski <pzuchowski@datto.com> Closes #9741 Closes #11223 Closes #11648 Closes #12210	2021-06-15 16:56:19 -07:00
Alexander Motin	efdfb14fc8	Remove pool io kstats This mostly reverts "3537 want pool io kstats" commit of 8 years ago. From one side this code using pool-wide locks became pretty bad for performance, creating significant lock contention in I/O pipeline. From another, there are more efficient ways now to obtain detailed statistics, while this statistics is illumos-specific and much less usable on Linux and FreeBSD, reported only via procfs/sysctls. This commit does not remove KSTAT_TYPE_IO implementation, that may be removed later together with already unused KSTAT_TYPE_INTR and KSTAT_TYPE_TIMER. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored-By: iXsystems, Inc. Closes #12212	2021-06-10 10:50:16 -07:00
Alan Somers	2c3d7283b4	libzfs: On FreeBSD, use MNT_NOWAIT with getfsstat `getfsstat(2)` is used to retrieve the list of mounted file systems, which libzfs uses when fetching properties like mountpoint, atime, setuid, etc. The `mode` parameter may be `MNT_NOWAIT`, which uses information in the VFS's cache, or `MNT_WAIT`, which effectively does a `statfs` on every single mounted file system in order to fetch the most up-to-date information. As far as I can tell, the only fields that libzfs cares about are the filesystem's name, mountpoint, fstypename, and mount flags. Those things are always updated on mount and unmount, so they will always be accurate in the VFS's mount cache except in two circumstances: 1) When a file system is busy unmounting 2) When a ZFS file system changes the value of a mount-overridable property like atime or setuid, but doesn't remount the file system. Right now that only happens when the property is changed by an unprivileged user who has delegated authority to change the property but not to mount the dataset. But perhaps libzfs could choose to do it for other reasons in the future. Switching to `MNT_NOWAIT` will greatly improve speed with no downside, as long as we explicitly update the mount cache whenever we change a mount-overridable property. For comparison, Illumos gets this information using the native `getmntany` and `getmntent` functions, which also use cached information. The illumos function that would refresh the cache, `resetmnttab`, is never called by libzfs. And on GNU/Linux, `getmntany` and `getmntent` don't even communicate with the kernel directly. They simply parse the file they are given, which is usually /etc/mtab or /proc/mounts. Perhaps the implementation of /proc/mounts is synchronous, ala MNT_WAIT; I don't know. Sponsored-by: Axcient Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alan Somers <asomers@gmail.com> Closes: #12091	2021-06-09 13:05:34 -07:00
jharmening	d3dddbaa20	FreeBSD: incorporate changes to the VFS_QUOTACTL(9) KPI VFS_QUOTACTL(9) has been updated to allow each filesystem to indicate whether it has changed the busy state of the mount. The filesystem may still assume that its .vfs_quotactl entrypoint is always called with the mount busied, but only needs to unbusy the mount (and clear *mp_busy) if it does something that actually requires the mount to be unbusied. It no longer needs to blindly copy-paste the UFS protocol for calling vfs_unbusy(9) for the Q_QUOTAOFF and Q_QUOTAON commands. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Jason Harmening <jason.harmening@gmail.com> Closes #12052	2021-06-09 13:05:34 -07:00
Brian Behlendorf	af80160f48	Linux: Set spl_kmem_cache_slab_limit when page size !4K For small objects the kernel's slab implementation is very fast and space efficient. However, as the allocation size increases to require multiple pages performance suffers. The SPL kmem cache allocator was designed to better handle these large allocation sizes. Therefore, on Linux the kmem_cache_* compatibility wrappers prefer to use the kernel's slab allocator for small objects and the custom SPL kmem cache allocator for larger objects. This logic was effectively disabled for all architectures using a non-4K page size which caused all kmem caches to only use the SPL implementation. Functionally this is fine, but the SPL code which calculates the target number of objects per-slab does not take into account that __vmalloc() always returns page-aligned memory. This can result in a massive amount of wasted space when allocating tiny objects on a platform using large pages (64k). To resolve this issue we set the spl_kmem_cache_slab_limit cutoff to 16K for all architectures. This particular change does not attempt to update the logic used to calculate the optimal number of pages per slab. This remains an issue which should be addressed in a future change. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #12152 Closes #11429 Closes #11574 Closes #12150	2021-06-09 13:05:34 -07:00
наб	2fe8060cee	spl-module-parameters.5: remove spl_kmem_cache_{expire,obj_per_slab_min} Both were removed in `4fbdb10c7b` ("remove kmem_cache module parameter KMC_EXPIRE_AGE") Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Richard Laager <rlaager@wiktel.com> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes #12157	2021-06-09 13:05:34 -07:00
Rich Ercolani	a0c055cfd3	Remove iov_iter_advance() for iter_write The additional iter advance is incorrect, as copy_from_iter() has already done the right thing. This will result in the following warning being printed to the console as of the 5.12 kernel. Attempted to advance past end of bvec iter This change should have been included with #11378 when a similar change was made on the read side. Suggested-by: @siebenmann Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rich Ercolani <rincebrain@gmail.com> Issue #11378 Closes #12041 Closes #12155	2021-06-09 13:05:34 -07:00
Rich Ercolani	57e3b9c3cc	Bend zpl_set_acl to permit the new userns* parameter Just like #12087, the set_acl signature changed with all the bolted-on *userns parameters, which disabled set_acl usage, and caused #12076. Turn zpl_set_acl into zpl_set_acl and zpl_set_acl_impl, and add a new configure test for the new version. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rich Ercolani <rincebrain@gmail.com> Closes #12076 Closes #12093	2021-05-27 22:31:57 -07:00
наб	6316086b72	Various Linux kABI cosmetics Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes #12103	2021-05-27 22:31:57 -07:00
наб	7cdd4dd33b	linux: don't fall through to 3-arg vfs_getattr Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes #12103	2021-05-27 22:31:57 -07:00
Alexander Motin	a7bb2dab4e	FreeBSD: Update dataset_kstats for zvols in dev mode Previous commit added accounting for geom mode, but not for dev. In geom mode we actually have GEOM statistics, while in dev mode additional accounting actually makes more sense. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Closes #12097	2021-05-27 22:31:57 -07:00
Alexander Motin	aa4a84e616	FreeBSD: avoid memory allocation in arc_prune_async Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Closes #12049	2021-05-27 22:31:57 -07:00
Alexander Motin	cc1c7b0171	FreeBSD: Retry OCF ENOMEM errors. ZFS does not expect transient errors from crypto. For read they are counted as checksum errors, while for write end up in panic. To not panic on random low memory conditions retry ENOMEM errors in the OCF wrapper function. While there remove unneeded timeout and priority from msleep(). External-issue: https://reviews.freebsd.org/D30339 Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored-By: iXsystems, Inc. Closes #12077	2021-05-27 22:31:57 -07:00
Rich Ercolani	fa7ee48e10	Add note for printing all dbgmsg entries on FreeBSD I looked for a bit, and couldn't find any documentation on how to print all logged dbgmsg entries, just messages since the DTrace probe started, until @allanjude kindly pointed me toward the sysctl. So let's add that note where the DTrace probe is mentioned for FreeBSD, so other people can find it. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Allan Jude <allan@klarasystems.com> Signed-off-by: Rich Ercolani <rincebrain@gmail.com> Closes #12113	2021-05-27 22:31:57 -07:00
Rich Ercolani	ab6717cba6	Update tmpfile() existence detection Linux changed the tmpfile() signature again in torvalds/linux@6521f89, which in turn broke our HAVE_TMPFILE detection in configure. Update that macro to include the new case, and change the signature of zpl_tmpfile as appropriate. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rich Ercolani <rincebrain@gmail.com> Closes: #12060 Closes: #12087	2021-05-27 22:31:56 -07:00
Rich Ercolani	272b178d52	Simple change to fix building in recent environments Renamed _fini too for symmetry. Suggested-by: @ensch Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rich Ercolani <rincebrain@gmail.com> Closes #12059 Closes: #11987 Closes: #12056	2021-05-27 22:31:56 -07:00
Ryan Moeller	c2c02e490f	FreeBSD: Use SET_ERROR to trace xattr name errors Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11997	2021-05-27 22:12:26 -07:00
Brian Behlendorf	faa5673982	Revert "Fix raw sends on encrypted datasets when copying back snapshots" Commit `d1d4769` takes into account the encryption key version to decide if the local_mac could be zeroed out. However, this could lead to failure mounting encrypted datasets created with intermediate versions of ZFS encryption available in master between major releases. In order to prevent this situation revert `d1d4769` pending a more comprehensive fix which addresses the mount failure case. Reviewed-by: George Amanakis <gamanakis@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #11294 Issue #12025 Issue #12300 Closes #12033	2021-05-27 22:10:13 -07:00
Coleman Kane	17351a79e2	linux 5.13 compat: bdevops->revalidate_disk() removed Linux kernel commit 0f00b82e5413571ed225ddbccad6882d7ea60bc7 removes the revalidate_disk() handler from struct block_device_operations. This caused a regression, and this commit eliminates the call to it and the assignment in the block_device_operations static handler assignment code, when configure identifies that the kernel doesn't support that API handler. Reviewed-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Coleman Kane <ckane@colemankane.org> Closes #11967 Closes #11977	2021-05-27 22:09:26 -07:00
Ryan Moeller	cb18cf6b0a	FreeBSD: Remove !FreeBSD ifdef'd code Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11994	2021-05-10 12:16:39 -07:00
Ryan Moeller	6c25218c7e	Clean up use of zfs_log_create in zfs_dir zfs_log_create returns void, so there is no reason to cast its return value to void at the call site. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11994	2021-05-10 12:16:32 -07:00
Alyssa Ross	f15ec889a9	Return required size when encode_fh size too small Quoting <linux/exportfs.h>: > encode_fh() should return the fileid_type on success and on error > returns 255 (if the space needed to encode fh is greater than > @max_len*4 bytes). On error @max_len contains the minimum size (in 4 > byte unit) needed to encode the file handle. ZFS was not setting max_len in the case where the handle was too small. As a result of this, the `t_name_to_handle_at.c' example in name_to_handle_at(2) did not work on ZFS. zfsctl_fid() will itself set max_len if called with a fid that is too small, so if we give zfs_fid() that behavior as well, the fix is quite easy: if the handle is too small, just use a zero-size fid instead of the handle. Tested by running t_name_to_handle_at on a normal file, a directory, a .zfs directory, and a snapshot. Thanks-to: Puck Meerburg <puck@puckipedia.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com> Signed-off-by: Alyssa Ross <hi@alyssa.is> Closes #11995	2021-05-10 12:13:45 -07:00
Ryan Moeller	85071b2ff5	FreeBSD: Initialize/destroy zp->z_lock zp->z_lock is used in shared code for protecting projid and scantime. We don't exercise these paths much if at all on FreeBSD, so have been lucky enough not to have issues with the uninitialized locks so far. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@ixsystems.com> Closes #12003	2021-05-10 12:13:12 -07:00
Ryan Moeller	5701e393b7	FreeBSD: Prune some unneeded definitions IS_XATTRDIR is never used. v_count is only used in two places, one immediately followed by the use of the real name, v_usecount. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@ixsystems.com> Closes #11973	2021-05-10 12:09:34 -07:00
Martin Matuška	463d7e1a61	Drop "All rights reserved" from files by trasz@FreeBSD.org This obeys the change in freebsd/freebsd-src@bce7ee9d4 External-issue: https://reviews.freebsd.org/D26980 Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Martin Matuska <mm@FreeBSD.org> Closes #11947	2021-05-10 12:06:21 -07:00
Mateusz Guzik	1b7d883eaa	FreeBSD: damage control racing .. lookups in face of mkdir/rmdir External-issue: https://reviews.freebsd.org/D29769 Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Closes #11926	2021-05-10 12:05:49 -07:00
наб	bdcf0cca10	linux/spl: proc: use global table_{min,max} values instead of local ones Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes #11879	2021-04-19 15:22:57 -07:00
наб	e5c4f86e7a	linux/spl: base proc_dohostid() on proc_dostring() This fixes /proc/sys/kernel/spl/hostid on kernels with mainline commit 32927393dc1ccd60fb2bdc05b9e8e88753761469 ("sysctl: pass kernel pointers to ->proc_handler") ‒ 5.7-rc1 and up The access_ok() check in copy_to_user() in proc_copyout_string() would always fail, so all userspace reads and writes would fail with EINVAL proc_dostring() strips only the final new-line, but simple_strtoul() doesn't actually need a back-trimmed string ‒ writing "012345678 \n" is still allowed, as is "012345678zupsko", &c. This alters what happens when an invalid value is written ‒ previously it'd get set to what-ever simple_strtoul() returned (probably 0, thereby resetting it to default), now it does nothing Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes #11878 Closes #11879	2021-04-19 15:22:57 -07:00
Paul Dagnelie	d682e20ba4	Add SIGSTOP and SIGTSTP handling to issig This change adds SIGSTOP and SIGTSTP handling to the issig function; this mirrors its behavior on Solaris. This way, long running kernel tasks can be stopped with the appropriate signals. Note that doing so with ctrl-z on the command line doesn't return control of the tty to the shell, because tty handling is done separately from stopping the process. That can be future work, if people feel that it is a necessary addition. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Paul Dagnelie <pcd@delphix.com> Issue #810 Issue #10843 Closes #11801	2021-04-19 15:12:33 -07:00
Mateusz Guzik	2aa0d643fd	FreeBSD: use vnlru_free_vfsops if available Fixes issues when zfs is used along with other filesystems. External-issue: https://cgit.freebsd.org/src/commit/?id=e9272225e6bed840b00eef1c817b188c172338ee Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Closes #11881	2021-04-14 13:23:08 -07:00
Mateusz Guzik	a6b82cc0bb	FreeBSD: add missing seqc write begin/end around zfs_acl_chown_setattr It happens to trip over an assert but does not matter for correctness at this time. Done for future proofing. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Closes #11884	2021-04-14 13:23:08 -07:00
Mateusz Guzik	4568b5cfba	FreeBSD: add support for lockless symlink lookup Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Closes #11883	2021-04-14 13:23:08 -07:00
TerraTech	30f5b2fbe2	zpl_inode.c: Fix SMACK interoperability SMACK needs to have the ZFS dentry security field setup before SMACK's d_instantiate() hook is called as it requires functioning '__vfs_getxattr()' calls to properly set the labels. Fixes: 1) file instantiation properly setting the object label to the subject's label 2) proper file labeling in a transmutable directory Functions Updated: 1) zpl_create() 2) zpl_mknod() 3) zpl_mkdir() 4) zpl_symlink() External-issue: https://github.com/cschaufler/smack-next/issues/1 Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: TerraTech <TerraTech@users.noreply.github.com> Closes #11646 Closes #11839	2021-04-14 13:19:50 -07:00
Andrea Gelmini	ca7af7f675	Fix various typos Correct an assortment of typos throughout the code base. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net> Closes #11774	2021-04-07 13:27:11 -07:00
Ryan Moeller	003f2d04b6	FreeBSD: Fix stable/12 after AT_BENEATH removal Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11827	2021-04-07 13:25:20 -07:00
Luis Henriques	2037edbdaa	Fix error code on __zpl_ioctl_setflags() Other (all?) Linux filesystems seem to return -EPERM instead of -EACCES when trying to set FS_APPEND_FL or FS_IMMUTABLE_FL without the CAP_LINUX_IMMUTABLE capability. This was detected by generic/545 test in the fstest suite. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Luis Henriques <henrix@camandro.org> Closes #11791	2021-03-26 10:46:45 -07:00
Andrea Gelmini	8a915ba1f6	Removed duplicated includes Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net> Closes #11775	2021-03-22 12:34:58 -07:00
Brian Atkinson	f52124dce8	Removing old code for k(un)map_atomic It used to be required to pass an enum km_type to kmap_atomic() and kunmap_atomic(), however this is no longer necessary and the wrappers zfs_k(un)map_atomic removed these. This is confusing in the ABD code as the struct abd_iter member iter_km no longer exists and the wrapper macros simply compile them out. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Adam Moss <c@yotes.com> Signed-off-by: Brian Atkinson <batkinson@lanl.gov> Closes #11768	2021-03-19 22:38:44 -07:00
Coleman Kane	ffd6978ef5	Linux 5.12 update: bio_max_segs() replaces BIO_MAX_PAGES The BIO_MAX_PAGES macro is being retired in favor of a bio_max_segs() function that implements the typical MIN(x,y) logic used throughout the kernel for bounding the allocation, and also the new implementation is intended to be signed-safe (which the former was not). Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Coleman Kane <ckane@colemankane.org> Closes #11765	2021-03-19 22:33:42 -07:00
Coleman Kane	e2a8296131	Linux 5.12 compat: idmapped mounts In Linux 5.12, the filesystem API was modified to support idmapped mounts by adding a "struct user_namespace *" parameter to a number of functions and VFS handlers. This change adds the needed autoconf macros to detect the new interfaces and updates the code appropriately. This change does not add support for idmapped mounts, instead it preserves the existing behavior by passing the initial user namespace where needed. A subsequent commit will be required to add support for idmapped mounts. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Coleman Kane <ckane@colemankane.org> Closes #11712	2021-03-19 21:00:59 -07:00
Mateusz Guzik	2f385c913f	FreeBSD: make seqc asserts conditional on replay Avoids tripping on asserts when doing pool recovery. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Closes #11739	2021-03-17 22:09:45 -07:00
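
Note on commit f15ec889a9 above: the encode_fh contract it quotes from <linux/exportfs.h> (return 255 on error and store the required size, in 4-byte units, in max_len) can be illustrated with a small user-space sketch. This is not the ZFS code; `sketch_encode_fh`, `SKETCH_FILEID_INVALID` and `SKETCH_FILEID_OK` are hypothetical names made up for this illustration, and the "file id" is just an opaque byte buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical constants for this sketch only. */
#define	SKETCH_FILEID_INVALID	255	/* error value named in the quoted contract */
#define	SKETCH_FILEID_OK	1	/* arbitrary "success" fileid type */

/*
 * Hypothetical stand-in for a filesystem's encode_fh hook.
 * fh       caller's buffer, *max_len 4-byte units long
 * max_len  in: capacity in 4-byte units; out: units used or required
 * id       opaque identifier bytes to encode
 */
static int
sketch_encode_fh(uint32_t *fh, int *max_len, const void *id, size_t id_bytes)
{
	/* Round the identifier size up to whole 4-byte units. */
	int need = (int)((id_bytes + 3) / 4);

	assert(max_len != NULL);

	if (*max_len < need) {
		/* Too small: report the minimum size instead of failing silently. */
		*max_len = need;
		return (SKETCH_FILEID_INVALID);
	}

	/* Zero the padding, then copy the identifier into the handle. */
	memset(fh, 0, (size_t)need * 4);
	memcpy(fh, id, id_bytes);
	*max_len = need;
	return (SKETCH_FILEID_OK);
}
```

A caller in this sketch would probe with a small buffer, read the required size back from max_len, and retry with a buffer of that size, which is the pattern the commit message says did not work before the fix.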

465 Commits