Archive-Team/zfs

Commit Graph

Author	SHA1	Message	Date
Paul Dagnelie	21adfb031c	Fix kernel panic induced by redacted send In the redaction list traversal code, there is a bug in the binary search logic when looking for the resume point. Maxbufid can be decremented to -1, causing us to read the last possible block of the object instead of the one we wanted. This can cause incorrect resume behavior, or possibly even a hang in some cases. In addition, when examining non-last blocks, we can treat the block as being the same size as the last block, causing us to miss entries in the redaction list when determining where to resume. Finally, we were ignoring the case where the resume point was found in the buffer being searched, and resuming from minbufid. All these issues have been corrected, and the code has been significantly simplified to make future issues less likely. Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Paul Dagnelie <pcd@delphix.com> Closes #11297	2020-12-23 14:34:59 -08:00
Ryan Moeller	ae2cfdf8a7	FreeBSD: Fix format of vfs.zfs.arc_no_grow_shift vfs.zfs.arc_no_grow_shift has an invalid type (15) and this causes py-sysctl to format it as a bytearray when it should be an integer. "U" is not a valid format, it should be "I" and the type should match the variable type, int. We can return EINVAL if the value is set below zero. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11318	2020-12-23 14:34:59 -08:00
Brian Behlendorf	f217a2b902	Fix possibly uninitialized 'root_inode' variable warning Resolve an uninitialized variable warning when compiling. In function ‘zfs_domount’: warning: ‘root_inode’ may be used uninitialized in this function [-Wmaybe-uninitialized] sb->s_root = d_make_root(root_inode); Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #11306	2020-12-23 14:34:59 -08:00
Ryan Moeller	fb3ad5d24e	FreeBSD: Do zcommon_init sooner to avoid FPU panic There has been a panic affecting some system configurations where the thread FPU context is disturbed during the fletcher 4 benchmarks, leading to a panic at boot. module_init() registers zcommon_init to run in the last subsystem (SI_SUB_LAST). Running it as soon as interrupts have been configured (SI_SUB_INT_CONFIG_HOOKS) makes sure we have finished the benchmarks before we start doing other things. While it's not clear how the FPU context was being disturbed, this does seem to avoid it. Add a module_init_early() macro to run zcommon_init() at this earlier point on FreeBSD. On Linux this is defined as module_init(). Authored by: Konstantin Belousov <kib@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11302	2020-12-23 14:34:59 -08:00
Brian Behlendorf	038aaec1cd	Fix optional "force" arg handling in zfs_ioc_pool_sync() The fnvlist_lookup_boolean_value() function should not be used to check the force argument since it's optional. It may not be provided or may have been created with the wrong flags. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #11281 Closes #11284	2020-12-23 14:34:59 -08:00
Brian Behlendorf	07ca433973	Reduce fletcher4 and raidz benchmark times During module load time all of the available fletcher4 and raidz implementations are benchmarked for a fixed amount of time to determine the fastest available. Manual testing has shown that this time can be significantly reduced with negligible effect on the final results. This commit changes the benchmark time to 1ms which can reduce the module load time by over a second on x86_64. On an x86_64 system with sse3, ssse3, and avx2 instructions the benchmark times are: Fletcher4 603ms -> 15ms RAIDZ 1,322ms -> 64ms Reviewed-by: Matthew Macy <mmacy@freebsd.org> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #11282	2020-12-23 14:34:59 -08:00
Alexander Motin	1d02bdee6c	Fix for "Reduce latency effects of non-interactive I/O" It was found that setting min_active tunables for non-interactive I/Os to zero makes them get stuck. This is caused by zfs_vdev_nia_delay, which can never be reached if we never issue any I/Os due to min_active being set to zero. Fix this by issuing at least one non-interactive I/O at a time when there are no interactive I/Os. When there are interactive I/Os, a zero min_active allows non-interactive I/O to be blocked completely. This may cause starvation in some scenarios, but who are we to deny foot shooting? Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Closes #11261	2020-12-23 13:09:17 -08:00
Alexander Motin	2080c4f27e	Reduce latency effects of non-interactive I/O Investigating the influence of scrub (especially sequential) on random read latency, I've noticed that on some HDDs a single 4KB read may take up to 4 seconds! Deeper investigation has shown that many HDDs heavily prioritize sequential reads even when those are submitted with a queue depth of 1. This patch addresses the latency from two sides: - by using _min_active queue depths for non-interactive requests while the interactive request(s) are active and for a few requests after; - by throttling it further if no interactive requests have completed while the configured number of non-interactive ones did. While there, I've also modified vdev_queue_class_to_issue() to give more chances to schedule at least _min_active requests to the lowest priorities. It should reduce starvation if several non-interactive processes are running at the same time as some interactive ones, and I think it should make it possible to set zfs_vdev_max_active as low as 1. I've benchmarked this change with 4KB random reads from a ZVOL with 16KB block size on a newly written non-fragmented pool. On a fragmented pool I also saw improvements, but not as dramatic. Below are log2 histograms of the random read latency in milliseconds for different devices: 4 2x mirror vdevs of SATA HDD WDC WD20EFRX-68EUZN0 before: 0, 0, 2, 1, 12, 21, 19, 18, 10, 15, 17, 21 after: 0, 0, 0, 24, 101, 195, 419, 250, 47, 4, 0, 0; that is, a maximum latency reduction from 2s to 500ms. 4 2x mirror vdevs of SATA HDD WDC WD80EFZX-68UW8N0 before: 0, 0, 2, 31, 38, 28, 18, 12, 17, 20, 24, 10, 3 after: 0, 0, 55, 247, 455, 470, 412, 181, 36, 0, 0, 0, 0; i.e. from 4s to 250ms. 1 SAS HDD SEAGATE ST14000NM0048 before: 0, 0, 29, 70, 107, 45, 27, 1, 0, 0, 1, 4, 19 after: 1, 29, 681, 1261, 676, 1633, 67, 1, 0, 0, 0, 0, 0; i.e. from 4s to 125ms. 1 SAS SSD SEAGATE XS3840TE70014 before (microseconds): 0, 0, 0, 0, 0, 0, 0, 0, 70, 18343, 82548, 618 after: 0, 0, 0, 0, 0, 0, 0, 0, 283, 92351, 34844, 90 I've also measured scrub time during the test and on idle pools. On an idle fragmented pool I've measured scrub getting a few percent faster due to the use of QD3 instead of the previous QD2. On an idle non-fragmented pool I've measured no difference. On a busy non-fragmented pool I've measured a scrub time increase of about 1.5-1.7x, while the IOPS increase reached 5-9x. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored-By: iXsystems, Inc. Closes #11166	2020-12-23 13:09:03 -08:00
Ryan Moeller	7735c9addf	FreeBSD: notify userspace when a vdev is removed This is needed for zfsd to autoreplace vdevs. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11260	2020-12-23 13:08:12 -08:00
Brian Behlendorf	2c36eb763f	Revert "Reduce latency effects of non-interactive I/O" Under certain conditions commit `a3a4b8def` appears to result in a hang, or poor performance, when importing a pool. Until the root cause can be identified it has been reverted from the release branch. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #11245	2020-11-30 09:43:09 -08:00
Alexander Motin	a3a4b8def7	Reduce latency effects of non-interactive I/O Investigating the influence of scrub (especially sequential) on random read latency, I've noticed that on some HDDs a single 4KB read may take up to 4 seconds! Deeper investigation has shown that many HDDs heavily prioritize sequential reads even when those are submitted with a queue depth of 1. This patch addresses the latency from two sides: - by using _min_active queue depths for non-interactive requests while the interactive request(s) are active and for a few requests after; - by throttling it further if no interactive requests have completed while the configured number of non-interactive ones did. While there, I've also modified vdev_queue_class_to_issue() to give more chances to schedule at least _min_active requests to the lowest priorities. It should reduce starvation if several non-interactive processes are running at the same time as some interactive ones, and I think it should make it possible to set zfs_vdev_max_active as low as 1. I've benchmarked this change with 4KB random reads from a ZVOL with 16KB block size on a newly written non-fragmented pool. On a fragmented pool I also saw improvements, but not as dramatic. Below are log2 histograms of the random read latency in milliseconds for different devices: 4 2x mirror vdevs of SATA HDD WDC WD20EFRX-68EUZN0 before: 0, 0, 2, 1, 12, 21, 19, 18, 10, 15, 17, 21 after: 0, 0, 0, 24, 101, 195, 419, 250, 47, 4, 0, 0; that is, a maximum latency reduction from 2s to 500ms. 4 2x mirror vdevs of SATA HDD WDC WD80EFZX-68UW8N0 before: 0, 0, 2, 31, 38, 28, 18, 12, 17, 20, 24, 10, 3 after: 0, 0, 55, 247, 455, 470, 412, 181, 36, 0, 0, 0, 0; i.e. from 4s to 250ms. 1 SAS HDD SEAGATE ST14000NM0048 before: 0, 0, 29, 70, 107, 45, 27, 1, 0, 0, 1, 4, 19 after: 1, 29, 681, 1261, 676, 1633, 67, 1, 0, 0, 0, 0, 0; i.e. from 4s to 125ms. 1 SAS SSD SEAGATE XS3840TE70014 before (microseconds): 0, 0, 0, 0, 0, 0, 0, 0, 70, 18343, 82548, 618 after: 0, 0, 0, 0, 0, 0, 0, 0, 283, 92351, 34844, 90 I've also measured scrub time during the test and on idle pools. On an idle fragmented pool I've measured scrub getting a few percent faster due to the use of QD3 instead of the previous QD2. On an idle non-fragmented pool I've measured no difference. On a busy non-fragmented pool I've measured a scrub time increase of about 1.5-1.7x, while the IOPS increase reached 5-9x. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored-By: iXsystems, Inc. Closes #11166	2020-11-25 08:45:38 -08:00
Matthew Macy	45061cc797	FreeBSD: decouple ZFS_DEBUG from kernel debug settings Reviewed-by: Martelli Nikola @martellini Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #11213	2020-11-25 08:42:26 -08:00
Brian Behlendorf	ccfa35c6f9	Correct missing zil_claim() DTL updates Commit `a1d477c2` accidentally disabled DTL updates for the zil_claim() case described at the end of vdev_stat_update() by unconditionally disabling all DTL updates when loading. This was done to avoid a deadlock on the vd_dtl_lock when loading the DTLs from disk. vdev_dtl_contains <--- Takes vd->vd_dtl_lock vdev_mirror_child_missing vdev_mirror_io_start zio_vdev_io_start __zio_execute arc_read dbuf_issue_final_prefetch dbuf_prefetch_impl dbuf_prefetch dmu_prefetch space_map_iterate space_map_load_length space_map_load vdev_dtl_load <--- Takes vd->vd_dtl_lock vdev_load spa_ld_load_vdev_metadata spa_tryimport The missing DTL updates can be restored by moving the space_map_load() call outside the vd_dtl_lock. A private range tree is populated by reading the space map and then merged in to the DTL_MISSING tree under the lock. Furthermore, the SPA_LOAD_NONE check in vdev_dtl_contains() leads to an additional problem. Any resilvering which occurs before SPA_LOAD_NONE is set will incorrectly determine that there's nothing to repair. This can result in full redundancy not being restored for some blocks. Reviewed-by: Matt Ahrens <matt@delphix.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #11218	2020-11-22 10:01:43 -08:00
Matthew Macy	043ef5c25e	Fix problems in zvol_set_volmode_impl - Don't leave fstrans set when passed a snapshot - Don't remove minor if volmode already matches new value - (FreeBSD) Wait for GEOM ops to complete before trying remove (at create time GEOM will be "tasting" in parallel) - (FreeBSD) Don't leak zvol_state_lock on open if zv == NULL - (FreeBSD) Don't try to unlock zv->zv_state lock if zv == NULL Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #11199	2020-11-17 12:20:09 -08:00
loli10K	f6f3089cf6	Fix 'zfs userspace' for received datasets in encrypted root For encrypted receives, where user accounting is initially disabled on creation, both 'zfs userspace' and 'zfs groupspace' fail with EOPNOTSUPP: this is because dmu_objset_id_quota_upgrade_cb() forgets to set OBJSET_FLAG_USERACCOUNTING_COMPLETE on the objset flags after a successful dmu_objset_space_upgrade(). Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #9501 Closes #9596	2020-11-17 12:19:51 -08:00
George Amanakis	a09aeb9fc4	Fix ASSERT logic in l2arc_evict() In case of cache device removal it is possible that at the end of l2arc_evict() we have l2ad_hand = l2ad_evict. This can lead to the following panic in case of a debug build: VERIFY3(dev->l2ad_hand < dev->l2ad_evict) failed (321920512 < 321920512) Call Trace: dump_stack+0x66/0x90 spl_panic+0xef/0x117 [spl] l2arc_remove_vdev+0x11d/0x290 [zfs] spa_load_l2cache+0x275/0x5b0 [zfs] spa_vdev_remove+0x4a5/0x6e0 [zfs] zfs_ioc_vdev_remove+0x59/0xa0 [zfs] zfsdev_ioctl_common+0x5b3/0x630 [zfs] zfsdev_ioctl+0x53/0xe0 [zfs] do_vfs_ioctl+0x42e/0x6b0 ksys_ioctl+0x5e/0x90 do_syscall_64+0x5b/0x1a0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 In case of cache device removal it is also possible that l2ad_hand + distance > l2ad_end since we do not iterate l2arc_evict() and l2ad_hand is not reset. This has no functional consequence however as the cache device is about to be removed. Fix this by omitting the ASSERT in case of device removal. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: George Amanakis <gamanakis@gmail.com> Closes #11205	2020-11-17 12:19:46 -08:00
Brian Behlendorf	d02fc15ba1	Linux: Fix ZFS_ENTER/ZFS_EXIT/ZFS_VERIFY_ZP usage The ZFS_ENTER/ZFS_EXIT/ZFS_VERIFY_ZP macros should not be used in the Linux zpl_*.c source files. They return a positive error value which is correct for the common code, but not for the Linux specific kernel code which expects a negative return value. The ZPL_ENTER/ZPL_EXIT/ZPL_VERIFY_ZP macros should be used instead. Furthermore, the ZPL_EXIT macro has been updated to not call the zfs_exit_fs() function. This prevents a possible deadlock which can occur when a snapshot is automatically unmounted because the zpl_show_devname() must never wait on in-progress automatic snapshot unmounts. Reviewed-by: Adam Moss <c@yotes.com> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #11169 Closes #11201	2020-11-14 10:51:27 -08:00
Matthew Ahrens	435dc4baab	Assertion failure when logging large output of channel program The output of ZFS channel programs is logged on-disk in the zpool history, and printed by `zpool history -i`. Channel programs can use 10MB of memory by default, and up to 100MB by using the `zfs program -m` flag. Therefore their output can be up to some fraction of 100MB. In addition to being somewhat wasteful of the limited space reserved for the pool history (which for large pools is 1GB), in extreme cases this can result in a failure of `ASSERT(length <= DMU_MAX_ACCESS);` in `dmu_buf_hold_array_by_dnode()`. This commit limits the output size that will be logged to 1MB. Larger outputs will not be logged; instead, an entry will be logged indicating the size of the omitted output. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes #11194	2020-11-14 10:51:21 -08:00
Matthew Ahrens	4a87c280dc	Channel program may spuriously fail with "memory limit exhausted" ZFS channel programs (invoked by `zfs program`) are executed in a LUA sandbox with a limit on the amount of memory they can consume. The limit is 10MB by default, and can be raised to 100MB with the `-m` flag. If the memory limit is exceeded, the LUA program exits and the command fails with a message like `Channel program execution failed: Memory limit exhausted.` The LUA sandbox allocates memory with `vmem_alloc(KM_NOSLEEP)`, which will fail if the requested memory is not immediately available. In this case, the program fails with the same message, `Memory limit exhausted`. However, in this case the specified memory limit has not been reached, and the memory may only be temporarily unavailable. This commit changes the LUA memory allocator `zcp_lua_alloc()` to use `vmem_alloc(KM_SLEEP)`, so that we won't spuriously fail when memory is temporarily low. Instead, we rely on the system to be able to free up memory (e.g. by evicting from the ARC), and we assume that even at the highest memory limit of 100MB, the channel program will not truly exhaust the system's memory. External-issue: DLPX-71924 Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes #11190	2020-11-12 09:02:00 -08:00
Brian Behlendorf	d237d9a918	Linux: Fix mount/unmount when dataset name has a space The custom zpl_show_devname() helper should translate spaces into the octal escape sequence \040. The getmntent(3) function is aware of this convention and properly translates the escape sequence back to a space when reading the fsname. Without this change the `zfs mount` and `zfs unmount` commands incorrectly detect when a dataset with a name containing spaces is mounted. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #11182 Closes #11187	2020-11-12 09:01:55 -08:00
Tony Perkins	2132ae465d	Start snapdir_iterate traversals to begin with the value of zero. The microzap hash can sometimes be zero for single digit snapnames. The zap cursor can then have a serialized value of two (for . and ..), and skip the first entry in the avl tree for the .zfs/snapshot directory listing, and therefore does not return all snapshots. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Cedric Berger <cedric@precidata.com> Signed-off-by: Tony Perkins <tperkins@datto.com> Closes #11039	2020-11-12 09:01:27 -08:00
Mateusz Guzik	87f01fc158	G/C data_alloc_arena It is a leftover from illumos always set to NULL and introducing a spurious difference between zio_buf and zio_data_buf. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Closes #11188	2020-11-11 18:46:22 -08:00
Mateusz Guzik	995b80fa3a	G/C struct znode -> z_moved The field is yet another leftover from unsupported zfs_znode_move. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Closes #11186	2020-11-11 11:40:15 -08:00
Ryan Moeller	cb4d3fb737	FreeBSD: Simplify zvol_geom_open and zvol_cdev_open We can consolidate the unlocking procedure into one place by starting with drop_suspend set to B_FALSE and moving the open count check up. While here, a little code cleanup. Match the out labels between zvol_geom_open and zvol_cdev_open, and add a missing period in some comments. Reviewed-by: Matt Macy <mmacy@FreeBSD.org> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11175	2020-11-11 11:07:40 -08:00
Ryan Moeller	4a2e9811e9	FreeBSD: Avoid spurious EINTR in zvol_cdev_open zvol_first_open can fail with EINTR if spa_namespace_lock is not held and cannot be taken without waiting. Apply the same logic that was done for zvol_geom_open to take spa_namespace_lock if not already held on first open in zvol_cdev_open. Reviewed-by: Matt Macy <mmacy@FreeBSD.org> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11175	2020-11-11 11:07:40 -08:00
Alexander Motin	050dfc5045	Fix dmu_tx_dirty_throttle after arc_c reduction After initial arc_c was reduced to arc_c_min, it became possible that on datasets with primarycache=metadata or none dirty data makes up most of ARC capacity and easily more than the configured 50% of initial arc_c, which causes forced txg commits by arc_tempreserve_space() and periodic very long write delays. This patch makes arc_tempreserve_space() use arc_c only after the ARC has warmed up once and arc_c really means something, and use arc_c_max before that. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Matt Macy <mmacy@FreeBSD.org> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored-By: iXsystems, Inc. Closes #11178	2020-11-11 11:03:43 -08:00
Matthew Macy	b49118220c	Fix dnode refcount tracking Fix a couple of places where the wrong tag is passed to dnode_{hold, rele} Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #11184	2020-11-11 11:03:24 -08:00
Mariusz Zaborski	957b4e9fbd	FreeBSD: Prevent a NULL reference in zvol_cdev_open Check if the ZVOL has been written before calling zil_async_to_sync. The ZIL will be opened on the first write, not earlier. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Mariusz Zaborski <oshogbo@vexillium.org> Closes #11152	2020-11-11 11:00:31 -08:00
khng300	ef648fec0e	FreeBSD: Prevent NULL pointer dereference of resid spa_config_load() passes NULL into resid when doing zfs_file_read(). This would trip over when vfs.zfs.autoimport_disable=0. Sponsored by: The FreeBSD Foundation Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Allan Jude <allan@klarasystems.com> Signed-off-by: Ka Ho Ng <khng@freebsdfoundation.org> Closes #11149	2020-11-11 11:00:19 -08:00
Ryan Moeller	62d549d757	FreeBSD: zvol_os: Use SET_ERROR more judiciously SET_ERROR is useful to trace errors, so use it where the errors occur rather than factored out to the end of a function. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11146	2020-11-03 09:51:49 -08:00
Coleman Kane	a30fed54f4	Linux 5.10 compat: revalidate_disk_size() added A new function was added named revalidate_disk_size() and the old revalidate_disk() appears to have been deprecated. As the only ZFS code that calls this function is zvol_update_volsize, swapping the old function call out for the new one should be all that is required. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Coleman Kane <ckane@colemankane.org> Closes #11085	2020-11-03 09:51:31 -08:00
Coleman Kane	e767b1cacc	Linux 5.10 compat: check_disk_change() removed Kernel 5.10 removed check_disk_change() in favor of callers using the faster bdev_check_media_change() instead, and explicitly forcing bdev revalidation when they desire that behavior. To preserve prior behavior, I have wrapped this into a zfs_check_media_change() macro that calls an inline function for the new API that mimics the old behavior when check_disk_change() doesn't exist, and just calls check_disk_change() if it exists. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Coleman Kane <ckane@colemankane.org> Closes #11085	2020-11-03 09:51:26 -08:00
Coleman Kane	d2090becab	Linux 5.10 compat: percpu_ref added data member Kernel commit 2b0d3d3e4fcfb brought in some changes to the struct percpu_ref structure that moves most of its fields into a member struct named "data" of type struct percpu_ref_data. This includes the "count" member which is updated by vdev_blkg_tryget(), so update this function to chase the API change, and detect it via configure. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Coleman Kane <ckane@colemankane.org> Closes #11085	2020-11-03 09:51:20 -08:00
Sebastian Gottschall	f8460e7e62	Optimize locking checks in mempool allocator Avoid checking the whole array of objects each time by removing the self-organized memory reaping. This can be managed by the global memory reap callback, which is called every 60 seconds. This will significantly reduce the use of locking operations. Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl> Reviewed-by: Mateusz Guzik <mjguzik@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Closes #11126	2020-11-03 09:51:10 -08:00
Ryan Moeller	aaeffd09bf	zvol_os: Fix handling of zvol private data zvol private data is supposed to be nulled by zvol_clear_private before zvol_free is called as an indicator that the zvol is going away. Implement zvol_clear_private for volmode=dev. Assert that zvol_clear_private has been called before zvol_free. Check that zvol_clear_private has not been called when updating volsize. If it has, fail with ENXIO. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Matt Macy <mmacy@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11117	2020-10-30 16:06:54 -07:00
Ryan Moeller	0c270bb6c4	zvol_os: Don't leak doi in cdev error path Make sure to free doi in zvol_create_minor impl when make_dev_s fails. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Matt Macy <mmacy@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11117	2020-10-30 16:06:49 -07:00
Ryan Moeller	c2c643256c	zvol_os: Properly ignore error in volmode lookup We fall back to a default volmode and continue when looking up a zvol's volmode property fails. After this we should set the error to 0 to ensure we take the success paths in the out section. While here, make sure we only log that the zvol was created on success. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Matt Macy <mmacy@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11117	2020-10-30 16:06:42 -07:00
Ryan Moeller	896d0f0906	zvol_os: Code cleanup in zvol_create_minor_impl Nonfunctional changes for readability and consistency. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Matt Macy <mmacy@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11117	2020-10-30 16:06:36 -07:00
Ryan Moeller	ef525e0841	zvol_os: Keep better track of open count in close zvol_geom_close gets a count of the number of close operations to do. Make sure we're always using this count to check if this will be the last close operation performed on the zvol. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Matt Macy <mmacy@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11117	2020-10-30 16:06:30 -07:00
Ryan Moeller	00a27515f0	zvol_os: Tidy up asserts Using more specific assert variants gives better messages on failure. No functional change. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Matt Macy <mmacy@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11117	2020-10-30 16:06:24 -07:00
Mateusz Guzik	52f1ef3b2d	zstd: track allocator statistics Note that this only tracks sizes as requested by the caller. Actual allocated space will almost always be bigger (e.g., rounded up to the next power of 2 or page size). Additionally the allocated buffer may be holding other areas hostage. Nonetheless, this is a starting point for tracking memory usage in zstd. Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Closes #11129	2020-10-30 16:06:15 -07:00
Attila Fülöp	c6b0680d9b	ICP: gcm: Allocate hash subkey table separately While evaluating other assembler implementations it turns out that the precomputed hash subkey tables vary in size, from 816 bytes (avx2/avx512) up to 4816 bytes (avx512-vaes), depending on the implementation. To be able to handle the size differences later, allocate `gcm_Htable` dynamically rather than having a fixed size array, and adapt consumers. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Attila Fülöp <attila@fueloep.org> Closes #11102	2020-10-30 16:06:09 -07:00
Attila Fülöp	2c37e1416b	Add some missing cfi frame info in aesni-gcm-x86_64.S While preparing #9749 some .cfi_{start,end}proc directives were missed. Add the missing ones. See upstream https://github.com/openssl/openssl/commit/275a048f Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Attila Fülöp <attila@fueloep.org> Closes #11101	2020-10-30 16:06:00 -07:00
Mateusz Guzik	6e4845aee3	FreeBSD: catch up with 1300124 version bump - use cache_vop_mkdir - cache_rename -> cache_vop_rename Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Matt Macy <mmacy@FreeBSD.org> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Closes #11136	2020-10-30 16:05:18 -07:00
Ryan Moeller	48cf7d674a	FreeBSD: Fix 12.2-STABLE after AT_BENEATH MFC AT_BENEATH was merged to stable/12, where kern_unlinkat takes a non-const path. DECONST the path passed to kern_unlinkat in the case where AT_BENEATH is defined. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11139	2020-10-30 16:05:10 -07:00
Alexander Motin	ca54e52122	Yield periodically when rebuilding L2ARC L2ARC devices of several terabytes filled with 4KB blocks may take 15 minutes to rebuild. Due to the way L2ARC log reading is implemented it is quite likely that for all that time rebuild thread will never sleep. At least on FreeBSD kernel threads have absolute priority and can not be preempted by threads with lower priorities. If some thread is also bound to that specific CPU it may not get any CPU time for all the 15 minutes. Reviewed-by: Cedric Berger <cedric@precidata.com> Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org> Reviewed-by: George Amanakis <gamanakis@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Closes #11116	2020-10-30 16:04:53 -07:00
Ryan Moeller	c3ae9321bf	Update references to nonexistent man pages in code Refer to the correct section or alternative for FreeBSD and Linux. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11132	2020-10-30 16:04:41 -07:00
Alexander Motin	2dd2e49cc7	FreeBSD: Remove BIO_ORDERED flag from BIO_FLUSH ZFS always waits for the write completion before flushing the cache. That is why it does not require explicit ordering fences around it, which are pretty difficult to implement for NVMe, since one has no internal concept of strict request ordering. This was already removed from FreeBSD once, but got resurrected by mistake during OpenZFS merge. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Closes #11130	2020-10-30 16:04:32 -07:00
Mateusz Guzik	7e76d21bc8	Linux: g/c leftover fence in zfs_znode_alloc The port removed provisions for zfs_znode_move but the cleanup missed this bit. To quote the original: [snip] list_insert_tail(&zfsvfs->z_all_znodes, zp); membar_producer(); /* * Everything else must be valid before assigning z_zfsvfs makes the * znode eligible for zfs_znode_move(). */ zp->z_zfsvfs = zfsvfs; [/snip] In the current code it is immediately followed by unlock which issues the same fence, thus plays no role in correctness. Reviewed-by: Matt Macy <mmacy@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Closes #11115	2020-10-30 16:04:05 -07:00
Mateusz Guzik	e579a4ed0f	FreeBSD: g/c unused zfs_znode_move support The allocator does not provide the functionality to begin with. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Matt Macy <mmacy@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Closes #11114	2020-10-30 16:03:58 -07:00
Brian Behlendorf	6867d00403	Use known license string for zlua The Linux kernel MODULE_LICENSE macro only recognizes a handful of license strings and "MIT" is not one of them. Update the macro to use "Dual MIT/GPL" which is recognized and what the kernel expects MIT licensed modules to use. Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #11112 Closes #11113	2020-10-30 16:03:37 -07:00
Ryan Moeller	0dc6fb730f	FreeBSD: Skip RAW kstat sysctls by default These kstats are often expensive to compute so we want to avoid them unless specifically requested. The following kstats are affected by this change: kstat.zfs.${pool}.multihost kstat.zfs.${pool}.misc.state kstat.zfs.${pool}.txgs kstat.zfs.misc.fletcher_4_bench kstat.zfs.misc.vdev_raidz_bench kstat.zfs.misc.dbufs kstat.zfs.misc.dbgmsg In FreeBSD 13, sysctl(8) has been updated to still list the names/description/type of skipped sysctls so they are still discoverable. Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Mateusz Guzik <mjguzik@gmail.com> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11099	2020-10-30 16:03:22 -07:00
Mateusz Guzik	f5bffd3748	FreeBSD: catch up with 1300123 version bump - removed thread argument from VOP_INACTIVE - removed cred argument from VOP_VPTOCNP Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Matt Macy <mmacy@FreeBSD.org> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Closes #11104	2020-10-30 16:03:13 -07:00
Ryan Moeller	73511e3dde	Add missing zfs_arc_evict_batch_limit tunable It's even documented already. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11094	2020-10-30 16:02:24 -07:00
Kyle Evans	4df31aa98c	Makefile.bsd: remove directory that no longer exists This was removed in a reorganization of directories preparing for the merge of FreeBSD support, `006e9a4088` by mmacy. While llvm is perfectly happy with the nonexistent -I directory, the gcc6 and gcc9 we can elect to use as cross-toolchains both trip over it. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Kyle Evans <kevans@FreeBSD.org> Closes #11077	2020-10-30 15:57:46 -07:00
Matthew Macy	aeeada355c	FreeBSD: delete unreferenced file zfs_onexit_os.c was not deleted when it was removed from the build Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #11079	2020-10-30 15:57:15 -07:00
Mateusz Guzik	bd565f3e24	FreeBSD: add missing fplookup_vexec handler to special vop vectors Otherwise lookup can fail with EOPNOTSUPP or panic. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Matt Macy <mmacy@FreeBSD.org> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Closes #11066	2020-10-16 13:05:43 -07:00
Mateusz Guzik	3c4e580e9a	FreeBSD: g/c unused vop vector zfsctl_ops_shares_dir Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Matt Macy <mmacy@FreeBSD.org> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Closes #11066	2020-10-16 13:05:39 -07:00
Don Brady	b3f4436d37	Ignore special vdev ashift for spa ashift min/max The removal of a vdev in the normal class would fail if there was a special or dedup vdev that had a different ashift than the vdevs in the normal class. Moved the initialization of spa_min_ashift / spa_max_ashift from vdev_open so that it occurs after the vdev allocation bias was initialized (i.e. after vdev_load). Caveat -- In order to remove a special/dedup vdev it must have the same ashift as the normal pool vdevs. This could perhaps be lifted in the future (i.e. for the case where there is ample space in any surviving special class vdevs) Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Don Brady <don.brady@delphix.com> Closes #9363 Closes #9364 Closes #11053	2020-10-16 13:05:34 -07:00
Christian Schwarz	05f8be3b49	Fix crash caused by invalid snapshot names in redactnvl This is a follow up fix for commit `0fdd6106bb`. The VERIFY is only true when we haven't hit an error code path. See added test case for a reproducer. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Christian Schwarz <me@cschwarz.com> Closes #11048	2020-10-16 13:05:28 -07:00
Paul Dagnelie	d8091c9294	Fix incorrect deletion order in range_tree_add_impl gap case After a side-effectful call like add or remove, references to range segs stored in btrees can no longer be used safely. We move the remove call to just before the reinsertion call so that the seg remains valid for as long as we need it. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Paul Dagnelie <pcd@delphix.com> Closes #11044 Closes #11056	2020-10-16 13:05:23 -07:00
Mateusz Guzik	05613fa7a3	FreeBSD: fix panic due to tqid overflow The 32-bit counter eventually wraps to 0 which is a sentinel for invalid id. Make it 64-bit on LP64 platforms and 0-check otherwise. Note: Linux counterpart uses id stored per queue instead of a global. I did not check whether going that way is feasible, the goal being the minimal fix doing the job. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Closes #11059	2020-10-16 13:05:18 -07:00
Ryan Moeller	725c9e22ca	Cross-platform acltype The acltype property is currently hidden on FreeBSD and does not reflect the NFSv4 style ZFS ACLs used on the platform. This makes it difficult to observe that a pool imported from FreeBSD on Linux has a different type of ACL that is being ignored, and vice versa. Add an nfsv4 acltype and expose the property on FreeBSD. Make the default acltype nfsv4 on FreeBSD. Setting acltype to an unhandled style is treated the same as setting it to off. The ACLs will not be removed, but they will be ignored. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10520	2020-10-16 13:05:00 -07:00
Warner Losh	fbfc7e843a	FreeBSD: make adjustments for the standalone environment In FreeBSD, there are three compile environments that are supported: user land, the kernel and the bootloader / standalone. Adjust the headers to compile in the standalone environment. Limit kernel-only items from view when _STANDALONE is defined. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Warner Losh <imp@FreeBSD.org> Closes #10998	2020-10-16 13:04:41 -07:00
Warner Losh	faa62966b1	aarch64: Use proper guards for NEON instructions The zstd code assumes that if you are on aarch64, you have NEON instructions. This is not necessarily true. In a boot loader, where you might not have the VFP properly initialized, these instructions may not be available. It's also an error to include arm_neon.h when the NEON instructions aren't enabled. Change the guards for using the NEON instructions from __aarch64__ to __ARM_NEON which is the standard symbol for knowing if they are available. __ARM_NEON is the proper symbol, defined in ARM C Language Extensions Release 2.1 (https://developer.arm.com/documentation/ihi0053/d/). Some sources suggest __ARM_NEON__, but that's the obsolete spelling from prior versions of the standard. Updated based on zstd pull request https://github.com/facebook/zstd/pull/2356 Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Warner Losh <imp@bsdimp.com> Closes #11055	2020-10-16 13:03:13 -07:00
Mateusz Guzik	7f0b3fa042	FreeBSD: use cache_rename if available Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Matt Macy <mmacy@FreeBSD.org> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Closes #11045	2020-10-16 13:03:00 -07:00
Ryan Moeller	c71847b77b	Expose zfetch_max_idistance tunable FreeBSD had this value tunable before the switch to the new OpenZFS. The tunable name has changed, breaking legacy compat. Restore legacy compat for this tunable, properly expose the tunable with the new name on all platforms, and document it in zfs-module-parameters(5). While here, clean up the documentation for zfetch_max_distance a bit. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11038	2020-10-16 13:02:39 -07:00
Christian Schwarz	5c6d3c21b1	zil_parse: make callback parameters const Code cleanup, a follow up commit to `4d55ea81`. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Co-authored-by: Ryan Moeller <ryan@freqlabs.com> Signed-off-by: Christian Schwarz <me@cschwarz.com> Closes #11020	2020-10-16 13:01:53 -07:00
Ryan Moeller	5e7198b873	Linux: Initialize zp in zfs_setattr_dir The value of zp is used without having been initialized under some conditions. Initialize the pointer to NULL. Add a regression test case using chown in acl/posix. However, this is not enough because the setup sets xattr=sa, which means zfs_setattr_dir will not be called. Create a second group of acl tests in acl/posix-sa duplicating the acl/posix tests with symlinks, and remove xattr=sa from the original acl/posix tests. This provides more coverage for the default xattr=on code. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10043 Closes #11025	2020-10-16 13:01:29 -07:00
Brian Behlendorf	46c71074ca	Replace ZFS on Linux references with OpenZFS This change updates the documentation to refer to the project as OpenZFS instead of ZFS on Linux. Web links have been updated to refer to https://github.com/openzfs/zfs. The extraneous zfsonlinux.org web links in the ZED and SPL sources have been dropped. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #11007	2020-10-16 13:01:24 -07:00
Jacob Adams	35ba2ca5b7	Fix Linux modules uninstall A missing semicolon between kmoddir variable declaration and the uninstall for loop caused modules_uninstall-Linux to fail with: Syntax error: "do" unexpected Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Jacob Adams <jacob@tookmund.com> Closes #11032	2020-10-16 13:01:14 -07:00
Chuck Tuffli	0df5b5737c	Fix ubsan: shift exponent is too large When running libzpool with the Undefined Behavior Sanitizer (ubsan) enabled, a zpool create causes a run-time error: module/zfs/vdev_label.c:600:14: runtime error: shift exponent 64 is too large for 64-bit type 'long long unsigned int' in vdev_config_generate() Fix is to convert vdev_removal_max_span to its base-2 logarithm, using highbit64(), and then compare the "shifts". Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Chuck Tuffli <ctuffli@gmail.com> Closes #9744 Closes #11024	2020-10-16 13:00:44 -07:00
Ryan Moeller	25e44a17ff	Make dbufstat work on FreeBSD With procfs_list kstats implemented for FreeBSD, dbufs are now exposed as kstat.zfs.misc.dbufs. On FreeBSD, dbufstats can use the sysctl instead of procfs when no input file has been given. Enable the dbufstats tests on FreeBSD. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org> Closes #11008	2020-10-16 13:00:28 -07:00
Ryan Moeller	718d20ed93	FreeBSD: Sort and dedup includes in kmod_core Code cleanup. Sort includes, remove duplicates, and drop some extra blank lines in kmod_core.c. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org> Closes #11000	2020-10-16 13:00:23 -07:00
Ryan Moeller	106627caa7	FreeBSD: Sort out kernel FPU headers for 12.1-REL We were missing an include for kernel FPU functions, breaking the build on FreeBSD 12.1-RELEASE. This was apparently being pulled in from elsewhere on stable/12 and head. Sorted the other includes in these files while here. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11005	2020-10-16 12:59:09 -07:00
Ryan Moeller	47e3dba972	Throw const on some strings In C, const indicates to the reader that mutation will not occur. It can also serve as a hint about ownership. Add const in a few places where it makes sense. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org> Closes #10997	2020-10-16 12:55:56 -07:00
John Poduska	a09e3a8594	Mismatched nvlist names in zfs_keys_send_space This causes "zfs send -vt ..." to fail with: cannot resume send: Unknown error 1030 It turns out that some of the name/value pairs in the verification list for zfs_ioc_send_space(), zfs_keys_send_space, had the wrong name, so the ioctl got kicked out in zfs_check_input_nvpairs(). Update the names accordingly. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: John Poduska <jpoduska@datto.com> Closes #10978	2020-10-16 12:55:19 -07:00
Brian Behlendorf	fc5966589b	Fix buggy procfs_list_seq_next warning The kernel seq_read() helper function expects ->next() to update the passed position even if there are no more entries. Failure to do so results in the following warning being logged. seq_file: buggy .next function procfs_list_seq_next [spl] did not update position index Functionally there is no issue with the way procfs_list_seq_next() is implemented and the warning is harmless. However, we want to silence this somewhat scary incorrect warning. This commit updates the Linux procfs code to advance the position even for the last entry. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #10984 Closes #10996	2020-10-01 12:30:28 -07:00
Ryan Moeller	5d61d6e8dd	FreeBSD: Fix legacy compat for platform IOCs The request number is out of bounds of the platform table. Subtract the starting offset to get the correct subscript. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10994	2020-10-01 12:23:00 -07:00
Matthew Macy	775afc4dcd	Eliminate gratuitous bzeroing in dbuf_stats_hash_table_data `dbuf_stats_hash_table_data` can take much longer than it needs to by repeatedly bzeroing its buffer when in fact the buffer only needs to be NULL terminated. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #10993	2020-10-01 12:22:54 -07:00
Sebastian Gottschall	7ce9da0bea	do a cyclic seek for unused memory objects in pool In non-regular use cases, allocated memory might stay persistent in the memory pool. This small patch checks every minute whether there are old objects which can be released from the memory pool. Right now, with regular use, the pool is checked for old objects on each allocation attempt from this pool, so it is basically polled by its use. Now consider what happens if someone writes a lot of files and then stops using the volume or even unmounts it. The code will no longer check whether objects can be released from the pool, and already allocated objects will stay in the pool cache. This is no big issue for common use, but someone discovered it while doing tests. Personally, I know this behavior and I'm aware of it. It's no big issue, just an enhancement. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl> Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Closes #10938 Closes #10969	2020-10-01 12:22:48 -07:00
Ryan Moeller	e58dee8cae	Drop references when skipping dmu_send due to EXDEV When an invalid incremental send is requested where the "to" ds is before the "from" ds, make sure to drop the reference to the pool and the dataset before returning the error. Add an assert on FreeBSD to make sure we don't hold any locks after returning from an ioctl. Add some test coverage. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10919	2020-10-01 12:22:36 -07:00
Brian Behlendorf	13c38c4c45	Use known license string for zzstd The Linux kernel MODULE_LICENSE macro only recognizes a handful of license strings and "BSD" is not one of them. Update the macro to use "Dual BSD/GPL" which is recognized and what the kernel expects BSD licensed modules to use. Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #10982 Closes #10992	2020-10-01 12:22:23 -07:00
Adam D. Moss	edd23dba81	Add DB_RF_NOPREFETCH to dbuf_read()s in dnode.c Prefetching of dnodes in dbuf_read() can cause significant mutex contention for some workloads and isn't very helpful. This is because we already get 32 dnodes for each block read, and when iterating over a directory we prefetch the dnodes in the directory. Disable this prefetching to prevent the lock contention. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Submitted-by: Adam Moss <c@yotes.com> Submitted-by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Adam Moss <c@yotes.com> Closes #10877 Closes #10953	2020-10-01 12:21:09 -07:00
Brian Behlendorf	7b353d2c8c	Fix PREEMPTION=y and BLK_CGROUP=y config on arm64 With PREEMPTION=y and BLK_CGROUP=y preempt_schedule_notrace() is being used on arm64 which is a GPL-only function and hence the build of the DKMS kernel module fails. Fix that by redefining preempt_schedule_notrace() to preempt_schedule() which should be safe as long as tracing is not used. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Juerg Haefliger <juergh@canonical.com> Closes #8545 Closes #9948 Closes #10416 Closes #10973	2020-10-01 12:20:59 -07:00
Mateusz Guzik	b37efb872b	FreeBSD: update cache_purgevfs usage after 1300117 version bump Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org> Reviewed-by: Nick Wolff <darkfiberiru@gmail.com> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Closes #10970	2020-10-01 12:20:45 -07:00
Ryan Moeller	d8a81c3d3c	FreeBSD: Code cleanup in zio_crypt Address some unused value and control flow issues flagged by Coverity. Unreachable code is pruned and unused values are avoided. Some scattered sections are reordered for coherence. We can assume kmem_alloc(n, KM_SLEEP) doesn't fail, so there is no need to check if it returned NULL. The allocated memory doesn't need to be zeroed, other than the last iovec (the MAC). Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10884	2020-10-01 12:20:39 -07:00
Ryan Moeller	875307b6a1	Prune dead branch reported by Coverity wkey is NULL at every `goto error;`. dcp is never NULL. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10884	2020-10-01 12:20:16 -07:00
Christian Schwarz	ba28919168	zfs_log_write: simplify data copying code for WR_COPIED records lr_write_t records that are WR_COPIED have the record data directly appended to them (see lr_write_t type definition). The data is copied from the dbuf using dmu_read_by_dnode. This function was called, only for WR_COPIED records, as part of a short-circuiting if-statement's if-expression. I found this side-effectful call to dmu_read_by_dnode pretty hard to spot. This patch improves readability by moving the call to its own line. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Wilson <gwilson@delphix.com> Signed-off-by: Christian Schwarz <me@cschwarz.com> Closes #10956	2020-10-01 12:20:00 -07:00
Matthew Macy	c70c6e004e	FreeBSD: Add support for procfs_list The procfs_list interface is required by several kstats. Implement this functionality for FreeBSD to provide access to these kstats. Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #10890	2020-10-01 12:18:56 -07:00
Matthew Macy	227273efa4	FreeBSD: Don't save user FPU context in kernel threads Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #10899	2020-10-01 12:18:51 -07:00
Paul Dagnelie	b199e62d17	Don't set numobjs to UINT64_MAX or near it Resolves an issue with `zfs send` streams from 0.8.4 which prevents them from being received by versions < 0.7. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Paul Zuchowski <pzuchowski@datto.com> Signed-off-by: Paul Dagnelie <pcd@delphix.com> Closes #10911 Closes #10916	2020-10-01 12:18:38 -07:00
Mark Johnston	e651a5b233	Fix a logic bug in the FreeBSD getpages VOP In commit `cd32b4f5b7` ("Fix a deadlock in the FreeBSD getpages VOP") I introduced a bug while porting the patch originally committed to FreeBSD: the rangelock pointer may be NULL if the try operation failed, so we must avoid calling zfs_rangelock_unlock() in that case. Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Matt Macy <mmacy@FreeBSD.org> Reported-by: Steve Wills <swills@FreeBSD.org> Signed-off-by: Mark Johnston <markj@FreeBSD.org> Closes #10519 Closes #10960	2020-10-01 12:16:33 -07:00
Ryan Moeller	723726ae7d	FreeBSD: Reduce stack usage of Lua Use the same reduced buffer size for lauxlib that is used on Linux. Fixes panic on HEAD in lua gsub test designed to exhaust stack space. With this we can remove the special case to reserve more stack space on FreeBSD. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Kyle Evans <kevans@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10959	2020-10-01 12:16:28 -07:00
Mark Johnston	aba5b019cb	Annotate FreeBSD sysctls with CTLFLAG_MPSAFE Without this, the sysctl system calls will acquire a global lock before invoking the handler. This is noticeable in some situations when running top(1). The global lock is mostly vestigial but continues to see some use and so contention is still a problem; until the default sense of the MPSAFE flag changes, we have to annotate each and every handler. Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Signed-off-by: Mark Johnston <markj@FreeBSD.org> Closes #10836	2020-10-01 12:16:21 -07:00
Mark Johnston	f664153078	Fix switch statement indentation in the FreeBSD kstat code This is in preparation for some functional changes. Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Signed-off-by: Mark Johnston <markj@FreeBSD.org> Closes #10950	2020-10-01 12:16:00 -07:00
George Wilson	5899ea5a77	vdev_ashift should only be set once == Motivation and Context The new vdev ashift optimization prevents the removal of devices when a zfs configuration is comprised of disks which have different logical and physical block sizes. This happens because we set 'spa_min_ashift' in vdev_open and then later call 'vdev_ashift_optimize'. This would result in an inconsistency between spa's ashift calculations and that of the top-level vdev. In addition, the optimization logic ignores the overridden ashift value that would be provided by '-o ashift=<val>'. == Description This change reworks the vdev ashift optimization so that it's only set the first time the device is configured. It still allows the physical and logical ashift values to be set every time the device is opened but those values are only consulted on first open. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Cedric Berger <cedric@precidata.com> Signed-off-by: George Wilson <gwilson@delphix.com> External-Issue: DLPX-71831 Closes #10932	2020-09-18 12:40:20 -07:00
George Wilson	dacb4f6a61	pool may become suspended during device expansion When expanding a device zfs needs to rescan the partition table to get the correct size. This can only happen when we're in the kernel and requires the device to be closed. As part of the rescan, udev is notified and the device links are removed and recreated. This leaves a window where the vdev code may try to reopen the device before udev has recreated the link. If that happens, then the pool may end up in a suspended state. To correct this, we leverage the BLKPG_RESIZE_PARTITION ioctl which allows the partition information to be modified even while it's in use. This ioctl also does not remove the device link associated with the zfs data partition so it eliminates the race condition that can occur in the kernel. Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: George Wilson <gwilson@delphix.com> Closes #10897	2020-09-18 12:38:30 -07:00
Ryan Moeller	7b86ad215e	FreeBSD: Do not copy vp into f_data for DTYPE_VNODE files https://reviews.freebsd.org/D26346 Do not copy vp into f_data for DTYPE_VNODE files. The vnode pointer is already stored in f_vnode. Use that so f_data can be reused. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10929	2020-09-18 12:38:14 -07:00
John Poduska	aa7817c151	Need a long hold in zpl_mount_impl In zpl_mount_impl, there is: dmu_objset_hold ; returns with pool & ds held dsl_pool_rele sget dsl_dataset_rele As spelled out in the "DSL Pool Configuration Lock" in dsl_pool.c, this requires a long hold. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Paul Zuchowski <pzuchowski@datto.com> Signed-off-by: John Poduska <jpoduska@datto.com> Closes #10936	2020-09-18 12:38:09 -07:00
Ryan Moeller	2cec08a1f0	Rename acltype=posixacl to acltype=posix Prefer acltype=off|posix, retaining the old names as aliases. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10918	2020-09-18 12:38:00 -07:00
Pavel Snajdr	1ce90aa441	Fix stack frame size: dnode_dirty_l1range() Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Pavel Snajdr <snajpa@snajpa.net> Closes #10879	2020-09-18 12:37:44 -07:00
Pavel Snajdr	c9eab8257d	dmu_redact_snap: fix possible memleak Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Pavel Snajdr <snajpa@snajpa.net> Closes #10879	2020-09-18 12:37:39 -07:00
Pavel Snajdr	df39626fdd	Fix stack frame size: dmu_redact_snap() Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Pavel Snajdr <snajpa@snajpa.net> Closes #10879	2020-09-18 12:37:34 -07:00
Pavel Snajdr	083ddb7714	Fix stack frame size: spa_livelist_delete_cb() Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Pavel Snajdr <snajpa@snajpa.net> Closes #10879	2020-09-18 12:37:29 -07:00
Toomas Soome	84d9492e52	zfs label bootenv should store data as nvlist nvlist does allow us to support different data types and systems. To encapsulate user data to/from nvlist, the libzfsbootenv library is provided. Reviewed-by: Arvind Sankar <nivedita@alum.mit.edu> Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Paul Dagnelie <pcd@delphix.com> Reviewed-by: Igor Kozhukhov <igor@dilos.org> Signed-off-by: Toomas Soome <tsoome@me.com> Closes #10774	2020-09-15 18:36:12 -07:00
Ryan Moeller	c8bbb0c93d	Linux: Prevent destruction while showing mount devname Use ZFS_ENTER and ZFS_EXIT to protect datasets while their mount devname is being retrieved. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10892 Closes #10927	2020-09-15 18:36:03 -07:00
Mateusz Guzik	29bc31f62f	FreeBSD: convert teardown inactive lock to a read-mostly sleepable lock The lock is taken all the time and as a regular read-write lock avoidably serves as a mount point-wide contention point. This forward ports FreeBSD revision r357322. To quote aforementioned commit: Sample result doing an incremental -j 40 build: before: 173.30s user 458.97s system 2595% cpu 24.358 total after: 168.58s user 254.92s system 2211% cpu 19.147 total Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Closes #10896	2020-09-09 10:26:05 -07:00
Olaf Faaland	55de40fe47	Initialize mmp_last_write when the mmp thread starts A great deal of time may go by between when mmp_init() is called and the MMP thread starts, particularly if there are bad devices, because there is I/O checking configs etc. If this time is too long, (gethrtime() - mmp_last_write) > mmp_fail_ns at the time the MMP thread starts. If MMP is configured to suspend the pool, the pool will be suspended immediately. This can be seen in issue #10838 The value of mmp_last_write doesn't matter before the mmp thread starts. To give the MMP thread time to issue and land MMP writes, initialize mmp_last_write when the MMP thread starts. Reviewed-by: Giuseppe Di Natale <guss80@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Olaf Faaland <faaland1@llnl.gov> Closes #10873	2020-09-09 10:26:04 -07:00
Ryan Moeller	9cea5f0d69	FreeBSD: drop dependency on cryptodev module We only need the kernel interfaces in crypto, not the device node in cryptodev. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10901	2020-09-09 10:26:04 -07:00
George Amanakis	78d84f56d1	Introduce ZFS module parameter l2arc_mfuonly In certain workloads it may be beneficial to reduce wear of L2ARC devices by not caching MRU metadata and data into L2ARC. This commit introduces a new tunable l2arc_mfuonly for this purpose. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: George Amanakis <gamanakis@gmail.com> Closes #10710	2020-09-09 10:26:03 -07:00
Toomas Soome	b155a243a6	dnode_special_open() error: unchecked function return 'zrl_tryenter' Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Toomas Soome <tsoome@me.com> Closes #10876	2020-09-09 10:26:03 -07:00
Matthew Macy	ee73a8ff3d	FreeBSD: reduce priority of ZIO_TASKQ_ISSUE writes by a larger value On FreeBSD, if priorities divided by four (RQ_PPQ) are equal then a difference between them is insignificant. In other words, incrementing pri by only one as on Linux is insufficient. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #10872	2020-09-09 10:26:02 -07:00
Brian Behlendorf	18524b936d	Sequential scrub and resilver updated comments Commit `d4a72f2` which introduced multi-phase scrubs and resilvers continued the work presented by Nexenta at the 2016 ZFS developer summit. Update the source to reflect their contribution. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2020-09-09 10:26:00 -07:00
Don Brady	8afac5dc55	Avoid posting duplicate zpool events Duplicate io and checksum ereport events can misrepresent that things are worse than they seem. Ideally the zpool events and the corresponding vdev stat error counts in a zpool status should be for unique errors -- not the same error being counted over and over. This can be demonstrated in a simple example. With a single bad block in a datafile and just 5 reads of the file we end up with a degraded vdev, even though there is only one unique error in the pool. The proposed solution to the above issue, is to eliminate duplicates when posting events and when updating vdev error stats. We now save recent error events of interest when posting events so that we can easily check for duplicates when posting an error. Reviewed by: Brad Lewis <brad.lewis@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Don Brady <don.brady@delphix.com> Closes #10861	2020-09-09 10:26:00 -07:00
Matthew Ahrens	bd724261d2	nowait synctask must succeed If a `zfs_space_check_t` other than `ZFS_SPACE_CHECK_NONE` is used with `dsl_sync_task_nowait()`, the sync task may fail due to ENOSPC. However, there is no way to notice or communicate this failure, so it's extremely difficult to use this functionality correctly, and in fact almost all callers use `ZFS_SPACE_CHECK_NONE`. This commit removes the `zfs_space_check_t` argument from `dsl_sync_task_nowait()`, and always uses `ZFS_SPACE_CHECK_NONE`. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes #10855	2020-09-09 10:25:59 -07:00
Ryan Moeller	a1e03186fd	Retain thread name when resuming a zthr When created, a zthr is given a name to identify it by. This name is lost when a cancelled zthr is resumed. Retain the name of a zthr so it can be used when resuming. Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10881	2020-09-09 10:21:16 -07:00
Matthew Macy	36f36610c3	Replace cv_{timed}wait_sig with cv_{timed}wait_idle where appropriate There are a number of places where cv_?_sig is used simply for accounting purposes but the surrounding code has no ability to cope with actually receiving a signal. On FreeBSD it is possible to send signals to individual kernel threads so this could enable undesirable behavior. This patch adds routines on Linux that will do the same idle accounting as _sig without making the task interruptible. On FreeBSD cv_*_idle are all aliases for cv_*. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #10843	2020-09-09 10:21:01 -07:00
Spencer Kinny	fd20a81b9a	Links in Source Files Added comments in following files with links to Illumos manual pages: ./module/avl/avl.c ./module/nvpair/nvpair.c ./module/os/linux/spl/spl-kstat.c ./module/os/freebsd/spl/spl_kstat.c Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Spencer Kinny <spencerkinny1995@gmail.com> Closes #5113 Closes #10859	2020-09-03 16:17:18 -07:00
Toomas Soome	ef8a6fe9fe	zvol: unsigned off can not be less than zero Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Toomas Soome <tsoome@me.com> Closes #10867	2020-09-03 16:16:52 -07:00
Ryan Moeller	da81d91d48	Make spa_stats.c tunables visible on FreeBSD Use ZFS_MODULE_PARAM for cross-platform tunables in spa_stats.c, and update tunables.cfg in tests for the newly supported ones. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10858	2020-09-03 16:16:34 -07:00
Matthew Macy	ecd3976f5b	FreeBSD: Fix up after spa_stats.c move Moving spa_stats added the additional burden of supporting KSTAT_TYPE_IO. spa_state_addr will always return a valid value regardless of the value of 'n'. On FreeBSD this will cause an infinite loop as it relies on the raw ops addr routine to indicate that there is no more data. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #10860	2020-09-03 16:16:22 -07:00
Ryan Moeller	76a157f004	Add 'zfs rename -u' to rename without remounting Allow renaming file systems without remounting where possible. It is possible for file systems with the 'mountpoint' property set to 'legacy' or 'none' - we don't have to change the mount directory for them. Currently such file systems are unmounted on rename and not even mounted back. This introduces a layering violation, as we need to update the 'f_mntfromname' field in the statfs structure related to the mountpoint (for the dataset we are renaming and all its children). In my opinion it is worth it, as it allows updating FreeBSD in an even cleaner way - in a ZFS-only configuration the root file system is a ZFS file system with the 'mountpoint' property set to 'legacy'. If the root dataset is named system/rootfs, we can snapshot it (system/rootfs@upgrade), clone it (system/oldrootfs), update FreeBSD and if it doesn't boot we can boot back from system/oldrootfs and rename it back to system/rootfs while it is mounted as /. Before it was not possible, because unmounting / was not possible. Authored by: Pawel Jakub Dawidek <pjd@FreeBSD.org> Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Ported by: Matt Macy <mmacy@freebsd.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10839	2020-09-03 16:16:15 -07:00
Ryan Moeller	6512c18fe1	FreeBSD: Remove unused SECLABEL code SECLABEL is undefined on FreeBSD and should be pruned. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org> Closes #10847	2020-09-03 16:16:10 -07:00
Ryan Moeller	d6a779a278	FreeBSD: Simplify INGLOBALZONE FreeBSD's previous ZFS implemented INGLOBALZONE(thread) as (!jailed((thread)->td_ucred)) and passed curthread to INGLOBALZONE. We pass curproc instead of curthread, so we can achieve the same effect with (!jailed((proc)->p_ucred)). The implementation is trivial enough to fit on a single line in a define. We don't really need a whole separate function for something that's already macros all the way down. Eliminate in_globalzone. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org> Closes #10851	2020-09-03 16:15:59 -07:00
Toomas Soome	8a06356e24	zio_ereport_post() and zio_ereport_start() return values are ignored use (void) to silence analyzers. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Toomas Soome <tsoome@me.com> Closes #10857	2020-09-03 16:15:47 -07:00
Matthew Macy	baed4fbacb	Move spa_stats.c to common code Initially it was considered simplest to stub out all of the functions on FreeBSD. Now that FreeBSD supports KSTAT_TYPE_RAW at least some of the functionality should be made available. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #10842	2020-08-30 14:19:08 -07:00
Matthew Macy	f4c8e9c69b	FreeBSD: Fix spurious failure in zvol_geom_open In zvol_geom_open on first open we need to guarantee that the namespace lock is held to avoid spurious failures in zvol_first_open. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #10841	2020-08-30 14:19:03 -07:00
Matthew Macy	8639ca86da	FreeBSD: add support for KSTAT_TYPE_RAW A few kstats use KSTAT_TYPE_RAW to provide a string generated on demand. Implementing these as sysctls was punted until now. Reviewed by: Toomas Soome <tsoome@me.com> Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #10836	2020-08-30 14:18:54 -07:00
Brian Behlendorf	c6ee83893e	Linux 5.9 compat: NR_SLAB_RECLAIMABLE Commit `dcdc12e` added compatibility code to treat NR_SLAB_RECLAIMABLE_B as if it were the same as NR_SLAB_RECLAIMABLE. However, the new value is in bytes while the old value was in pages which means they are not interchangeable. The only place the reclaimable slab size is used is as a component of the calculation done by arc_free_memory(). This function returns the amount of memory the ARC considers to be free or reclaimable at little cost. Rather than switch to a new interface to get this value, it has been removed from the calculation. It is normally a minor component compared to the number of inactive or free pages, and removing it aligns the behavior with the FreeBSD version of arc_free_memory(). Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Coleman Kane <ckane@colemankane.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #10834	2020-08-30 14:18:50 -07:00
Georgy Yakovlev	c2068750d7	module/zstd: pass -U__BMI__ If kernel is compiled with -march=znver1 or -march=znver2 zstd module compilation will fail due to SSE register return with SSE disabled. What's interesting, is that -march=skylake also implies -mbmi which defines __BMI__ but compilation succeeds. It is probably due to different BMI implementations on AMD and INTEL processors and the way compiler uses instructions. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Georgy Yakovlev <gyakovlev@gentoo.org> Closes #10758 Closes #10829	2020-08-27 16:07:13 -07:00
Patrick Mooney	1ac6248312	dnode_sync is careless with range tree Because dnode_sync_free_range() must drop dn_mtx during its processing, using it as a callback to range_tree_vacate() is not safe. No other operations (besides destroy) are allowed once range_tree_vacate() has begun, and dropping dn_mtx would leave a window open for another thread to observe that invalid (and unsafe) state via dnode_block_freed(). Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Igor Kozhukhov <igor@dilos.org> Signed-off-by: Patrick Mooney <pmooney@oxide.computer> Closes #10708 Closes #10823	2020-08-27 16:07:05 -07:00
Ryan Moeller	57fc3987a0	zpool: Change base URL for ZFS messages to openzfs-docs Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10820	2020-08-27 16:06:57 -07:00
Brian Behlendorf	4f6167deb5	Remove duplicate dnode.h include The zfs/sa.c source file accidentally includes sys/dnode.h twice. Remove the second occurrence. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #10816 Closes #10819	2020-08-27 16:06:52 -07:00
Paul Dagnelie	79d6a1b1da	Always track temporary fses and snapshots for accounting The root cause of the issue is that we only occasionally do as the comments in the code suggest and actually ignore the %recv dataset when it comes to filesystem limit tracking. Specifically, the only time we ignore it is when initializing the filesystem and snapshot limit values; when creating a new %recv dataset or deleting one, we always update the bookkeeping. This causes a problem if you init the fs count on a filesystem that already has a %recv dataset, since the bookkeeping will be decremented but not incremented. This is resolved in this patch by simply always tracking the %recv dataset as a child. Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com> Signed-off-by: Paul Dagnelie <pcd@delphix.com> Closes #10791	2020-08-27 16:06:47 -07:00
Toomas Soome	510179f086	Remove pragma ident lines The #pragma ident is a historical relic and not needed any more, this pragma is actually unknown for common compilers and is only causing trouble. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matt Macy <mmacy@FreeBSD.org> Signed-off-by: Toomas Soome <tsoome@me.com> Closes #10810	2020-08-27 16:06:39 -07:00
Matthew Macy	cb16a5e043	FreeBSD: disable neon usage The neon support code does not build on FreeBSD, ifdef out references to fix linker issues on arm64. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #10809	2020-08-27 16:06:35 -07:00
Alexander Motin	3ca31bd0c6	Introduce limit on size of L2ARC headers Since L2ARC buffers are not evicted on memory pressure, too large amount of headers on system with irrationally large L2ARC can render it slow or even unusable. This change limits L2ARC writes and rebuild if unevictable L2ARC-only headers reach dangerous level. While there, call arc_adapt() on L2ARC rebuild, so that it could properly grow arc_c, reflecting potentially significant ARC size increase and avoiding slow growth with hopeless eviction attempts later when "overflow" is detected. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reported-by: Richard Elling <Richard.Elling@RichardElling.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Closes #10765	2020-08-27 16:06:28 -07:00
sterlingjensen	87688b686b	Mark lua setjmp/longjmp for powerpc weak Linux already defines setjmp/longjmp for powerpc, which leads to duplicate symbols in a statically linked build. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Sterlng Jensen <sterlingjensen@users.noreply.github.com> Closes #10795	2020-08-25 10:32:49 -07:00
Brian Behlendorf	94dac3e880	Export dmu_offset_next() symbol Export the dmu_offset_next() symbol for use by Lustre. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #10796	2020-08-25 08:34:41 -07:00
youzhongyang	b900799768	Fix inability to destroy snapshot used over NFS The cache of struct svc_export and struct svc_expkey by nfsd and rpc.mountd for the snapshot holds references to the mount point. We need to flush them out before unmounting, otherwise umount would fail with EBUSY. Reviewed-by: Don Brady <don.brady@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Youzhong Yang <yyang@mathworks.com> Closes #6000 Closes #10783	2020-08-24 17:33:02 -07:00
Sebastian Gottschall	184df27eef	Avoid symbol collision with in-kernel zstdlib For Linux, when zfs is compiled as an in kernel static variant and the in kernel zstd library is compiled statically into the kernel a symbol collision will occur. This wrapper header renames all of the relevant zstd functions to avoid this problem. Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Closes #10775	2020-08-24 12:20:41 -07:00
Andrew	a741b386d3	Prevent zfs_acl_chmod() if aclmode restricted and ACL inherited In the absence of an inheriting entry for owner@, group@, or everyone@, zfs_acl_chmod() is called to set these. This can cause confusion for Samba admins who do not expect these entries to appear on newly created files and directories once they have been stripped from the parent directory. When aclmode is set to "restricted", chmod is prevented on non-trivial ACLs. It is not a stretch to assume that in this case the administrator does not want ZFS to add the missing special entries. Add a check for this aclmode, and if an inherited entry is present skip zfs_acl_chmod(). Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Andrew Walker <awalker@ixsystems.com> Closes #10748	2020-08-22 21:49:25 -07:00
Clint Armstrong	1ddd7cdb92	Make formatting of dedup values string consistent All other prop values return options separated by ` \| `, dedup values do not, they are separated by `, `. This change makes the dedup value formatting consistent with other properties. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Clint Armstrong <clint@clintarmstrong.net> Closes #10761	2020-08-22 10:58:07 -07:00
Ryan Moeller	6fe3498ca3	Import vdev ashift optimization from FreeBSD Many modern devices use physical allocation units that are much larger than the minimum logical allocation size accessible by external commands. Two prevalent examples of this are 512e disk drives (512b logical sector, 4K physical sector) and flash devices (512b logical sector, 4K or larger allocation block size, and 128k or larger erase block size). Operations that modify less than the physical sector size result in a costly read-modify-write or garbage collection sequence on these devices. Simply exporting the true physical sector of the device to ZFS would yield optimal performance, but has two serious drawbacks: 1. Existing pools created with devices that have different logical and physical block sizes, but were configured to use the logical block size (e.g. because the OS version used for pool construction reported the logical block size instead of the physical block size) will suddenly find that the vdev allocation size has increased. This can be easily tolerated for active members of the array, but ZFS would prevent replacement of a vdev with another identical device because it now appears that the smaller allocation size required by the pool is not supported by the new device. 2. The device's physical block size may be too large to be supported by ZFS. The optimal allocation size for the vdev may be quite large. For example, a RAID controller may export a vdev that requires read-modify-write cycles unless accessed using 64k aligned/sized requests. ZFS currently has an 8k minimum block size limit. Reporting both the logical and physical allocation sizes for vdevs solves these problems. A device may be used so long as the logical block size is compatible with the configuration. By comparing the logical and physical block sizes, new configurations can be optimized and administrators can be notified of any existing pools that are sub-optimal. 
Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Co-authored-by: Matthew Macy <mmacy@freebsd.org> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #10619	2020-08-21 12:53:17 -07:00
Matthew Ahrens	3dc18995bd	Fix indentation in dnode_free_range() Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes #10744	2020-08-20 11:45:20 -07:00
Matthew Macy	1c2725a157	FreeBSD: 11.x arc_stats compatibility Removing other_size from arc_stats breaks top in 11.x jails running on HEAD. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #10745	2020-08-20 10:55:02 -07:00
Michael Niewöhner	10b3c7f5e4	Add zstd support to zfs This PR adds two new compression types, based on ZStandard: - zstd: A basic ZStandard compression algorithm. Available compression levels for zstd are zstd-1 through zstd-19, where the compression increases with every level, but speed decreases. - zstd-fast: A faster version of the ZStandard compression algorithm. zstd-fast is basically a "negative" level of zstd. The compression decreases with every level, but speed increases. Available compression levels for zstd-fast: - zstd-fast-1 through zstd-fast-10 - zstd-fast-20 through zstd-fast-100 (in increments of 10) - zstd-fast-500 and zstd-fast-1000 For more information check the man page. Implementation details: Rather than treat each level of zstd as a different algorithm (as was done historically with gzip), the block pointer `enum zio_compress` value is simply zstd for all levels, including zstd-fast, since they all use the same decompression function. The compress= property (a 64bit unsigned integer) uses the lower 7 bits to store the compression algorithm (matching the number of bits used in a block pointer, as the 8th bit was borrowed for embedded block pointers). The upper bits are used to store the compression level. It is necessary to be able to determine what compression level was used when later reading a block back, so the concept used in LZ4, where the first 32bits of the on-disk value are the size of the compressed data (since the allocation is rounded up to the nearest ashift), was extended, and we store the version of ZSTD and the level as well as the compressed size. This value is returned when decompressing a block, so that if the block needs to be recompressed (L2ARC, nop-write, etc), that the same parameters will be used to result in the matching checksum. All of the internal ZFS code ( `arc_buf_hdr_t`, `objset_t`, `zio_prop_t`, etc.) uses the separated _compress and _complevel variables. 
Only the properties ZAP contains the combined/bit-shifted value. The combined value is split when the compression_changed_cb() callback is called, and sets both objset members (os_compress and os_complevel). The userspace tools all use the combined/bit-shifted value. Additional notes: zdb can now also decode the ZSTD compression header (flag -Z) and inspect the size, version and compression level saved in that header. For each record, if it is ZSTD compressed, the parameters of the decoded compression header get printed. ZSTD is included with all current tests and new tests are added as-needed. Per-dataset feature flags now get activated when the property is set. If a compression algorithm requires a feature flag, zfs activates the feature when the property is set, rather than waiting for the first block to be born. This is currently only used by zstd but can be extended as needed. Portions-Sponsored-By: The FreeBSD Foundation Co-authored-by: Allan Jude <allanjude@freebsd.org> Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov> Co-authored-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Co-authored-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl> Co-authored-by: Michael Niewöhner <foss@mniewoehner.de> Signed-off-by: Allan Jude <allan@klarasystems.com> Signed-off-by: Allan Jude <allanjude@freebsd.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl> Signed-off-by: Michael Niewöhner <foss@mniewoehner.de> Closes #6247 Closes #9024 Closes #10277 Closes #10278	2020-08-20 10:30:06 -07:00
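The combined property layout described in 10b3c7f5e4 (algorithm in the lower 7 bits, level in the upper bits) can be sketched as below. All names prefixed with `ex_`/`EX_` are hypothetical, and the numeric value used for the zstd algorithm is only illustrative; neither is taken from the ZFS headers.

```c
#include <assert.h>
#include <stdint.h>

/* Number of low bits holding the algorithm, per the commit message. */
#define	EX_COMPRESS_BITS	7
#define	EX_COMPRESS_MASK	((UINT64_C(1) << EX_COMPRESS_BITS) - 1)

/* Illustrative algorithm id; the real enum value is an assumption. */
#define	EX_ALGO_ZSTD		UINT64_C(16)

/* Combine algorithm and level into one 64-bit property value. */
static uint64_t
ex_pack(uint64_t algo, uint64_t level)
{
	assert(algo <= EX_COMPRESS_MASK);	/* must fit in 7 bits */
	return ((level << EX_COMPRESS_BITS) | algo);
}

/* Split the combined value back into its two parts. */
static uint64_t
ex_algo(uint64_t prop)
{
	return (prop & EX_COMPRESS_MASK);
}

static uint64_t
ex_level(uint64_t prop)
{
	return (prop >> EX_COMPRESS_BITS);
}
```

Splitting the value this way mirrors what the message says happens in the property callback, where the combined value is separated into the compress and complevel members.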
Michael Niewöhner	dc544aba15	Import ZStandard v1.4.5 ZStandard is a modern, high performance, general compression algorithm. It provides similar or better compression levels to GZIP, but with much better performance. ZStandard provides a large selection of compression levels to allow a storage administrator to select the preferred performance/compression trade-off. This commit imports the unmodified ZStandard single-file library which will be used by ZFS. The implementation of this new library is done with future updates of zstd in mind. For this reason we integrated the code in a way, that does not require modifications to the library. For more details, see `module/zstd/README.md`. The library is excluded from codecov calculation and cppcheck as unaltered dependencies do not need full codecov or cppcheck. Co-authored-by: Allan Jude <allanjude@freebsd.org> Co-authored-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl> Co-authored-by: Michael Niewöhner <foss@mniewoehner.de> Signed-off-by: Allan Jude <allanjude@freebsd.org> Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl> Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>	2020-08-20 10:30:06 -07:00
Mariusz Zaborski	f2c027bd6a	FreeBSD: Add option to rewind checkpoint while importing root pool This option is used by FreeBSD boot loader. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Mariusz Zaborski <oshogbo@vexillium.org> Closes #10738	2020-08-19 17:19:42 -07:00

3313 Commits