Archive-Team/zfs - zfs - Gitea: Git with a cup of tea

Commit Graph

Author	SHA1	Message	Date
tony-zfs	02fced3067	Add support to decode a resume token Adding a new subcommand to zstream called token. This now allows users to decode a resume token to retrieve the toname field. This can be useful for tools that need this information. The syntax works as follows zstream token <resume_token>. Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Paul Zuchowski <pzuchowski@datto.com> Signed-off-by: Tony Perkins <tperkins@datto.com> Closes #10558	2020-07-23 17:44:03 -07:00
Matthew Ahrens	6774931dfa	Extend zdb to print inconsistencies in livelists and metaslabs Livelists and spacemaps are data structures that are logs of allocations and frees. Livelists entries are block pointers (blkptr_t). Spacemaps entries are ranges of numbers, most often used as to track allocated/freed regions of metaslabs/vdevs. These data structures can become self-inconsistent, for example if a block or range can be "double allocated" (two allocation records without an intervening free) or "double freed" (two free records without an intervening allocation). ZDB (as well as zfs running in the kernel) can detect these inconsistencies when loading livelists and metaslab. However, it generally halts processing when the error is detected. When analyzing an on-disk problem, we often want to know the entire set of inconsistencies, which is not possible with the current behavior. This commit adds a new flag, `zdb -y`, which analyzes the livelist and metaslab data structures and displays all of their inconsistencies. Note that this is different from the leak detection performed by `zdb -b`, which checks for inconsistencies between the spacemaps and the tree of block pointers, but assumes the spacemaps are self-consistent. The specific checks added are: Verify livelists by iterating through each sublivelists and: - report leftover FREEs - report double ALLOCs and double FREEs - record leftover ALLOCs together with their TXG [see Cross Check] Verify spacemaps by iterating over each metaslab and: - iterate over spacemap and then the metaslab's entries in the spacemap log, then report any double FREEs and double ALLOCs Verify that livelists are consistenet with spacemaps. The space referenced by livelists (after using the FREE's to cancel out corresponding ALLOCs) should be allocated, according to the spacemaps. Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Co-authored-by: Sara Hartse <sara.hartse@delphix.com> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> External-issue: DLPX-66031 Closes #10515	2020-07-14 17:51:05 -07:00
Arvind Sankar	38e2e9ce83	Centralize variable substitution A bunch of places need to edit files to incorporate the configured paths i.e. bindir, sbindir etc. Move this logic into a common file. Create arc_summary by copying arc_summary[23] as appropriate at build time instead of install time. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Closes #10559	2020-07-14 17:33:44 -07:00
Brian Behlendorf	9a49d3f3d3	Add device rebuild feature The device_rebuild feature enables sequential reconstruction when resilvering. Mirror vdevs can be rebuilt in LBA order which may more quickly restore redundancy depending on the pools average block size, overall fragmentation and the performance characteristics of the devices. However, block checksums cannot be verified as part of the rebuild thus a scrub is automatically started after the sequential resilver completes. The new '-s' option has been added to the `zpool attach` and `zpool replace` command to request sequential reconstruction instead of healing reconstruction when resilvering. zpool attach -s <pool> <existing vdev> <new vdev> zpool replace -s <pool> <old vdev> <new vdev> The `zpool status` output has been updated to report the progress of sequential resilvering in the same way as healing resilvering. The one notable difference is that multiple sequential resilvers may be in progress as long as they're operating on different top-level vdevs. The `zpool wait -t resilver` command was extended to wait on sequential resilvers. From this perspective they are no different than healing resilvers. Sequential resilvers cannot be supported for RAIDZ, but are compatible with the dRAID feature being developed. As part of this change the resilver_restart_* tests were moved in to the functional/replacement directory. Additionally, the replacement tests were renamed and extended to verify both resilvering and rebuilding. Original-patch-by: Isaac Huang <he.huang@intel.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: John Poduska <jpoduska@datto.com> Co-authored-by: Mark Maybee <mmaybee@cray.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #10349	2020-07-03 11:05:50 -07:00
Arvind Sankar	6b99fc0620	Fixes for make dist Reduce the usage of EXTRA_DIST. If files are conditionally included in _SOURCES, _HEADERS etc, automake is smart enough to dist all files that could possibly be included, but this does not apply to EXTRA_DIST, resulting in make dist depending on the configuration. Add some files that were missing altogether in various Makefile's. The changes to disted files in this commit (excluding deleted files): +./cmd/zed/agents/README.md +./etc/init.d/README.md +./lib/libspl/os/freebsd/getexecname.c +./lib/libspl/os/freebsd/gethostid.c +./lib/libspl/os/freebsd/getmntany.c +./lib/libspl/os/freebsd/mnttab.c -./lib/libzfs/libzfs_core.pc -./lib/libzfs/libzfs.pc +./lib/libzfs/os/freebsd/libzfs_compat.c +./lib/libzfs/os/freebsd/libzfs_fsshare.c +./lib/libzfs/os/freebsd/libzfs_ioctl_compat.c +./lib/libzfs/os/freebsd/libzfs_zmount.c +./lib/libzutil/os/freebsd/zutil_compat.c +./lib/libzutil/os/freebsd/zutil_device_path_os.c +./lib/libzutil/os/freebsd/zutil_import_os.c +./module/lua/README.zfs +./module/os/linux/spl/README.md +./tests/README.md +./tests/zfs-tests/tests/functional/cli_root/zfs_clone/zfs_clone_rm_nested.ksh +./tests/zfs-tests/tests/functional/cli_root/zfs_send/zfs_send_encrypted_unloaded.ksh +./tests/zfs-tests/tests/functional/inheritance/README.config +./tests/zfs-tests/tests/functional/inheritance/README.state +./tests/zfs-tests/tests/functional/rsend/rsend_016_neg.ksh +./tests/zfs-tests/tests/perf/fio/sequential_readwrite.fio Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Closes #10501	2020-06-26 14:20:02 -07:00
Jorgen Lundman	68301ba20e	zed additional features This commit adds two features to zed, that macOS desires. The first is that when you unload the kernel module, zed would enter into a cpubusy loop calling zfs_events_next() repeatedly. We now look for ENODEV, returned by kernel, so zed can exit gracefully. Second feature is -I (idle) (alas -P persist was taken) is for the deamon to; 1; if started without ZFS kernel module, stick around waiting for it. 2; if kernel module is unloaded, go back to 1. This is due to daemons in macOS is started by launchctl, and is expected to stick around. Currently, the busy loop only exists when errno is ENODEV. This is to ensure that functionality that upstream expects is not changed. It did not care about errors before, and it still does not. (with the exception of ENODEV). However, it is probably better that all errors (ERESTART notwithstanding) exits the loop, and the issues complaining about zed taking all CPU will go away. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Jorgen Lundman <lundman@lundman.net> Closes #10476	2020-06-22 09:53:34 -07:00
Grischa Zengel	059f7c20e3	man.8: Add bookmark to list of types While checking bash_completion I missed bookmark as type. ``` # zfs get type zpool2#b NAME PROPERTY VALUE SOURCE zpool2#b type bookmark - ``` Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Grischa Zengel <github.zfsonlinux@zengel.info> Closes #10419	2020-06-10 17:53:07 -07:00
Matthew Ahrens	f66434268c	Remove unnecessary references to slavery The horrible effects of human slavery continue to impact society. The casual use of the term "slave" in computer software is an unnecessary reference to a painful human experience. This commit removes all possible references to the term "slave". Implementation notes: The zpool.d/slaves script is renamed to dm-deps, which uses the same terminology as `dmsetup deps`. References to the `/sys/class/block/$dev/slaves` directory remain. This directory name is determined by the Linux kernel. Although `dmsetup deps` provides the same information, it unfortunately requires elevated privileges, whereas the `/sys/...` directory is world-readable. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes #10435	2020-06-10 17:07:59 -07:00
Andrea Gelmini	dd4bc569b9	Fix typos Correct various typos in the comments and tests. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net> Closes #10423	2020-06-09 21:24:09 -07:00
George Amanakis	b7654bd794	Trim L2ARC The l2arc_evict() function is responsible for evicting buffers which reference the next bytes of the L2ARC device to be overwritten. Teach this function to additionally TRIM that vdev space before it is overwritten if the device has been filled with data. This is done by vdev_trim_simple() which trims by issuing a new type of TRIM, TRIM_TYPE_SIMPLE. We also implement a "Trim Ahead" feature. It is a zfs module parameter, expressed in % of the current write size. This trims ahead of the current write size. A minimum of 64MB will be trimmed. The default is 0 which disables TRIM on L2ARC as it can put significant stress to underlying storage devices. To enable TRIM on L2ARC we set l2arc_trim_ahead > 0. We also implement TRIM of the whole cache device upon addition to a pool, pool creation or when the header of the device is invalid upon importing a pool or onlining a cache device. This is dependent on l2arc_trim_ahead > 0. TRIM of the whole device is done with TRIM_TYPE_MANUAL so that its status can be monitored by zpool status -t. We save the TRIM state for the whole device and the time of completion on-disk in the header, and restore these upon L2ARC rebuild so that zpool status -t can correctly report them. Whole device TRIM is done asynchronously so that the user can export of the pool or remove the cache device while it is trimming (ie if it is too slow). We do not TRIM the whole device if persistent L2ARC has been disabled by l2arc_rebuild_enabled = 0 because we may not want to lose all cached buffers (eg we may want to import the pool with l2arc_rebuild_enabled = 0 only once because of memory pressure). If persistent L2ARC has been disabled by setting the module parameter l2arc_rebuild_blocks_min_l2size to a value greater than the size of the cache device then the whole device is trimmed upon creation or import of a pool if l2arc_trim_ahead > 0. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Adam D. Moss <c@yotes.com> Signed-off-by: George Amanakis <gamanakis@gmail.com> Closes #9713 Closes #9789 Closes #10224	2020-06-09 10:15:08 -07:00
felixdoerre	501a1511ae	mount: use the mount syscall directly Allow zfs datasets to be mounted on Linux without relying on the invocation of an external processes. This is the same behavior which is implemented for FreeBSD. Use of the libmount library was originally considered because it provides functionality to properly lock and update the /etc/mtab file. However, these days /etc/mtab is typically a symlink to /proc/self/mounts so there's nothing to updated. Therefore, we call mount(2) directly and avoid any additional dependencies. If required the legacy behavior can be enabled by setting the ZFS_MOUNT_HELPER environment variable. This may be needed in environments where SELinux in enabled and the zfs binary does not have mount permission. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Felix Dörre <felix@dogcraft.de> #10294	2020-05-20 18:02:41 -07:00
Paul Dagnelie	de4f06c275	Small program that converts a dataset id and an object id to a path Small program that converts a dataset id and an object id to a path Reviewed-by: Prakash Surya <prakash.surya@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Paul Dagnelie <pcd@delphix.com> Closes #10204	2020-05-20 10:05:33 -07:00
George Amanakis	657fd33bcf	Improvements on persistent L2ARC Functional changes: We implement refcounts of log blocks and their aligned size on the cache device along with two corresponding arcstats. The refcounts are reflected in the header of the device and provide valuable information as to whether log blocks are accounted for correctly. These are dynamically adjusted as log blocks are committed/evicted. zdb also uses this information in the device header and compares it to the corresponding values as reported by dump_l2arc_log_blocks() which emulates l2arc_rebuild(). If the refcounts saved in the device header report higher values, zdb exits with an error. For this feature to work correctly there should be no active writes on the device. This is also employed in the tests of persistent L2ARC. We extend the structure of the cache device header by adding the two new variables mirroring the refcounts after the existing variables to preserve backward compatibility in terms of persistent L2ARC. 1) a new arcstat "l2_log_blk_asize" and refcount "l2ad_lb_asize" which reflect the total aligned size of log blocks on the device. This is also reflected in the header of the cache device as "dh_lb_asize". 2) a new arcstat "l2arc_log_blk_count" and refcount "l2ad_lb_count" which reflect the total number of L2ARC log blocks present on cache devices. It is also reflected in the header of the cache device as "dh_lb_count". In l2arc_rebuild_vdev() if the amount of committed log entries in a log block is 0 and the device header is valid we update the device header. This will facilitate trimming of the whole device in this case when TRIM for L2ARC is implemented. Improve loop protection in l2arc_rebuild() by using the starting offset of the payload of each log block instead of the starting offset of the log block. If the zio in l2arc_write_buffers() fails, restore the lbps array in the header of the device to its previous state in l2arc_write_done(). If l2arc_rebuild() ends the rebuild process without restoring any L2ARC log blocks in ARC and without any other error, this means that the lbps array in the header is pointing to non-existent or invalid log blocks. Reset the device header in this case. In l2arc_rebuild() change the zfs_dbgmsg messages to spa_history_log_internal() making them user visible with zpool history command. Non-functional changes: Make the first test in persistent L2ARC use `zdb -lll` to increase coverage in `zdb.c`. Rename psize with asize when referring to log blocks, since L2ARC_SET_PSIZE stores the vdev aligned size for log blocks. Also rename dh_log_blk_entries to dh_log_entries to make it clear that it is a mirror of l2ad_log_entries. Added comments for both changes. Fix inaccurate comments for example in l2arc_log_blk_restore(). Add asserts at the end in l2arc_evict() and l2arc_write_buffers(). Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: George Amanakis <gamanakis@gmail.com> Closes #10228	2020-05-07 16:34:03 -07:00
Paul B. Henson	7bf3e1fa0f	OpenZFS 3254 - add support in zfs for aclmode=restricted Authored-by: Paul B. Henson <henson@acm.org> Reviewed by: Albert Lee <trisk@nexenta.com> Reviewed by: Gordon Ross <gwr@nexenta.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Richard Lowe <richlowe@richlowe.net> Ported-by: Paul B. Henson <henson@acm.org> OpenZFS-issue: https://www.illumos.org/issues/3254 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/71dbfc287c Closes #10266	2020-04-30 11:23:59 -07:00
Paul B. Henson	a1af567bb6	OpenZFS 742 - Resurrect the ZFS "aclmode" property OpenZFS 664 - Umask masking "deny" ACL entries OpenZFS 279 - Bug in the new ACL (post-PSARC/2010/029) semantics Porting notes: * Updated zfs_acl_chmod to take 'boolean_t isdir' as first parameter rather than 'zfsvfs_t zfsvfs' zfs man pages changes mixed between zfs and new zfsprops man pages Reviewed by: Aram Hvrneanu <aram@nexenta.com> Reviewed by: Gordon Ross <gwr@nexenta.com> Reviewed by: Robert Gordon <rbg@openrbg.com> Reviewed by: Mark.Maybee@oracle.com Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Garrett D'Amore <garrett@nexenta.com> Ported-by: Paul B. Henson <henson@acm.org> OpenZFS-issue: https://www.illumos.org/issues/742 OpenZFS-issue: https://www.illumos.org/issues/664 OpenZFS-issue: https://www.illumos.org/issues/279 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/a3c49ce110 Closes #10266	2020-04-30 11:22:45 -07:00
alex	47c9299fcc	zfs_create: round up volume size to multiple of bs Round up the volume size requested in `zfs create -V size` to the next higher multiple of the volblocksize. Updates the man page and adds a test to verify the new behavior. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reported-by: puffi <puffi@users.noreply.github.com> Signed-off-by: Alex John <alex@stty.io> Closes #8541 Closes #10196	2020-04-24 19:04:34 -07:00
Matthew Ahrens	196bee4cfd	Remove deduplicated send/receive code Deduplicated send streams (i.e. `zfs send -D` and `zfs receive` of such streams) are deprecated. Deduplicated send streams can be received by first converting them to non-deduplicated with the `zstream redup` command. This commit removes the code for sending and receiving deduplicated send streams. `zfs send -D` will now print a warning, ignore the `-D` flag, and generate a regular (non-deduplicated) send stream. `zfs receive` of a deduplicated send stream will print an error message and fail. The resulting code simplification (especially in the kernel's support for receiving dedup streams) should help enable future performance enhancements. Several new tests are added which leverage `zstream redup`. Reviewed-by: Paul Dagnelie <pcd@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Issue #7887 Issue #10117 Issue #10156 Closes #10212	2020-04-23 10:06:57 -07:00
Matthew Ahrens	c618f87cd2	Add `zstream redup` command to convert deduplicated send streams Deduplicated send and receive is deprecated. To ease migration to the new dedup-send-less world, the commit adds a `zstream redup` utility to convert deduplicated send streams to normal streams, so that they can continue to be received indefinitely. The new `zstream` command also replaces the functionality of `zstreamdump`, by way of the `zstream dump` subcommand. The `zstreamdump` command is replaced by a shell script which invokes `zstream dump`. The way that `zstream redup` works under the hood is that as we read the send stream, we build up a hash table which maps from `<GUID, object, offset> -> <file_offset>`. Whenever we see a WRITE record, we add a new entry to the hash table, which indicates where in the stream file to find the WRITE record for this block. (The key is `drr_toguid, drr_object, drr_offset`.) For entries other than WRITE_BYREF, we pass them through unchanged (except for the running checksum, which is recalculated). For WRITE_BYREF records, we change them to WRITE records. We find the referenced WRITE record by looking in the hash table (for the record with key `drr_refguid, drr_refobject, drr_refoffset`), and then reading the record header and payload from the specified offset in the stream file. This is why the stream can not be a pipe. The found WRITE record replaces the WRITE_BYREF record, with its `drr_toguid`, `drr_object`, and `drr_offset` fields changed to be the same as the WRITE_BYREF's (i.e. we are writing the same logical block, but with the data supplied by the previous WRITE record). This algorithm requires memory proportional to the number of WRITE records (same as `zfs send -D`), but the size per WRITE record is relatively low (40 bytes, vs. 72 for `zfs send -D`). A 1TB send stream with 8KB blocks (`recordsize=8k`) would use around 5GB of RAM to "redup". Reviewed-by: Jorgen Lundman <lundman@lundman.net> Reviewed-by: Paul Dagnelie <pcd@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes #10124 Closes #10156	2020-04-10 10:39:55 -07:00
George Amanakis	77f6826b83	Persistent L2ARC This commit makes the L2ARC persistent across reboots. We implement a light-weight persistent L2ARC metadata structure that allows L2ARC contents to be recovered after a reboot. This significantly eases the impact a reboot has on read performance on systems with large caches. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: George Wilson <gwilson@delphix.com> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Co-authored-by: Saso Kiselkov <skiselkov@gmail.com> Co-authored-by: Jorgen Lundman <lundman@lundman.net> Co-authored-by: George Amanakis <gamanakis@gmail.com> Ported-by: Yuxuan Shui <yshuiv7@gmail.com> Signed-off-by: George Amanakis <gamanakis@gmail.com> Closes #925 Closes #1823 Closes #2672 Closes #3744 Closes #9582	2020-04-10 10:33:35 -07:00
Paul Dagnelie	5a42ef04fd	Add 'zfs wait' command Add a mechanism to wait for delete queue to drain. When doing redacted send/recv, many workflows involve deleting files that contain sensitive data. Because of the way zfs handles file deletions, snapshots taken quickly after a rm operation can sometimes still contain the file in question, especially if the file is very large. This can result in issues for redacted send/recv users who expect the deleted files to be redacted in the send streams, and not appear in their clones. This change duplicates much of the zpool wait related logic into a zfs wait command, which can be used to wait until the internal deleteq has been drained. Additional wait activities may be added in the future. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: John Gallagher <john.gallagher@delphix.com> Signed-off-by: Paul Dagnelie <pcd@delphix.com> Closes #9707	2020-04-01 10:02:06 -07:00
Matthew Ahrens	652bdc9b0e	Deprecate deduplicated send streams Dedup send can only deduplicate over the set of blocks in the send command being invoked, and it does not take advantage of the dedup table to do so. This is a very common misconception among not only users, but developers, and makes the feature seem more useful than it is. As a result, many users are using the feature but not getting any benefit from it. Dedup send requires a nontrivial expenditure of memory and CPU to operate, especially if the dataset(s) being sent is (are) not already using a dedup-strength checksum. Dedup send adds developer burden. It expands the test matrix when developing new features, causing bugs in released code, and delaying development efforts by forcing more testing to be done. As a result, we are deprecating the use of `zfs send -D` and receiving of such streams. This change adds a warning to the man page, and also prints the warning whenever dedup send or receive are used. In a future release, we plan to: 1. remove the kernel code for generating deduplicated streams 2. make `zfs send -D` generate regular, non-deduplicated streams 3. remove the kernel code for receiving deduplicated streams 4. make `zfs receive` of deduplicated streams process them in userland to "re-duplicate" them, so that they can still be received. Reviewed-by: Paul Dagnelie <pcd@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes #7887 Closes #10117	2020-03-18 13:31:10 -07:00
Mariusz Zaborski	a57d3d45d6	Add option for forcible unmounting dataset while receiving snapshot. Currently when the dataset is in use we can't receive snapshots. zfs send test/1@asd \| zfs recv -FM test/2 cannot unmount '/test/2': Device busy This commits add option 'M' which attempts to forcibly unmount the dataset. Thanks to this we can enforce receiving snapshots in a single step. Note that this functionality is not supported on Linux because the VFS will prevent active mounted filesystems from being unmounted, even with the force option. This is the intended VFS behavior. Test cases were added to verify the expected behavior based on the platform. Discussed-with: Pawel Jakub Dawidek <pjd@FreeBSD.org> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Allan Jude <allanjude@freebsd.org> External-issue: https://reviews.freebsd.org/D22306 Closes #9904	2020-03-17 10:08:32 -07:00
Ryan Moeller	f5f6fb03b7	Change default to overlay=on Filesystems allow overlay mounts by default on FreeBSD and Linux. Respect the native convention by switching the default to overlay=on, while retaining the option to turn the property off for compatibility with other operating systems' conventions. Update documentation and tests accordingly. Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10030	2020-03-06 09:28:19 -08:00
Brian Behlendorf	2288d41968	Add trim support to zpool wait Manual trims fall into the category of long-running pool activities which people might want to wait synchronously for. This change adds support to 'zpool wait' for waiting for manual trim operations to complete. It also adds a '-w' flag to 'zpool trim' which can be used to turn 'zpool trim' into a synchronous operation. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com> Signed-off-by: John Gallagher <john.gallagher@delphix.com> Closes #10071	2020-03-04 15:07:11 -08:00
Mariusz Zaborski	89adffb7bc	Add notice that forcefully unmount is not supported on Linux The Linux VFS will never allow a filesystem which is in use to be unmounted. This behavior differs from other platforms like FreeBSD which allow a filesystem to be force unmounted. This will result in errors being returned to applications actively using the filesystem. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Mariusz Zaborski <oshogbo@vexillium.org> Closes #10013	2020-02-18 13:36:23 -08:00
InsanePrawn	ecbbdac799	Systemd mount generator: Generate noauto units; add control properties This commit refactors the systemd mount generators and makes the following major changes: - The generator now generates units for datasets marked canmount=noauto, too. These units are NOT WantedBy local-fs.target. If there are multiple noauto datasets for a path, no noauto unit will be created. Datasets with canmount=on are prioritized. - Introduces handling of new user properties which are now included in the zfs-list.cache files: - org.openzfs.systemd:requires: List of units to require for this mount unit - org.openzfs.systemd:requires-mounts-for: List of mounts to require by this mount unit - org.openzfs.systemd:before: List of units to order after this mount unit - org.openzfs.systemd:after: List of units to order before this mount unit - org.openzfs.systemd:wanted-by: List of units to add a Wants dependency on this mount unit to - org.openzfs.systemd:required-by: List of units to add a Requires dependency on this mount unit to - org.openzfs.systemd:nofail: Toggles between a wants and a requires dependency. - org.openzfs.systemd:ignore: Do not generate a mount unit for this dataset. Consult the updated man page for detailed documentation. - Restructures and extends the zfs-mount-generator(8) man page with the above properties, information on unit ordering and a license header. Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Antonio Russo <antonio.e.russo@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: InsanePrawn <insane.prawny@gmail.com> Closes #9649	2020-02-14 15:32:55 -08:00
Jason King	13b5a4d5c0	Support setting user properties in a channel program This adds support for setting user properties in a zfs channel program by adding 'zfs.sync.set_prop' and 'zfs.check.set_prop' to the ZFS LUA API. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matt Ahrens <matt@delphix.com> Co-authored-by: Sara Hartse <sara.hartse@delphix.com> Contributions-by: Jason King <jason.king@joyent.com> Signed-off-by: Sara Hartse <sara.hartse@delphix.com> Signed-off-by: Jason King <jason.king@joyent.com> Closes #9950	2020-02-14 13:41:42 -08:00
Christian Schwarz	948f0c4419	zcp: add zfs.sync.bookmark Add support for bookmark creation and cloning. Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: Paul Dagnelie <pcd@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Christian Schwarz <me@cschwarz.com> Closes #9571	2020-02-11 13:19:17 -08:00
Christian Schwarz	a73f361fdb	Implement bookmark copying This feature allows copying existing bookmarks using zfs bookmark fs#target fs#newbookmark There are some niche use cases for such functionality, e.g. when using bookmarks as markers for replication progress. Copying redaction bookmarks produces a normal bookmark that cannot be used for redacted send (we are not duplicating the redaction object). ZCP support for bookmarking (both creation and copying) will be implemented in a separate patch based on this work. Overview: - Terminology: - source = existing snapshot or bookmark - new/bmark = new bookmark - Implement bookmark copying in `dsl_bookmark.c` - create new bookmark node - copy source's `zbn_phys` to new's `zbn_phys` - zero-out redaction object id in copy - Extend existing bookmark ioctl nvlist schema to accept bookmarks as sources - => `dsl_bookmark_create_nvl_validate` is authoritative - use `dsl_dataset_is_before` check for both snapshot and bookmark sources - Adjust CLI - refactor shortname expansion logic in `zfs_do_bookmark` - Update man pages - warn about redaction bookmark handling - Add test cases - CLI - pyyzfs libzfs_core bindings Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: Paul Dagnelie <pcd@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Christian Schwarz <me@cschwarz.com> Closes #9571	2020-02-11 13:19:12 -08:00
Paul Zuchowski	bc67cba7c0	Fix zdb -R with 'b' flag zdb -R :b fails due to the indirect block being compressed, and the 'b' and 'd' flag not working in tandem when specified. Fix the flag parsing code and create a zfs test for zdb -R block display. Also fix the zio flags where the dotted notation for the vdev portion of DVA (i.e. 0.0:offset:length) fails. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Paul Zuchowski <pzuchowski@datto.com> Closes #9640 Closes #9729	2020-02-10 14:00:05 -08:00
Attila Fülöp	31b160f0a6	ICP: Improve AES-GCM performance Currently SIMD accelerated AES-GCM performance is limited by two factors: a. The need to disable preemption and interrupts and save the FPU state before using it and to do the reverse when done. Due to the way the code is organized (see (b) below) we have to pay this price twice for each 16 byte GCM block processed. b. Most processing is done in C, operating on single GCM blocks. The use of SIMD instructions is limited to the AES encryption of the counter block (AES-NI) and the Galois multiplication (PCLMULQDQ). This leads to the FPU not being fully utilized for crypto operations. To solve (a) we do crypto processing in larger chunks while owning the FPU. An `icp_gcm_avx_chunk_size` module parameter was introduced to make this chunk size tweakable. It defaults to 32 KiB. This step alone roughly doubles performance. (b) is tackled by porting and using the highly optimized openssl AES-GCM assembler routines, which do all the processing (CTR, AES, GMULT) in a single routine. Both steps together result in up to 32x reduction of the time spend in the en/decryption routines, leading up to approximately 12x throughput increase for large (128 KiB) blocks. Lastly, this commit changes the default encryption algorithm from AES-CCM to AES-GCM when setting the `encryption=on` property. Reviewed-By: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-By: Jason King <jason.king@joyent.com> Reviewed-By: Tom Caputi <tcaputi@datto.com> Reviewed-By: Richard Laager <rlaager@wiktel.com> Signed-off-by: Attila Fülöp <attila@fueloep.org> Closes #9749	2020-02-10 12:59:50 -08:00
Ryan Moeller	8c4987c489	Restore aclmode and remove acltype on FreeBSD This replaces the placeholder ZFS_PROP_PRIVATE with ZFS_PROP_ACLMODE, matching what is done in the NFSv4 ACLs PR (#9709). On FreeBSD we hide ZFS_PROP_ACLTYPE, while on Linux we hide ZFS_PROP_ACLMODE. The tests already assume this arrangement. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #9913	2020-02-04 08:40:07 -08:00
Ned Bass	a3403164d7	zdb: add support for object ranges for zdb -d Allow a range of object identifiers to dump with -d. This may be useful when dumping a large dataset and you want to break it up into multiple phases, or to resume where a previous scan left off. Object type selection flags are supported to reduce the performance overhead of verbosely dumping unwanted objects, and to reduce the amount of post-processing work needed to filter out unwanted objects from zdb output. This change extends existing syntax in a backward-compatible way. That is, the base case of a range is to specify a single object identifier to dump. Ranges and object identifiers can be intermixed as command line parameters. Usage synopsis: Object ranges take the form <start>:<end>[:<flags>] start Starting object number end Ending object number, or -1 for no upper bound flags Optional flags to select object types: A All objects (this is the default) d ZFS directories f ZFS files m SPA space maps z ZAPs - Negate effect of next flag Examples: # Dump all file objects zdb -dd tank/fish 0👎f # Dump all file and directory objects zdb -dd tank/fish 0👎fd # Dump all types except file and directory objects zdb -dd tank/fish 0👎A-f-d # Dump object IDs in a specific range zdb -dd tank/fish 1000:2000 Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Paul Zuchowski <pzuchowski@datto.com> Signed-off-by: Ned Bass <bass6@llnl.gov> Closes #9832	2020-01-24 11:00:46 -08:00
Jason King	e2ef1cbf04	Support inheriting properties in channel programs This adds support in channel programs to inherit properties analogous to `zfs inherit` by adding `zfs.sync.inherit` and `zfs.check.inherit` functions to the ZFS LUA API. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Jason King <jason.king@joyent.com> Closes #9738	2020-01-22 17:03:17 -08:00
Paul Zuchowski	f12e42cccf	zdb -d should accept the numeric objset id As an alternative to the dataset name, zdb now allows the decimal or hexadecimal objset ID to be specified. When permanent errors are reported as 2 hexadecimal numbers (objset ID : object ID) in zpool status; you can now use 'zdb <pool>[/objset ID] object' to determine the names of the objset and object which have the error. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Signed-off-by: Paul Zuchowski <pzuchowski@datto.com> Closes #9733	2020-01-16 09:22:49 -08:00
Richard Laager	f744f36ce5	Document zfs change-key caveats As discussed on the 2019-01-07 OpenZFS Leadership Meeting, we need to be clear about the limitations of `zfs change-key`. Changing the user key does not change the master key, nor does it currently overwrite the old wrapped master key on disk. Reviewed-by: Tom Caputi <tcaputi@datto.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Garrett Fields <ghfields@gmail.com> Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl> Signed-off-by: Richard Laager <rlaager@wiktel.com> Closes #9819	2020-01-14 10:11:07 -08:00
Brian Behlendorf	e458fcca75	Change http://zfsonlinux.org links to https://zfsonlinux.org Update the project website links contained in to repository to reference the secure https://zfsonlinux.org address. Reviewed-By: Richard Laager <rlaager@wiktel.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Garrett Fields <ghfields@gmail.com> Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #9837	2020-01-13 16:43:59 -08:00
Tom Caputi	ba0ba69e50	Add 'zfs send --saved' flag This commit adds the --saved (-S) to the 'zfs send' command. This flag allows a user to send a partially received dataset, which can be useful when migrating a backup server to new hardware. This flag is compatible with resumable receives, so even if the saved send is interrupted, it can be resumed. The flag does not require any user / kernel ABI changes or any new feature flags in the send stream format. Reviewed-by: Paul Dagnelie <pcd@delphix.com> Reviewed-by: Alek Pinchuk <apinchuk@datto.com> Reviewed-by: Paul Zuchowski <pzuchowski@datto.com> Reviewed-by: Christian Schwarz <me@cschwarz.com> Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #9007	2020-01-10 10:16:58 -08:00
Brian Behlendorf	33dc49e0c0	Fix ZPOOL_VDEV_NAME_PATH option description The corresponding zpool status option is -P and not -p. Update this description to reference the correct option. Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #9803	2020-01-06 10:43:32 -08:00
Tony Hutter	9fb2771aa5	Colorize zpool status output If the ZFS_COLOR env variable is set, then use ANSI color output in zpool status: - Column headers are bold - Degraded or offline pools/vdevs are yellow - Non-zero error counters and faulted vdevs/pools are red - The 'status:' and 'action:' sections are yellow if they're displaying a warning. This also includes a new 'faketty' function in libtest.shlib that is compatible with FreeBSD (code provided by @freqlabs). Reviewed-by: Jorgen Lundman <lundman@lundman.net> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #9340	2019-12-19 16:26:07 -08:00
Matthew Macy	4bc721965f	Add FreeBSD jail support hooks Add the 'zfs jail/unjail' subcommands along with the relevant documentation from FreeBSD. This feature is not supported on Linux and still requires the match kernel ioctls which will be included when the FreeBSD platform code is integrated. Reviewed-by: Jorgen Lundman <lundman@lundman.net> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@ixsystems.com> Closes #9686	2019-12-11 11:58:37 -08:00
Paul Zuchowski	f0bf435176	zio_decompress_data always ASSERTs successful decompression This interferes with zdb_read_block trying all the decompression algorithms when the 'd' flag is specified, as some are expected to fail. Also control the output when guessing algorithms, try the more common compression types first, allow specifying lsize/psize, and fix an uninitialized variable. Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Paul Zuchowski <pzuchowski@datto.com> Closes #9612 Closes #9630	2019-12-10 15:51:58 -08:00
Matthew Macy	f95704ca5e	Disable EDONR on FreeBSD FreeBSD uses its own crypto framework in-kernel which, at this time, has no EDONR implementation. Reviewed-by: Jorgen Lundman <lundman@lundman.net> Reviewed-by: Allan Jude <allanjude@freebsd.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@ixsystems.com> Closes #9664	2019-12-05 13:10:29 -08:00
Brian Behlendorf	624222ae31	Increase allowed 'special_small_blocks' maximum value There may be circumstances where it's desirable that all blocks in a specified dataset be stored on the special device. Relax the artificial 128K limit and allow the special_small_blocks property to be set up to 1M. When blocks >1MB have been enabled via the zfs_max_recordsize module option, this limit is increased accordingly. Reviewed-by: Don Brady <don.brady@delphix.com> Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #9131 Closes #9355	2019-12-03 09:58:03 -08:00
Paul Zuchowski	894f6696b4	Add display of checksums to zdb -R The function zdb_read_block (zdb -R) was always intended to have a :c flag which would read the DVA and length supplied by the user, and display the checksum. Since we don't know which checksum goes with the data, we should calculate and display them all. For each checksum in the table, read in the data at the supplied DVA:length, calculate the checksum, and display it. Update the man page and create a zfs test for the new feature. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl> Signed-off-by: Paul Zuchowski <pzuchowski@datto.com> Closes #9607	2019-11-27 10:08:18 -08:00
Kjeld Schouten	64c77c4bad	Change zed.service to zfs-zed.service in man page zed.service does not exist replaced with correct service name in man. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl> Closes #9581	2019-11-13 10:23:23 -08:00
Ross Williams	c5ebfbbe19	Reorganize zpool(8) man page into sections Moved subcommand topics into individual manpages. Reordered and grouped the list of subcommands by topic. Moved concepts overview to `zpoolconcepts.8` and the long list of available pool properties to `zpoolprops.8`. Internal cross-references copied from `zpool.8` needed to be converted to `.Xr` external references to new subcommand manual pages. Move `autotrim` into lexical order, autotrim tacked onto the end of a list. Now it is in alphabetical order. Clarify attach/detach description. Description was too specific to command syntax. Overview clarifies reason for attaching or detaching a device. Clarify replace description, don't refer to subcommand arguments. Clarify split command description, say what split actually does and why you'd want to do it. Clarify description of upgrade, and simplify the zpool.8 wording of the zpool-upgrade(8) description. Clarify description of import, detail what zpool-import(8) actually does. Add appropriate SEE ALSO sections. Divided zpool subcommand manual pages need their own SEE ALSO sections. Also modified fsck.zfs.8 to point directly to zfs-scrub.8 and zed.8.in to include a direct reference to zfs-events.8 Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ross Williams <ross@ross-williams.net> Closes #9564	2019-11-13 09:21:07 -08:00
Ross Williams	870fc32fc9	Reorganize zfs(8) man page into sections Most subcommands got their own manpages (e.g. create). Some related commands grouped into a single manpage and symlinks created (e.g. set, get, and inherit). I did this when topics were either too short to warrant their own file or so interrelated that a user would want to refer between commands in the same file. Corrected .Sx internal references to .Xr cross refs; lots of .Sx references from when text was all in zfs.8 needed to be changed to .Xr zfs-$SUBCOMMAND 8 cross references. Divided subcommand list in zfs(8) into sections of related functionality. This required writing new descriptions for some commands. Preserved ".Os Linux", `.Os` macro parsing behavior differs between mandoc from the "BSD" mandoc package (available on Ubuntu) and man from Ubuntu's man-db package, which calls groff to format the manpages. Groff handles the `.Os` macro differently and wrongly, defaulting it to "BSD" in `/usr/share/groff//tmac/mdoc/doc-common`, instead of getting the default from `uname`. A future set of changes will introduce build-time preprocessing of manpages for platform-specific documentation and can insert the correct operating system name. Added SEE ALSO sections, the newly-divided zfs-.8 subcommand man pages needed their own SEE ALSO sections pointing to related subcommands and, in some cases, documentation from other packages (e.g. zfs-share.8). Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl> Reviewed-by: Sean Eric Fagan <sef@ixsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ross Williams <ross@ross-williams.net> Closes #9559	2019-11-12 11:17:40 -08:00
Brian Behlendorf	d292307212	Modify sharenfs=on default behavior While it may sometimes be convenient to export an NFS filesystem with no_root_squash it should not be the default behavior. Align the default behavior with the Linux NFS server defaults. To restore the previous behavior use 'zfs set sharenfs="no_root_squash,..."'. Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Reviewed-by: Richard Laager <rlaager@wiktel.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #9397 Closes #9425	2019-10-13 19:13:26 -07:00
Richard Yao	803884217f	Implement ZPOOL_IMPORT_UDEV_TIMEOUT_MS Since 0.7.0, zpool import would unconditionally block on udev for 30 seconds. This introduced a regression in initramfs environments that lack udev (particularly mdev based environments), yet use a zfs userland tools intended for the system that had been built against udev. Gentoo's genkernel is the main example, although custom user initramfs environments would be similarly impacted unless special builds of the ZFS userland utilities were done for them. Such environments already have their own mechanisms for blocking until device nodes are ready (such as genkernel's scandelay parameter), so it is unnecessary for zpool import to block on a non-existent udev until a timeout is reached inside of them. Rather than trying to intelligently determine whether udev is available on the system to avoid unnecessarily blocking in such environments, it seems best to just allow the environment to override the timeout. I propose that we add an environment variable called ZPOOL_IMPORT_UDEV_TIMEOUT_MS. Setting it to 0 would restore the 0.6.x behavior that was more desirable in mdev based initramfs environments. This allows the system user land utilities to be reused when building mdev-based initramfs archives. Reviewed-by: Igor Kozhukhov <igor@dilos.org> Reviewed-by: Jorgen Lundman <lundman@lundman.net> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Georgy Yakovlev <gyakovlev@gentoo.org> Signed-off-by: Richard Yao <ryao@gentoo.org> Closes #9436	2019-10-11 09:52:48 -07:00

1 2 3 4 5 ...

411 Commits