4976 zfs should only avoid writing to a failing non-redundant top-level vdev
4978 ztest fails in get_metaslab_refcount()
4979 extend free space histogram to device and pool
4980 metaslabs should have a fragmentation metric
4981 remove fragmented ops vector from block allocator
4982 space_map object should proactively upgrade when feature is enabled
4983 need to collect metaslab information via mdb
4984 device selection should use fragmentation metric
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Adam Leventhal <adam.leventhal@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Approved by: Garrett D'Amore <garrett@damore.org>
References:
https://www.illumos.org/issues/4976https://www.illumos.org/issues/4978https://www.illumos.org/issues/4979https://www.illumos.org/issues/4980https://www.illumos.org/issues/4981https://www.illumos.org/issues/4982https://www.illumos.org/issues/4983https://www.illumos.org/issues/4984https://github.com/illumos/illumos-gate/commit/2e4c998
Notes:
The "zdb -M" option has been re-tasked to display the new metaslab
fragmentation metric and the new "zdb -I" option is used to control
the maximum number of in-flight I/Os.
The new fragmentation metric is derived from the space map histogram
which has been rolled up to the vdev and pool level and is presented
to the user via "zpool list".
Add a number of module parameters related to the new metaslab weighting
logic.
Ported by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#2595
Add a new 'overlay' property (default 'off') that controls whether the
filesystem should be mounted even if the mountpoint is busy or if it
should fail with a 'mountpoint not empty'.
Doing overlay mounts is the default mount behavior on Linux, but not
in ZFS. It have been decided that following the ZFS behavior should
be the default, but this overlay allows for site administrator to
override this decision on a per-dataset basis.
Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes: #2503
4757 ZFS embedded-data block pointers ("zero block compression")
4913 zfs release should not be subject to space checks
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: Max Grossman <max.grossman@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Approved by: Dan McDonald <danmcd@omniti.com>
References:
https://www.illumos.org/issues/4757https://www.illumos.org/issues/4913https://github.com/illumos/illumos-gate/commit/5d7b4d4
Porting notes:
For compatibility with the fastpath code the zio_done() function
needed to be updated. Because embedded-data block pointers do
not require DVAs to be allocated the associated vdevs will not
be marked and therefore should not be unmarked.
Ported by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#2544
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Description from Matt Ahrens's bug report at Delphix:
Add a new zfs property, "redundant_metadata" which can have values
"all" or "most". The default will be "all", which is the current
behavior. Setting to "most" will cause us to only store 1 copy of
level-1 indirect blocks of user data files.
Additional notes:
The new man page section for this property states
"The exact behavior of which metadata blocks
are stored redundantly may change in future releases."
and:
"When set to most, ZFS stores an extra copy of most types of
metadata. This can improve performance of random writes,
because less metadata must be written."
The current implementation is as described above in Matt's blog.
It is controlled by a new global integer
"zfs_redundant_metadata_most_ditto_level", currently initialized
to 2. When "redundant_metadata" is set to "most", only indirect
blocks of the specified level and higher will have additional ditto
blocks created.
Ported by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#2542
This functionality is already available in 'zfs get'. Providing
it for 'zpool get' is useful and good for consistency.
Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes: #2522
When given a pool name via -e, zdb would attempt an import. If it
failed, then it would attempt a verbatim import. This behavior is
not always desirable so a -V switch is added to zdb to control the
behavior. When specified, a verbatim import is done. Otherwise,
the behavior is as it was previously, except no verbatim import
is done on failure.
Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#2372
Most ZFS implementations seemed to have missed this bit of documentation.
The additional text is based on FreeBSD's man page.
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#2416
Users need to be aware that when replacing devices in an existing
pool they may need to override automatically detected ashift value.
This will all depend on the exact hardware they are using.
Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #2024
The property name gets mangled with the explanation due to the property
length. Fixed by putting the explanation on the next line.
Before:
unsupported@feature_Info rmation about unsupported features that are
enabled on the pool. See zpool-features(5) for details.
After:
unsupported@feature_guid
Information about unsupported features that are enabled on the pool. See
zpool-features(5) for details.
Signed-off-by: SenH <sen@senhaerens.be>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#2419
These options have existed for a long time but have historically
been undocumented because they are not guaranteed to be safe. They
should only be used as a last resort when attempting to recover a
damaged pool.
Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1130
* Remove the references to share(1M), unshare(1M) and dfstab(4)
since they are not applicable to Linux.
* Add the exact exportfs command line used when setting sharenfs=on.
Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue: #1641
Users need to be aware that when adding devices to an existing pool
they may need to override automatically detected ashift value.
This will all depend on the exact hardware they are using.
Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes: #2024
zed monitors ZFS events. When a zevent is posted, zed will run any
scripts that have been enabled for the corresponding zevent class.
Multiple scripts may be invoked for a given zevent. The zevent
nvpairs are passed to the scripts as environment variables.
Events are processed synchronously by the single thread, and there is
no maximum timeout for script execution. Consequently, a misbehaving
script can delay (or forever block) the processing of subsequent
zevents. Plans are to address this in future commits.
Initial scripts have been developed to log events to syslog
and send email in response to checksum/data/io errors and
resilver.finish/scrub.finish events. By default, email will only
be sent if the ZED_EMAIL variable is configured in zed.rc (which is
serving as a config file of sorts until a proper configuration file
is implemented).
Signed-off-by: Chris Dunlap <cdunlap@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #2
Originally, users had to handle spa namespace collisions by either
exporting the already imported pool or by specifying a new name for the
pool with a conflicting name. In the case of root pools from virtual
guests, neither approach to collision resolution is reasonable. This is
addressed by extending the new name syntax with a -t option to specify
that the new name is temporary. When specified, this sets an internal
flag that is passed into the kernel to tell it that all label updates
should refer to the name used in the original label. Consequently, the
original pool name will be retained on export.
Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#2189
In Debian GNU/Linux a program called 'linitian' is used to make sure
that packages conforms to the Debian GNU/Linux packaging guide lines.
This fixes the problem reported as:
W: zfsutils: manpage-has-bad-whatis-entry usr/share/man/man1/zhack.1.gz
W: zfsutils: manpage-has-bad-whatis-entry usr/share/man/man8/fsck.zfs.8.gz
Not something that ZoL needs to addhere to, but every other man page
have their NAME section in a special way - why not these two as well?
Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#2161
Add the "relatime" property. When set to "on", a file's atime will only
be updated if the existing atime at least a day old or if the existing
ctime or mtime has been updated since the last access. This behavior
is compatible with the Linux "relatime" mount option.
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#2064Closes#1917
Setting the 'dist_' prefix is the correct way to instruct Automake
to include these files in the distribution. The EXTRA_DIST variable
is reserved for files which are not covered by the automatic rules.
http://www.gnu.org/software/automake/manual/automake.html#Basics
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Four new dataset properties have been added to support SELinux. They
are 'context', 'fscontext', 'defcontext' and 'rootcontext' which map
directly to the context options described in mount(8). When one of
these properties is set to something other than 'none'. That string
will be passed verbatim as a mount option for the given context when
the filesystem is mounted.
For example, if you wanted the rootcontext for a filesystem to be set
to 'system_u:object_r:fs_t' you would set the property as follows:
$ zfs set rootcontext="system_u:object_r:fs_t" storage-pool/media
This will ensure the filesystem is automatically mounted with that
rootcontext. It is equivalent to manually specifying the rootcontext
with the -o option like this:
$ zfs mount -o rootcontext=system_u:object_r:fs_t storage-pool/media
By default all four contexts are set to 'none'. Further information
on SELinux contexts is detailed in mount(8) and selinux(8) man pages.
Signed-off-by: Matthew Thode <prometheanfire@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Closes#1504
The Snapshots section of the zfs(8) man page is incorrect and should
have been updated as part of #1312. Snapshots of volumes can be
accessed independently and their visibility is determined by the
'snapdev=hidden|visible' property. This is analogous to the existing
'snapdir=hidden|visible' property.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ned Bass <bass6@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Closes#1921
Commit 0cee240 from FreeBSD dramatically speeds up 'zfs list'
performance assuming you're only interested in the dataset
names. This optimization should be mentioned in the man page
to allow end users to take advantage of it.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1847
3588 provide zfs properties for logical (uncompressed) space
used and referenced
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Dan McDonald <danmcd@nexenta.com>
Reviewed by: Richard Elling <richard.elling@dey-sys.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
References:
https://www.illumos.org/issues/3588illumos/illumos-gate@77372cb0f3
Ported-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
This change adds support for Posix ACLs by storing them as an xattr
which is common practice for many Linux file systems. Since the
Posix ACL is stored as an xattr it will not overwrite any existing
ZFS/NFSv4 ACLs which may have been set. The Posix ACL will also
be non-functional on other platforms although it may be visible
as an xattr if that platform understands SA based xattrs.
By default Posix ACLs are disabled but they may be enabled with
the new 'aclmode=noacl|posixacl' property. Set the property to
'posixacl' to enable them. If ZFS/NFSv4 ACL support is ever added
an appropriate acltype will be added.
This change passes the POSIX Test Suite cleanly with the exception
of xacl/00.t test 45 which is incorrect for Linux (Ext4 fails too).
http://www.tuxera.com/community/posix-test-suite/
Signed-off-by: Massimo Maggi <me@massimo-maggi.eu>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#170
Extend the xattr property section of zfs(8) such that it covers
both styles of supported xattr. A short discussion of the benefits
and drawbacks of each type is presented to allow users to make an
informed choice.
Signed-off-by: Massimo Maggi <me@massimo-maggi.eu>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #170
This works the same as the -p switch to "zfs get", displaying full
resolution values for appropriate attributes.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1813
Document the "-D" and "-T" options and the optional interval
and count or "zpool status".
Also for zpool's man page, use a consistent order for the
various "-T" options to match the program's help output.
Document the effect of additional "-D" options for zdb.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1786
The detach section of the zpool(8) man page now suggests the
offline command. Using offline may be more appropriate for
certain situations.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1776
2882 implement libzfs_core
2883 changing "canmount" property to "on" should not always remount dataset
2900 "zfs snapshot" should be able to create multiple, arbitrary snapshots at once
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Chris Siden <christopher.siden@delphix.com>
Reviewed by: Garrett D'Amore <garrett@damore.org>
Reviewed by: Bill Pijewski <wdp@joyent.com>
Reviewed by: Dan Kruchinin <dan.kruchinin@gmail.com>
Approved by: Eric Schrock <Eric.Schrock@delphix.com>
References:
https://www.illumos.org/issues/2882https://www.illumos.org/issues/2883https://www.illumos.org/issues/2900illumos/illumos-gate@4445fffbbb
Ported-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1293
Porting notes:
WARNING: This patch changes the user/kernel ABI. That means that
the zfs/zpool utilities built from master are NOT compatible with
the 0.6.2 kernel modules. Ensure you load the matching kernel
modules from master after updating the utilities. Otherwise the
zfs/zpool commands will be unable to interact with your pool and
you will see errors similar to the following:
$ zpool list
failed to read pool configuration: bad address
no pools available
$ zfs list
no datasets available
Add zvol minor device creation to the new zfs_snapshot_nvl function.
Remove the logging of the "release" operation in
dsl_dataset_user_release_sync(). The logging caused a null dereference
because ds->ds_dir is zeroed in dsl_dataset_destroy_sync() and the
logging functions try to get the ds name via the dsl_dataset_name()
function. I've got no idea why this particular code would have worked
in Illumos. This code has subsequently been completely reworked in
Illumos commit 3b2aab1 (3464 zfs synctask code needs restructuring).
Squash some "may be used uninitialized" warning/erorrs.
Fix some printf format warnings for %lld and %llu.
Apply a few spa_writeable() changes that were made to Illumos in
illumos/illumos-gate.git@cd1c8b8 as part of the 3112, 3113, 3114 and
3115 fixes.
Add a missing call to fnvlist_free(nvl) in log_internal() that was added
in Illumos to fix issue 3085 but couldn't be ported to ZoL at the time
(zfsonlinux/zfs@9e11c73) because it depended on future work.
The help output of for zfs set/get says that sync can be one of
standard | always | disabled
but the man pages claim it can be
sync=default | always | disabled
the accepted value is standard, this changes the manpage to give the
correct values.
Signed-off-by: Steven Burgess <sburgess@dattobackup.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1634
The manpage reports fletcher2, but in zio.h ZIO_CHECKSUM_ON_VALUE
is defined to ZIO_CHECKSUM_FLETCHER_4.
Signed-off-by: Massimo Maggi <me@massimo-maggi.eu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1628
zpool list has the same options for repeating as zpool iostat
has, but that is not documented. This patch adds the documentation.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
The FreeBSD implementation of zfs adds the 'zpool labelclear'
command. Since this functionality is helpful and straight
forward to add it is being included in ZoL.
References:
freebsd/freebsd@119a041dc9
Ported-by: Dmitry Khasanov <pik4ez@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1126
Adds zpool split documentation to the zpool man page. I only documented
the options that I could get to work. While it is documented on some
sun blogs that devices can be specified for split, I was not able to get
that to work during my testing.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1456
3306 zdb should be able to issue reads in parallel
3321 'zpool reopen' command should be documented in the man
page and help
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: Matt Ahrens <matthew.ahrens@delphix.com>
Reviewed by: Christopher Siden <chris.siden@delphix.com>
Approved by: Garrett D'Amore <garrett@damore.org>
References:
illumos/illumos-gate@31d7e8fa33https://www.illumos.org/issues/3306https://www.illumos.org/issues/3321
The vdev_file.c implementation in this patch diverges significantly
from the upstream version. For consistenty with the vdev_disk.c
code the upstream version leverages the Illumos bio interfaces.
This makes sense for Illumos but not for ZoL for two reasons.
1) The vdev_disk.c code in ZoL has been rewritten to use the
Linux block device interfaces which differ significantly
from those in Illumos. Therefore, updating the vdev_file.c
to use the Illumos interfaces doesn't get you consistency
with vdev_disk.c.
2) Using the upstream patch as is would requiring implementing
compatibility code for those Solaris block device interfaces
in user and kernel space. That additional complexity could
lead to confusion and doesn't buy us anything.
For these reasons I've opted to simply move the existing vn_rdwr()
as is in to the taskq function. This has the advantage of being
low risk and easy to understand. Moving the vn_rdwr() function
in to its own taskq thread also neatly avoids the possibility of
a stack overflow.
Finally, because of the additional work which is being handled by
the free taskq the number of threads has been increased. The
thread count under Illumos defaults to 100 but was decreased to 2
in commit 08d08e due to contention. We increase it to 8 until
the contention can be address by porting Illumos #3581.
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1354
* Update manpage with more information about the ACL, guest access
and that samba needs to be able to authenticate user(s).
* Add information that 'net' can be used to modify the share after
ZFS sharing and that it will be undone with a 'zfs unshare'.
* Give an example on how to mount a SMB filesystem shared via ZFS.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1181
Issue #1170
The automake templates have been updated to install them,
and the existing packaging will automatically include them.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Supplements #518
Detected by rpmlint the 'rpool/export/home' section was being
interpretted by troff as an undefined macro. This resulted
in the 'rpool/export/home' output being omitted from 'man zdb'.
This was caused by starting the line with a ' character. By
moving the 'in' down to the next line we're able to fix it.
zfs.x86_64: W: manual-page-warning /usr/share/man/man8/zdb.8.gz
450: warning: macro `rpool/export/home'' not defined
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
The new snapdev dataset property may be set to control the
visibility of zvol snapshot devices. By default this value
is set to 'hidden' which will prevent zvol snapshots from
appearing under /dev/zvol/ and /dev/<dataset>/. When set to
'visible' all zvol snapshots for the dataset will be visible.
This functionality was largely added because when automatic
snapshoting is enabled large numbers of read-only zvol snapshots
will be created. When creating these devices the kernel will
attempt to read their partition tables, and blkid will attempt
to identify any filesystems on those partitions. This leads
to a variety of issues:
1) The zvol partition tables will be read in the context of
the `modprobe zfs` for automatically imported pools. This
is undesirable and should be done asynchronously, but for
now reducing the number of visible devices helps.
2) Udev expects to be able to complete its work for a new
block devices fairly quickly. When many zvol devices are
added at the same time this is no longer be true. It can
lead to udev timeouts and missing /dev/zvol links.
3) Simply having lots of devices in /dev/ can be aukward from
a management standpoint. Hidding the devices your unlikely
to ever use helps with this. Any snapshot device which is
needed can be made visible by changing the snapdev property.
NOTE: This patch changes the default behavior for zvols which
was effectively 'snapdev=visible'.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1235Closes#945
Issue #956
Issue #756
`zpool status -x` should only flag errors or where the pool is
unavailable. If it imported fine but isn't using the latest features
available in the code, that's not an error.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1319
As of 0.6.0-rc11 using ZFS volumes as Linux swap devices is
supported. Swapping to files in ZFS filesystems is not.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1189
3035 LZ4 compression support in ZFS and GRUB
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Christopher Siden <csiden@delphix.com>
References:
illumos/illumos-gate@a6f561b4aehttps://www.illumos.org/issues/3035http://wiki.illumos.org/display/illumos/LZ4+Compression+In+ZFS
This patch has been slightly modified from the upstream Illumos
version to be compatible with Linux. Due to the very limited
stack space in the kernel a lz4 workspace kmem cache is used.
Since we are using gcc we are also able to take advantage of the
gcc optimized __builtin_ctz functions.
Support for GRUB has been dropped from this patch. That code
is available but those changes will need to made to the upstream
GRUB package.
Lastly, several hunks of dead code were dropped for clarity. They
include the functions real_LZ4_uncompress(), LZ4_compressBound()
and the Visual Studio specific hunks wrapped in _MSC_VER.
Ported-by: Eric Dillmann <eric@jave.fr>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1217
* Move -R option up one position in the list to match
the Illumos documentation.
* Move -D option up one position and refreshed it to
match the Illumos documentation.
* Move -p option up one position and refreshed it to
match the Illumos documentation.
* Add the -n, -P documentation found in zfs receive
in to zfs send where to belongs.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1187