Commit Graph

82 Commits

Author SHA1 Message Date
Christian Kohlschütter df30f56639 Add "ashift" property to zpool create
Some disks with internal sectors larger than 512 bytes (e.g., 4k) can
suffer from bad write performance when ashift is not configured
correctly.  This is caused by the disk reporting a sector size of 512
bytes instead of its actual sector size.  The drive may behave this way
for compatibility reasons.  For example, the WDC WD20EARS disks are
known to exhibit this behavior.

When creating a zpool, ZFS takes that wrong sector size and sets the
"ashift" property accordingly (to 9: 1<<9=512), whereas it should be
set to 12 for 4k sectors (1<<12=4096).

This patch allows an administrator to manually specify the known correct
ashift size at 'zpool create' time.  This can significantly improve
performance in certain cases.  However, it will have an impact on your
total pool capacity.  See the updated ashift property description
in the zpool.8 man page for additional details.

Valid values for the ashift property range from 9 to 17 (512B-128KB).
Additionally, you may set the ashift to 0 if you wish to auto-detect
the sector size based on what the disk reports; this is the default
behavior.  The most common ashift values are 9 and 12.
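The ashift value is the base-2 logarithm of the sector size.  The sketch below shows that arithmetic and the accepted range described above.  The function names are hypothetical, and this is not the zpool implementation.

```c
#include <stdint.h>

/* Hypothetical helper: return log2(sector_size) for a power-of-two
 * sector size (512 -> 9, 4096 -> 12), or -1 for any other value. */
int ashift_from_sector_size(uint64_t sector_size)
{
	int shift = 0;

	if (sector_size == 0 || (sector_size & (sector_size - 1)) != 0)
		return (-1);	/* not a power of two */

	while ((UINT64_C(1) << shift) < sector_size)
		shift++;

	return (shift);
}

/* Hypothetical check mirroring the rule above: 0 means auto-detect,
 * otherwise 9 (512B) through 17 (128KB) inclusive are accepted. */
int ashift_is_valid(int ashift)
{
	return (ashift == 0 || (ashift >= 9 && ashift <= 17));
}
```

For a 4k-sector drive, ashift_from_sector_size(4096) yields 12, which is the value passed with '-o ashift=12'.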

  Example:
  zpool create -o ashift=12 tank raidz2 sda sdb sdc sdd

Closes #280

Original-patch-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2011-06-17 16:35:49 -07:00
Brian Behlendorf e130330a87 Handle /etc/mtab -> /proc/mounts symlink
Under Fedora 15 /etc/mtab is now a symlink to /proc/mounts by
default.  When /etc/mtab is a symlink the mount.zfs helper
should not update it.   There was code in place to handle this
case but it used stat() which traverses the link and then issues
the stat on /proc/mounts.  We need to use lstat() to prevent the
link traversal and instead stat /etc/mtab.
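The difference between stat() and lstat() is easy to demonstrate.  The sketch below uses hypothetical names; it is not the mount.zfs code.  lstat() reports on the link itself, so S_ISLNK() can detect that /etc/mtab is a symlink rather than describing /proc/mounts.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <sys/stat.h>

/* Return true when 'path' itself is a symbolic link.  lstat() inspects
 * the link; stat() would follow it and describe the target instead. */
bool path_is_symlink(const char *path)
{
	struct stat st;

	if (lstat(path, &st) != 0)
		return (false);	/* missing or unreadable: not a symlink */

	return (S_ISLNK(st.st_mode));
}

/* Hypothetical policy: the helper only updates mtab when it is not a
 * symlink (e.g. not /etc/mtab -> /proc/mounts). */
bool should_update_mtab(const char *mtab)
{
	return (!path_is_symlink(mtab));
}
```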

Closes #270
2011-06-14 16:48:38 -07:00
Brian Behlendorf 2e08aedba4 Always check -Wno-unused-but-set-variable gcc support
The previous commit 8a7e1ceefa wasn't
quite right.  This check applies to both the user and kernel space
build and as such we must make sure it runs regardless of what
the --with-config option is set to.

For example, if --with-config=kernel then the autoconf test does
not run and we generate build warnings when compiling the kernel
packages.
2011-06-14 16:40:35 -07:00
Brian Behlendorf 8a7e1ceefa Check for -Wno-unused-but-set-variable gcc support
Gcc versions 4.3.2 and earlier do not support the compiler flag
-Wno-unused-but-set-variable.  This can lead to build failures
on older Linux platforms such as Debian Lenny.  Since this is
an optional build argument, this change adds a new autoconf check
for the option.  If it is supported by the installed version of
gcc then it is used; otherwise it is omitted.

See commits 12c1acde76 and
79713039a2 for the reason the
-Wno-unused-but-set-variable option was originally added.
2011-06-14 14:43:22 -07:00
Brian Behlendorf 4804b739e1 Default to internal 'zfs userspace' implementation
We will never bring over the pyzfs.py helper script from Solaris
to Linux.  Instead the missing functionality will be directly
integrated into the zfs commands and libraries.  To avoid
confusion remove the warning about the missing pyzfs.py utility
and simply use the default internal support.

The Illumos developers are of the same mind and have proposed an
initial patch to do this which has been integrated into the 'allow'
development branch.  After some additional testing this code
can be merged into master as the right long term solution.
2011-05-20 10:25:41 -07:00
Brian Behlendorf 6ee44e32be Fix awk usage
The zpool_id and zpool_layout helper scripts have been updated to
use the more common /usr/bin/awk symlink.  On Fedora/Redhat systems
there are both /bin/awk and /usr/bin/awk symlinks to your installed
version of awk.  On Debian/Ubuntu systems only the /usr/bin/awk
symlink exists.

Additionally, add the '\<' token to the beginning of the regex
pattern to prevent partial matches.  This pattern only appears to
work with gawk despite the mawk man page claiming to support this
extended regex.  Thus you will need to have gawk installed to use
these optional helper scripts.  A comment has been added to the
script to reflect this reality.
2011-05-06 10:16:04 -07:00
Brian Behlendorf 3613204cd7 Allow mounting of read-only snapshots
With the addition of the mount helper we accidentally regressed
the ability to manually mount snapshots.  This commit updates
the mount helper to expect the possibility of a ZFS_TYPE_SNAPSHOT.
All snapshots will be automatically treated as 'legacy' type mounts
so they can be mounted manually.
2011-05-05 10:13:38 -07:00
Brian Behlendorf df554c148e Fix 'zfs set volsize=N pool/dataset'
This change fixes a kernel panic which would occur when resizing
a dataset which was not open.  The objset_t stored in the
zvol_state_t will be set to NULL when the block device is closed.
To avoid this issue we pass the correct objset_t as the third arg.

The code has also been updated to correctly notify the kernel
when the block device capacity changes.  For 2.6.28 and newer
kernels the capacity change will be immediately detected.  For
earlier kernels the capacity change will be detected when the
device is next opened.  This is a known limitation of older
kernels.

Online ext3 resize test case passes on 2.6.28+ kernels:
$ dd if=/dev/zero of=/tmp/zvol bs=1M count=1 seek=1023
$ zpool create tank /tmp/zvol
$ zfs create -V 500M tank/zd0
$ mkfs.ext3 /dev/zd0
$ mkdir /mnt/zd0
$ mount /dev/zd0 /mnt/zd0
$ df -h /mnt/zd0
$ zfs set volsize=800M tank/zd0
$ resize2fs /dev/zd0
$ df -h /mnt/zd0

Original-patch-by: Fajar A. Nugraha <github@fajar.net>
Closes #68
Closes #84
2011-05-02 08:54:40 -07:00
Gunnar Beutner 055656d4f4 Implemented NFS export_operations.
Implemented the required NFS operations for exporting ZFS datasets
using the in-kernel NFS daemon.
2011-04-29 12:36:13 -07:00
Darik Horn 492b8e9e7b Use gethostid in the Linux convention.
Disable the gethostid() override for Solaris behavior because Linux systems
implement the POSIX standard in a way that allows a negative result.

Mask the gethostid() result to the lower four bytes, like coreutils does in
/usr/bin/hostid, to prevent junk bits or sign-extension on systems that have an
eight byte long type. This can cause a spurious hostid mismatch that prevents
zpool import on 64-bit systems.
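The masking can be sketched as follows.  The helper names are hypothetical.  Converting through unsigned long and keeping the low 32 bits removes any sign extension introduced by an eight byte long.

```c
#define _DEFAULT_SOURCE 1
#include <stdint.h>
#include <unistd.h>

/* Keep only the low four bytes of a gethostid() result.  On LP64 systems
 * long is eight bytes, so a 32-bit host id with its top bit set may come
 * back sign-extended (0xffffffff80000001 instead of 0x80000001). */
uint32_t hostid_lower32(long id)
{
	return ((uint32_t)((unsigned long)id & 0xffffffffUL));
}

/* Current host id, masked to 32 bits before it is compared or stored. */
uint32_t current_hostid(void)
{
	return (hostid_lower32(gethostid()));
}
```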
2011-04-25 10:36:17 -05:00
Brian Behlendorf 12c1acde76 Set -Wno-unused-but-set-variable globally
As of gcc-4.6 the option -Wunused-but-set-variable is enabled by
default.  While this is a useful warning there are numerous places
in the ZFS code where a variable is set and then only checked in an
ASSERT().  To avoid having to update every instance of this in the
code we now set -Wno-unused-but-set-variable to suppress the warning.
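The pattern that triggers the warning can be reproduced in a few lines.  This sketch uses the standard assert() macro as a stand-in for ZFS's ASSERT(), and double_it() and process() are hypothetical names.

```c
#include <assert.h>

/* Hypothetical stand-in for a call whose result is only sanity checked. */
static int double_it(int x)
{
	return (x * 2);
}

int process(int x)
{
	int rc;

	rc = double_it(x);
	/* assert() expands to nothing when NDEBUG is defined, so in that
	 * build 'rc' is assigned but never read and gcc 4.6+ can report it
	 * under -Wunused-but-set-variable.  The flag
	 * -Wno-unused-but-set-variable silences exactly this case. */
	assert(rc == x * 2);

	return (x);
}
```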

Additionally, when building with --enable-debug and -Werror set, these
warnings also become fatal.  We can reevaluate the suppression of these
errors at a later time if it becomes an issue.  For now we are basically
just reverting to the previous gcc behavior.
2011-04-19 10:44:10 -07:00
Brian Behlendorf 03514b0110 Fix gcc compiler warning, parse_option()
When compiling ZFS in user space gcc-4.6.0 correctly identifies
the variable 'value' as being set but never used.  This generates a
warning and a build failure when using --enable-debug.  Once again
this is correct but I'm reluctant to remove 'value' because we are
breaking the string into name/value pairs.  While it is not used
now there's a good chance it will be soon and I'd rather not have
to reinvent this.  To suppress the warning, 'value' is now used in a VERIFY().
This was observed under Fedora 15.

  cmd/mount_zfs/mount_zfs.c: In function ‘parse_option’:
  cmd/mount_zfs/mount_zfs.c:112:21: error: variable ‘value’ set but not
  used [-Werror=unused-but-set-variable]
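Splitting an option string into name/value pairs, as parse_option() does, can be sketched like this.  split_option() is a hypothetical helper and not the mount_zfs.c code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: split a mount option of the form "name=value" in
 * place.  '*name' always points at the option name; '*value' points at
 * the text after the first '=' or is NULL for flag options like "ro". */
void split_option(char *opt, char **name, char **value)
{
	char *eq = strchr(opt, '=');

	*name = opt;
	if (eq != NULL) {
		*eq = '\0';	/* terminate the name */
		*value = eq + 1;
	} else {
		*value = NULL;
	}
}
```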
2011-04-19 09:04:51 -07:00
Manuel Amador (Rudd-O) 8610b52bd4 Added .gitignore for mount.zfs and zvol_id
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2011-04-07 10:28:38 -07:00
Ned Bass fa417e57a6 Call udevadm trigger more safely
Some udev hooks are not designed to be idempotent, so calling udevadm
trigger outside of the distribution's initialization scripts can have
unexpected (and potentially dangerous) side effects.  For example, the
system time may change or devices may appear multiple times.  See Ubuntu
launchpad bug 320200 and this mailing list post for more details:

https://lists.ubuntu.com/archives/ubuntu-devel/2009-January/027260.html

To avoid these problems we call udevadm trigger with --action=change
--subsystem-match=block.  The first argument tells udev just to refresh
devices, and make sure everything's as it should be.  The second
argument limits the scope to block devices, so devices belonging to
other subsystems cannot be affected.

This doesn't fix the problem on older udev implementations that don't
provide udevadm but instead have udevtrigger as a standalone program.
In this case the above options aren't available so there's no way to
call udevtrigger safely.  But we can live with that since this
issue only exists in optional test and helper scripts, and most
zfs-on-linux users are running newer systems anyway.
2011-04-05 13:00:51 -07:00
Fajar A. Nugraha a5729f7b22 Fixes to enable zvol symlink creation
This commit fixes the issue at
https://github.com/behlendorf/zfs/issues/#issue/172
Changes:
- update BLKZNAME to use _IOR instead of _IO.  Kernel 2.6.32 allows
read parameters (copy_to_user) with _IO, while newer kernels (tested with
Archlinux's 2.6.37 kernel) enforce _IOR (which is correct)
- fix return code and message on error
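The _IO/_IOR distinction is encoded in the request number itself.  In this sketch the 0x12/125 pair and the 256-byte buffer are illustrative assumptions, not values taken from the ZFS headers.  It shows that only the _IOR form carries the read direction and the buffer size.

```c
#include <sys/ioctl.h>

/* Illustrative request numbers only (assumed values). */
#define EXAMPLE_NAME_LEN	256
#define EXAMPLE_ZNAME_IO	_IO(0x12, 125)
#define EXAMPLE_ZNAME_IOR	_IOR(0x12, 125, char[EXAMPLE_NAME_LEN])

/* An _IOR request encodes the _IOC_READ direction, i.e. the kernel will
 * copy data back to user space; a plain _IO request does not. */
int request_copies_to_user(unsigned long req)
{
	return ((_IOC_DIR(req) & _IOC_READ) != 0);
}

/* Size of the user buffer encoded in the request (0 for _IO). */
unsigned long request_buffer_size(unsigned long req)
{
	return (_IOC_SIZE(req));
}
```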

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2011-03-24 11:48:18 -07:00
Brian Behlendorf bdf4328b04 Linux 2.6.28 compat, insert_inode_locked()
Linux 2.6.28 added the insert_inode_locked() helper function; prior to
this most callers used insert_inode_hash().  The older method doesn't
check for collisions in the inode_hashtable but it is still acceptable
for use.  Fall back to using insert_inode_hash() when
insert_inode_locked() is unavailable.
2011-03-22 12:15:54 -07:00
Brian Behlendorf f47c42e214 Merge branch 'dracut' 2011-03-22 12:13:04 -07:00
Brian Behlendorf 716895b161 Fix 'LDFLAGS=-Wl,--as-needed' build error
Compiling with 'LDFLAGS=-Wl,--as-needed' exposed the fact that
there were some library linking problems introduced by mount_zfs.
In particular, the libzfs library does use nvpair symbols, and
mount_zfs contains no dependencies on libzpool.

Closes #161
Closes #162
2011-03-18 14:47:19 -07:00
Brian Behlendorf ec49a5f0ec Fix getcwd() warning
Newer versions of glibc declare getcwd() with the warn_unused_result
attribute.  This results in a warning because the updated mount helper
was not checking this return value.  This issue was fixed by checking
the return value and, in the case of an error, simply returning the
passed dataset.  One possible, but unlikely, error would be having your
cwd unlinked while the mount command was running.

  cmd/mount_zfs/mount_zfs.c: In function ‘parse_dataset’:
  cmd/mount_zfs/mount_zfs.c:223:2: error: ignoring return value of
      ‘getcwd’, declared with attribute warn_unused_result
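The shape of that fix can be sketched as follows.  cwd_or_dataset() is a hypothetical name, not the actual parse_dataset() code.  The NULL result of getcwd() is checked, and the caller falls back to the dataset name it was given.

```c
#include <stddef.h>
#include <unistd.h>

/* Honor getcwd()'s warn_unused_result attribute by checking for NULL.
 * On failure (e.g. ERANGE for a short buffer, or ENOENT if the cwd was
 * unlinked) return the dataset name exactly as it was passed in. */
const char *cwd_or_dataset(char *buf, size_t len, const char *dataset)
{
	if (getcwd(buf, len) == NULL)
		return (dataset);

	return (buf);
}
```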
2011-03-18 13:54:49 -07:00
Brian Behlendorf 01c0e61da0 Add init scripts
To support automatically mounting your zfs filesystems on boot
a basic init script is needed.  Unfortunately, every distribution
has its own idea of the _right_ way to do things.  Rather than
write one very complicated portable init script, which would
invariably be replaced by each distribution's own anyway, I have
instead added support to provide multiple distribution specific
init scripts.

The correct init script for your distribution will be selected
by ZFS_AC_DEFAULT_PACKAGE which will set DEFAULT_INIT_SCRIPT.
During 'make install' the correct script for your system will
be installed from zfs/etc/init.d/zfs.DEFAULT_INIT_SCRIPT to the
usual /etc/init.d/zfs location.

Currently, there is zfs.fedora and a more generic zfs.lsb init
script.  Hopefully, the distribution maintainers who know best
how they want their init scripts to function will feed back their
approved versions to be included in the project.

This change does not consider upstart jobs but I'm not at all
opposed to adding that sort of thing.
2011-03-17 16:51:54 -07:00
Brian Behlendorf 3aff775555 Strip 'zfsutil,remount' from /etc/mtab
When updating /etc/mtab we should be careful and strip certain
options.  In particular, we need to strip 'zfsutil' because if
we don't, the mount utility will helpfully provide it to the
mount helper when we issue mount(8) again.  This subverts the
check that the caller is zfs(8) and not mount(8).
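Stripping selected entries from a comma-separated option list can be sketched like this.  strip_mtab_options() is a hypothetical helper, not the mount helper's code, and it drops only the two options named above.

```c
#include <stddef.h>
#include <string.h>

/* True if the option token (not NUL-terminated, 'n' bytes) must be
 * kept out of /etc/mtab. */
static int is_stripped(const char *opt, size_t n)
{
	return ((n == 7 && strncmp(opt, "zfsutil", 7) == 0) ||
	    (n == 7 && strncmp(opt, "remount", 7) == 0));
}

/* Copy the comma-separated options in 'in' to 'out', dropping "zfsutil"
 * and "remount".  Copying stops early if 'out' runs out of room. */
void strip_mtab_options(const char *in, char *out, size_t outlen)
{
	const char *p = in;
	size_t used = 0;

	if (outlen == 0)
		return;
	out[0] = '\0';

	while (*p != '\0') {
		const char *end = strchr(p, ',');
		size_t n = end ? (size_t)(end - p) : strlen(p);

		if (n > 0 && !is_stripped(p, n)) {
			size_t need = n + (used > 0 ? 1 : 0);

			if (used + need + 1 > outlen)
				break;		/* no room left */
			if (used > 0)
				out[used++] = ',';
			memcpy(out + used, p, n);
			used += n;
			out[used] = '\0';
		}
		if (end == NULL)
			break;
		p = end + 1;
	}
}
```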
2011-03-15 13:33:29 -07:00
Brian Behlendorf 093aa69286 Always allow '-o remount,ro'
Allow the mount(8) utility to always operate on all datasets when
remounting them read-only.  This is critical for rc.sysinit/umountroot
which remounts the root filesystem read-only during shutdown to
ensure everything is correctly flushed to disk.

Fix minor typo, the check to set zfsutil should use the bitwise
'&'.  I must have accidentally hit the adjacent '*' and obviously
neither the compiler nor my code review caught this.  Fix it now.
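Why '&' matters when testing flag bits can be shown with a hypothetical check for a read-only remount.  It uses the standard MS_REMOUNT and MS_RDONLY flags, and it is not the mount.zfs code.

```c
#include <stdbool.h>
#include <sys/mount.h>

/* True when the flags request a read-only remount.  Bits must be tested
 * with '&': 'flags * MS_REMOUNT' is nonzero whenever flags is nonzero
 * (barring overflow), so it does not test the bit at all. */
bool is_remount_ro(unsigned long flags)
{
	return ((flags & MS_REMOUNT) != 0 && (flags & MS_RDONLY) != 0);
}
```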
2011-03-15 13:33:29 -07:00
Brian Behlendorf a6cba65cca Check for trailing '/' in mount.zfs
When run with a root '/' cwd the mount.zfs helper would strip not
only the '/' but also the next character from the dataset name.
For example, '/tank' was changed to 'ank' instead of just 'tank'.
Originally, this was done for the '/tmp' cwd case where we needed
to strip the '/' following the cwd.  For example '/tmp/tank' needed
to remove the '/tmp' cwd plus 1 character for the '/'.

This change fixes the problem by checking the cwd and if it ends in
a '/' it does not strip an extra character.  Otherwise it will strip
the next character.  I believe this should only ever be true for the
root directory.
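The corrected rule can be sketched as a pure function.  strip_cwd() is a hypothetical name, and the extra prefix checks are the sketch's own safety measures.

```c
#include <string.h>

/* Given the cwd and the absolute path that mount(8) built from it,
 * return the dataset name relative to the cwd.  When the cwd already
 * ends in '/' (the root directory) there is no extra character to skip. */
const char *strip_cwd(const char *cwd, const char *path)
{
	size_t len = strlen(cwd);

	if (len == 0 || strncmp(path, cwd, len) != 0)
		return (path);		/* not under cwd: leave unchanged */

	if (cwd[len - 1] == '/')
		return (path + len);	/* "/" + "/tank" -> "tank" */

	if (path[len] == '/')
		return (path + len + 1);	/* "/tmp/tank" -> "tank" */

	return (path);			/* "/tmpx" is not under "/tmp" */
}
```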

Closes #148
2011-03-10 12:58:44 -08:00
Brian Behlendorf d53368f675 Fix mount helper
Several issues related to strange mount/umount behavior were reported
and this commit should address most of them.  The original idea was
to put in place a zfs mount helper (mount.zfs).  This helper is used
to enforce 'legacy' mount behavior, and perform any extra mount argument
processing (selinux, zfsutil, etc).  This helper wasn't ready for the
0.6.0-rc1 release; with this change it's functional but needs to be
extensively tested.

This change addresses the following open issues.
Closes #101
Closes #107
Closes #113
Closes #115
Closes #119
2011-03-09 15:26:48 -08:00
Brian Behlendorf 321a498b95 Add xvattr support
With the removal of the minimal xvattr support from the spl this
support needs to be replaced in the zfs package.  This is fairly
easily accomplished by directly adding portions of the sys/vnode.h
header from OpenSolaris.  These xvattr additions have been placed
in the sys/xvattr.h header file and included as needed where simply
a sys/vnode.h was included before.

In addition to the xvattr types and helper macros, two functions
were also included.  The xva_init() and xva_getxoptattr() functions
were included as static inline functions in xvattr.h.  They are
simple enough and it was simpler to place them here rather than
in their own .c file.
2011-03-02 11:43:50 -08:00
Fajar A. Nugraha 4c0d8e50b9 Use udev to create /dev/zvol/[dataset_name] links
This commit allows zvols with names longer than 32 characters, which
fixes the issue at https://github.com/behlendorf/zfs/issues/#issue/102.

Changes include:
- use /dev/zd* device names for zvol, where * is the device minor
  (include/sys/fs/zfs.h, module/zfs/zvol.c).
- add BLKZNAME ioctl to get dataset name from userland
  (include/sys/fs/zfs.h, module/zfs/zvol.c, cmd/zvol_id).
- add udev rule to create /dev/zvol/[dataset_name] and the legacy
  /dev/[dataset_name] symlink. For partitions on zvol, it will create
  /dev/zvol/[dataset_name]-part* (etc/udev/rules.d/60-zvol.rules,
  cmd/zvol_id).
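The symlink names listed above can be generated with a small helper.  zvol_link_path() is hypothetical; in the real change the names come from the udev rule together with zvol_id.

```c
#include <stdio.h>
#include <string.h>

/* Build /dev/zvol/<dataset>, or /dev/zvol/<dataset>-part<N> when
 * part > 0.  Returns 0 on success, -1 on bad input or a short buffer. */
int zvol_link_path(char *buf, size_t len, const char *dataset, int part)
{
	int n;

	if (dataset == NULL || strlen(dataset) == 0)
		return (-1);

	if (part > 0)
		n = snprintf(buf, len, "/dev/zvol/%s-part%d", dataset, part);
	else
		n = snprintf(buf, len, "/dev/zvol/%s", dataset);

	return ((n < 0 || (size_t)n >= len) ? -1 : 0);
}
```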

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2011-02-25 09:43:19 -08:00
Brian Behlendorf 718d77f622 Fix uninitialized variable
It was possible for rc to be uninitialized in the parse_options()
function which triggered a compiler warning.  Ensure rc is always
initialized.
2011-02-23 12:57:25 -08:00
Brian Behlendorf 45066d1f20 Linux 2.6.38 compat, blkdev_get_by_path()
The open_bdev_exclusive() function has been replaced (again) by the
more generic blkdev_get_by_path() function.  Additionally, the
counterpart function close_bdev_exclusive() has been replaced by
blkdev_put().  Because these functions are more generic versions
of the functions they replaced the compatibility macro must add
the FMODE_EXCL mask to ensure they are exclusive.

Closes #114
2011-02-23 12:29:38 -08:00
Brian Behlendorf 2c395def27 Linux 2.6.36 compat, sops->evict_inode()
The new preferred interface for evicting an inode from the inode cache
is the ->evict_inode() callback.  It replaces both the ->delete_inode()
and ->clear_inode() callbacks which were previously used for this.
2011-02-11 13:47:51 -08:00
Brian Behlendorf 7268e1bec8 Linux 2.6.35 compat, fops->fsync()
The fsync() callback in the file_operations structure used to take
3 arguments.  The callback now only takes 2 arguments because the
dentry argument was determined to be unused by all consumers.  To
handle this a compatibility prototype was added to ensure the right
prototype is used.  Our implementation never used the dentry argument
either so it's just a matter of using the right prototype.
2011-02-11 09:05:51 -08:00
Brian Behlendorf 777d4af891 Linux 2.6.35 compat, const struct xattr_handler
The const keyword was added to the 'struct xattr_handler' in the
generic Linux super_block structure.  To handle this we define an
appropriate xattr_handler_t typedef which can be used.  This was
the preferred solution because it keeps the code clean and readable.
2011-02-10 16:29:00 -08:00
Brian Behlendorf afffb5cd10 MS_DIRSYNC and MS_REC compat
It turns out that older versions of the glibc headers do not
properly define MS_DIRSYNC despite it being explicitly mentioned
in the man pages.  They instead call it S_WRITE, so for systems
where this is not correctly defined we map MS_DIRSYNC to S_WRITE.
At the time of this commit both Ubuntu Lucid and Debian Squeeze
use the out of date glibc headers.

As for MS_REC this field is also not available in the older headers.
Since there is no obvious mapping in this case we simply disable
the recursive mount option which used it.
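Both compatibility cases come down to preprocessor checks.  In this sketch, example_mount_flags() is a hypothetical helper showing the MS_DIRSYNC fallback and the MS_REC guard.

```c
#include <sys/mount.h>

/* Older glibc headers lack MS_DIRSYNC but expose the same bit as S_WRITE. */
#ifndef MS_DIRSYNC
#define	MS_DIRSYNC	S_WRITE
#endif

/* Compute mount flags for the requested options; the recursive option is
 * only honored when the headers provide MS_REC. */
unsigned long example_mount_flags(int dirsync, int recursive)
{
	unsigned long flags = 0;

	if (dirsync)
		flags |= MS_DIRSYNC;
#ifdef MS_REC
	if (recursive)
		flags |= MS_REC;
#else
	(void) recursive;	/* no MS_REC: recursive option disabled */
#endif
	return (flags);
}
```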
2011-02-10 12:14:57 -08:00
Brian Behlendorf 1ac0ea38a5 Add missing -ldl linker option
The inclusion of dlsym(), dlopen(), and dlclose() symbols requires
us to link against the dl library.  Be careful to add the flag to
both the libzfs library and the commands which depend on the library.
2011-02-10 11:05:44 -08:00
Brian Behlendorf b4ead57cfb Remove HAVE_ZPL from commands and libraries
Thanks to the previous few commits we can now build all of the
user space commands and libraries with support for the zpl.
2011-02-04 16:14:34 -08:00
Brian Behlendorf 9a616b5d17 Documentation updates
Minor Linux specific documentation updates to the comments and
man pages.
2011-02-04 16:14:34 -08:00
Brian Behlendorf c5d915f423 Minimal libshare infrastructure
ZFS even under Solaris does not strictly require libshare to be
available.  The current implementation attempts to dlopen() the
library to access the needed symbols.  If this fails libshare
support is simply disabled.
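The "dlopen() or disable" pattern can be sketched with the standard dlfcn API.  optional_symbol_available() is a hypothetical name, and older glibc versions need -ldl at link time, as noted in the -ldl commit.

```c
#include <dlfcn.h>
#include <stdbool.h>
#include <stddef.h>

/* Probe an optional shared library at run time.  If the library or the
 * symbol is missing, the caller simply disables the feature. */
bool optional_symbol_available(const char *lib, const char *sym)
{
	void *handle;
	bool found;

	handle = dlopen(lib, RTLD_LAZY);
	if (handle == NULL)
		return (false);

	found = (dlsym(handle, sym) != NULL);
	dlclose(handle);

	return (found);
}
```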

This means that on Linux we only need the most minimal libshare
implementation.  In fact just enough to prevent the build from
failing.  Longer term we can decide if we want to implement a
libshare library like Solaris.  At best this would be an abstraction
layer between ZFS and NFS/SMB.  Alternately, we can drop libshare
entirely and directly integrate ZFS with Linux's NFS/SMB.

Finally the bare bones user-libshare.m4 test was dropped.  If we
do decide to implement libshare at some point it will surely be
as part of this package so the check is not needed.
2011-02-04 16:14:29 -08:00
Brian Behlendorf 3fb1fcdea1 Add 'zfs mount' support
By design the zfs utility is supposed to handle mounting and unmounting
a zfs filesystem.  We could allow zfs to do this directly.  There are
system calls available to mount/umount a filesystem.  And there are
library calls available to manipulate /etc/mtab.  But there are a
couple of very good reasons not to take this approach... for now.

Instead of directly calling the system and library calls to (u)mount
the filesystem we fork and exec a (u)mount process.  The principal
reason for this is to delegate the responsibility for locking and
updating /etc/mtab to (u)mount(8).  This ensures maximum portability
and ensures the right locking scheme for your version of (u)mount
will be used.  If we didn't do this we would have to resort to an
autoconf test to determine what locking mechanism is used.
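The fork/exec/wait sequence can be sketched with POSIX calls.  run_helper() is a hypothetical name, not the libzfs function.  It returns the child's exit status, which is all the caller learns about the failure.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] (searched in PATH) and return its exit status, 127 if it
 * could not be executed, or -1 on fork/wait failure or abnormal exit. */
int run_helper(char *const argv[])
{
	pid_t pid;
	int status;

	pid = fork();
	if (pid < 0)
		return (-1);

	if (pid == 0) {
		execvp(argv[0], argv);
		_exit(127);	/* only reached if exec failed */
	}

	while (waitpid(pid, &status, 0) < 0) {
		if (errno != EINTR)
			return (-1);
	}

	return (WIFEXITED(status) ? WEXITSTATUS(status) : -1);
}
```

A caller would then translate the returned status into an errno as a best guess.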

The downside to using mount(8) instead of mount(2) is that we lose
the exact errno which was returned by the kernel.  The return code
from mount(8) provides some insight into what went wrong but it is
not quite as good.  For the moment this is translated as a best
guess into an errno for the higher layers of zfs.

In the long term a shared library called libmount is under development
which provides a common API to address the locking and errno issues.
Once the standard mount utility has been updated to use this library
we can then leverage it.  Until then this is the only safe solution.

  http://www.kernel.org/pub/linux/utils/util-linux/libmount-docs/index.html
2011-02-04 16:11:58 -08:00
Brian Behlendorf 95c4cae39f Disable umount.zfs helper
For the moment, the only advantage in registering a umount helper
would be to automatically unshare a zfs filesystem.  Since under
Linux this would be unexpected (but nice) behavior there is no
harm in disabling it.

This is desirable because the 'zfs unmount' path invokes the system
umount.  This is done to ensure correct mtab locking but has the
side effect that the umount.zfs helper would be called if it exists.
By default this helper calls back into zfs to do the unmount on
Solaris which we don't want under Linux.

Once libmount is available and we have a safe way to correctly
lock and update the /etc/mtab file we can reconsider the need
for a umount helper.  Using libmount is the preferred solution.
2011-01-28 12:47:57 -08:00
Brian Behlendorf 3b8cfee8af Enable mount.zfs helper
While not strictly required to mount a zfs filesystem, using a
mount helper has certain advantages.

First, we need it if we want to honor the mount behavior as found
on Solaris.  As part of the mount we need to validate that the
dataset has the legacy mount property set if we are using 'mount'
instead of 'zfs mount'.

Secondly, by using a mount helper we can automatically load the
zpl kernel module.  This way you can just issue a 'mount' or
'zfs mount' and it will just work.

Finally, it gives us a common hook in user space to add any zfs
specific mount options we might want.  At the moment we don't
have any but now the infrastructure is at least in place.
2011-01-28 12:47:57 -08:00
Brian Behlendorf b3259b6a2b Autoconf selinux support
If libselinux is detected on your system at configure time link
against it.  This allows us to use a library call to detect if
selinux is enabled and if it is to pass the mount option:

  "context=\"system_u:object_r:file_t:s0\""

For now this is required because none of the existing selinux
policies are aware of the zfs filesystem type.  Because of this
they do not properly enable xattr based labeling even though
zfs supports all of the required hooks.

Until distros add zfs as a known xattr friendly fs type, we
must use mntpoint labeling.  Alternately, end users could modify
their existing selinux policy with a little guidance.
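Appending the mntpoint context option can be sketched as below.  The commit links against libselinux and asks is_selinux_enabled(); here that answer is passed in as a parameter so the sketch has no library dependency.  add_selinux_context() is a hypothetical name.

```c
#include <stdio.h>
#include <string.h>

#define	EXAMPLE_MNTPT_CONTEXT	"context=\"system_u:object_r:file_t:s0\""

/* Append the context option to the NUL-terminated 'opts' (buffer size
 * 'len') when SELinux is enabled.  Returns 0 on success, -1 if the
 * option does not fit. */
int add_selinux_context(char *opts, size_t len, int selinux_enabled)
{
	size_t used;
	int n;

	if (!selinux_enabled)
		return (0);

	used = strlen(opts);
	if (used >= len)
		return (-1);

	n = snprintf(opts + used, len - used, "%s%s",
	    used > 0 ? "," : "", EXAMPLE_MNTPT_CONTEXT);

	return ((n < 0 || (size_t)n >= len - used) ? -1 : 0);
}
```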
2011-01-28 12:45:19 -08:00
Brian Behlendorf 149e873ab1 Fix minor compiler warnings
These compiler warnings were introduced when code which was
previously #ifdef'ed out by HAVE_ZPL was re-added for use
by the posix layer.  All of the following changes should be
obviously correct and will cause no semantic changes.
2011-01-06 15:04:28 -08:00
Ricardo M. Correia 8d4e8140ef Fix block device-related issues in zdb.
Specifically, this fixes the two following errors in zdb when a pool
is composed of block devices:

1) 'Value too large for defined data type' when running 'zdb <dataset>'.
2) 'character device required' when running 'zdb -l <block-device>'.

Signed-off-by: Ricardo M. Correia <ricardo.correia@oracle.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2010-12-14 09:52:46 -08:00
Ned Bass 3ee56c292b Make rollbacks fail gracefully
Support for rolling back datasets requires a functional ZPL, which we currently
do not have.  The zfs command does not check for ZPL support before attempting
a rollback, and in preparation for rolling back a zvol it removes the minor
node of the device.  To prevent the zvol device node from disappearing after a
failed rollback operation, this change wraps the zfs_do_rollback() function in
an #ifdef HAVE_ZPL and returns ENOSYS in the absence of a ZPL.  This is
consistent with the behavior of other ZPL dependent commands such as mount.

The original error message observed with this bug was rather confusing:

    internal error: Unknown error 524
    Aborted

This was because zfs_ioc_rollback() returns ENOTSUP if we don't HAVE_ZPL, but
Linux actually has no such error code.  It should instead return EOPNOTSUPP, as
that is how ENOTSUP is defined in user space.  With that we would have gotten
the somewhat more helpful message

    cannot rollback 'tank/fish': unsupported version

This is rather a moot point with the above changes since we will no longer make
that ioctl call without a ZPL.  But, this change updates the error code just in
case.
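The error-code mismatch can be made concrete.  The value 524 is the kernel's internal ENOTSUPP, which has no user space name, hence the "Unknown error 524" message.  normalize_unsupported() is a hypothetical helper; the commit instead changes the code the ioctl returns.

```c
#include <errno.h>

/* Kernel-internal ENOTSUPP; not defined in user space headers. */
#define	KERNEL_ENOTSUPP	524

/* Map the kernel-only code to its user space peer.  On Linux, user
 * space ENOTSUP is defined as EOPNOTSUPP. */
int normalize_unsupported(int err)
{
	return (err == KERNEL_ENOTSUPP ? EOPNOTSUPP : err);
}
```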

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2010-11-08 14:03:36 -08:00
Ned Bass d877ac6bfe Fix intermittent 'zpool add' failures
Creating whole-disk vdevs can intermittently fail if a udev-managed symlink to
the disk partition is already in place.  To avoid this, we now remove any such
symlink before partitioning the disk.  This makes zpool_label_disk_wait() truly
wait for the new link to show up instead of returning if it finds an old link
still in place.  Otherwise there is a window between when udev deletes and
recreates the link during which access attempts will fail with ENOENT.
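The "remove, then wait" ordering can be sketched as follows.  remove_stale_link() and wait_for_path() are hypothetical and are not zpool_label_disk_wait() itself.  Because the old link is removed first, the wait can only succeed once the new link appears.

```c
#include <errno.h>
#include <time.h>
#include <unistd.h>

/* Remove a stale link before repartitioning; a missing link is fine. */
int remove_stale_link(const char *path)
{
	if (unlink(path) != 0 && errno != ENOENT)
		return (-1);
	return (0);
}

/* Poll every 10 ms until 'path' exists or 'timeout_ms' has elapsed.
 * Returns 0 once the path is present, -1 on timeout. */
int wait_for_path(const char *path, int timeout_ms)
{
	const struct timespec tick = { 0, 10 * 1000 * 1000 };
	int waited = 0;

	for (;;) {
		if (access(path, F_OK) == 0)
			return (0);
		if (waited >= timeout_ms)
			return (-1);
		(void) nanosleep(&tick, NULL);
		waited += 10;
	}
}
```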

Also, clean up a comment about waiting for udev to create symlinks.  It no
longer needs to describe the special cases for the link names, since that is
now handled in a separate helper function.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2010-10-22 12:38:58 -07:00
Ned Bass 4682b8c14e Remove solaris-specific code from make_leaf_vdev()
Portability between Solaris and Linux isn't really an issue for us anymore, and
removing sections like this one helps simplify the code.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2010-10-22 12:25:58 -07:00
Ned Bass 79e7242a91 Add helper functions for manipulating device names
This change adds two helper functions for working with vdev names and paths.
zfs_resolve_shortname() resolves a shorthand vdev name to an absolute path
of a file in /dev, /dev/disk/by-id, /dev/disk/by-label, /dev/disk/by-path,
/dev/disk/by-uuid, /dev/disk/zpool.  This was previously done only in the
function is_shorthand_path(), but we need a general helper function to
implement shorthand names for additional zpool subcommands like remove.
is_shorthand_path() is accordingly updated to call the helper function.

There is a minor change in the way zfs_resolve_shortname() tests if a file
exists.  is_shorthand_path() effectively used open() and stat64() to test for
file existence, since its scope includes testing if a device is a whole disk
and collecting file status information.  zfs_resolve_shortname(), on the other
hand, only uses access() to test for existence and leaves it to the caller to
perform any additional file operations.  This seemed like the most general and
lightweight approach, and still preserves the semantics of is_shorthand_path().
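A shorthand lookup in that spirit can be sketched as below.  resolve_shortname() is a hypothetical name, and the search order is assumed from the list of directories above.  As described, existence is tested only with access().

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Directories searched in order (order assumed from the text). */
static const char *const search_dirs[] = {
	"/dev", "/dev/disk/by-id", "/dev/disk/by-label",
	"/dev/disk/by-path", "/dev/disk/by-uuid", "/dev/disk/zpool",
};

/* Write the first existing "<dir>/<name>" into 'path' and return 0, or
 * return -1 if no candidate exists or 'path' is too small. */
int resolve_shortname(const char *name, char *path, size_t len)
{
	size_t i;

	if (strlen(name) == 0)
		return (-1);

	for (i = 0; i < sizeof (search_dirs) / sizeof (search_dirs[0]); i++) {
		int n = snprintf(path, len, "%s/%s", search_dirs[i], name);

		if (n < 0 || (size_t)n >= len)
			return (-1);
		if (access(path, F_OK) == 0)
			return (0);
	}
	return (-1);
}
```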

zfs_append_partition() appends a partition suffix to a device path.  This
should be used to generate the name of a whole disk as it is stored in the vdev
label. The user-visible names of whole disks do not contain the partition
information, while the name in the vdev label does.   The code was lifted from
the function make_disks(), which now just calls the helper function.  Again,
having a helper function to do this supports general handling of shorthand
names in the user interface.
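A partition-suffix helper can be sketched as follows.  append_partition() is hypothetical.  The sketch assumes common Linux naming: udev links under /dev/disk/ take a "-partN" suffix (ata-XYZ-part1), kernel names ending in a digit take "pN" (nvme0n1p1), and other kernel names take a bare "N" (sda1).  The original function may handle fewer cases.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Build the partition path for 'disk' into 'buf'.  Returns 0 on
 * success, -1 on an empty name or a short buffer. */
int append_partition(const char *disk, int part, char *buf, size_t len)
{
	size_t dlen = strlen(disk);
	const char *sep;
	int n;

	if (dlen == 0)
		return (-1);

	if (strncmp(disk, "/dev/disk/", 10) == 0)
		sep = "-part";		/* udev by-id/by-path/... links */
	else if (isdigit((unsigned char)disk[dlen - 1]))
		sep = "p";		/* e.g. nvme0n1 -> nvme0n1p1 */
	else
		sep = "";		/* e.g. sda -> sda1 */

	n = snprintf(buf, len, "%s%s%d", disk, sep, part);
	return ((n < 0 || (size_t)n >= len) ? -1 : 0);
}
```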

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2010-10-22 12:25:30 -07:00
Brian Behlendorf 2959d94a0a Add FAILFAST support
ZFS works best when it is notified as soon as possible when a device
failure occurs.  This allows it to immediately start any recovery
actions which may be needed.  In theory Linux supports a flag which
can be set on bios called FAILFAST which provides this quick
notification by disabling the retry logic in the lower scsi layers.

That's the theory at least.  In practice it turns out that while the
flag exists you oddly have to set it with the BIO_RW_AHEAD flag.
And even when it is set you may still get retries if the low level
drivers decide that's the right behavior, or if you don't get the
right error codes reported to the scsi midlayer.

Unfortunately, without additional kernel patches there's not much
which can be done to improve this.  Basically, this just means that
it may take 2-3 minutes before ZFS is notified properly that a
device has failed.  This can be improved and I suspect I'll be
submitting patches upstream to handle this.
2010-10-12 14:55:02 -07:00
Brian Behlendorf c5343ba71b Fix 'zpool events' formatting for awk
To make the 'zpool events' output simple to parse with awk the extra
newline after embedded nvlists has been dropped.  This allows the
entire event to be parsed as a single whitespace separated record.

The -H option has been added to operate in scripted mode.  For the
'zpool events' command this means don't print the header.  The usage
of -H is consistent with scripted mode for other zpool commands.
2010-10-12 14:55:01 -07:00
Ned Bass 5c1bad0013 Fix undersized buffer in is_shorthand_path()
The string array 'char dirs[5][8]' was too small to accommodate the terminating
NUL character in "by-label". This change adds the needed additional byte.
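The off-by-one is easy to reproduce.  C accepts a string initializer that exactly fills a char array and silently drops the NUL, so 'char dirs[5][8]' compiles even though "by-label" needs 9 bytes.  The entries below are assumed for illustration, and longest_dir_name() is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* 9 bytes per row: the longest entry, "by-label", has 8 characters
 * plus its terminating NUL. */
const char example_dirs[5][9] = {
	"by-id", "by-label", "by-path", "by-uuid", "zpool"
};

/* Length of the longest entry; strlen() is only safe because every row
 * now has room for its NUL. */
size_t longest_dir_name(void)
{
	size_t i, max = 0;

	for (i = 0; i < sizeof (example_dirs) / sizeof (example_dirs[0]); i++) {
		size_t n = strlen(example_dirs[i]);

		if (n > max)
			max = n;
	}
	return (max);
}
```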

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2010-10-12 14:47:39 -07:00
Brian Behlendorf a5b4d63582 Add [-m map] option to zpool_layout
By default the zpool_layout command would always use the slot
number assigned by Linux when generating the zdev.conf file.
This is a reasonable default, but there are cases when it makes
sense to remap the slot id assigned by Linux using your own
custom mapping.

This commit adds support to zpool_layout to provide a custom
slot mapping file.  The file contains in the first column the
Linux slot id and in the second column the custom slot mapping.
By passing this map file with '-m map' to zpool_layout the
mapping will be applied when generating zdev.conf.

Additionally, two sample mappings have been added which reflect
different ways to map the slots in the dragon drawers.
2010-09-17 11:02:19 -07:00