/*
 * CDDL HEADER START
 *
 * The contents of this file are subject to the terms of the
 * Common Development and Distribution License (the "License").
 * You may not use this file except in compliance with the License.
 *
 * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
 * or http://www.opensolaris.org/os/licensing.
 * See the License for the specific language governing permissions
 * and limitations under the License.
 *
 * When distributing Covered Code, include this CDDL HEADER in each
 * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
 * If applicable, add the following below this CDDL HEADER, with the
 * fields enclosed by brackets "[]" replaced with your own identifying
 * information: Portions Copyright [yyyy] [name of copyright owner]
 *
 * CDDL HEADER END
 */
/*
 * Copyright 2009 Sun Microsystems, Inc. All rights reserved.
 * Use is subject to license terms.
 */

#include <sys/types.h>
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/sysmacros.h>
#include <sys/kmem.h>
#include <sys/pathname.h>
#include <sys/vnode.h>
#include <sys/vfs.h>
#include <sys/vfs_opreg.h>
#include <sys/mntent.h>
#include <sys/mount.h>
#include <sys/cmn_err.h>
#include "fs/fs_subr.h"
#include <sys/zfs_znode.h>
#include <sys/zfs_dir.h>
#include <sys/zil.h>
#include <sys/fs/zfs.h>
#include <sys/dmu.h>
#include <sys/dsl_prop.h>
#include <sys/dsl_dataset.h>
#include <sys/dsl_deleg.h>
#include <sys/spa.h>
#include <sys/zap.h>
#include <sys/varargs.h>
#include <sys/policy.h>
#include <sys/atomic.h>
#include <sys/mkdev.h>
#include <sys/modctl.h>
#include <sys/refstr.h>
#include <sys/zfs_ioctl.h>
#include <sys/zfs_ctldir.h>
#include <sys/zfs_fuid.h>
#include <sys/bootconf.h>
#include <sys/sunddi.h>
#include <sys/dnlc.h>
#include <sys/dmu_objset.h>
#include <sys/spa_boot.h>

Linux ZVOL implementation; kernel-side changes
At last a useful user space interface for the Linux ZFS port arrives.
With the addition of the ZVOL, real ZFS-based block devices are available
and can be compared head to head with Linux's MD and LVM block drivers.
The Linux ZVOL has not yet had any performance work done, but from a user
perspective it should be functionally complete and behave like any other
Linux block device.
The ZVOL has so far been tested using zconfig.sh on the following x86_64
based platforms: FC11, CHAOS4, RHEL5, RHEL6, and SLES11. However, more
testing is required to ensure everything is working as designed.
What follows is a somewhat detailed list of the changes included in this
commit to make ZVOLs possible. A few other issues were addressed in
the context of these changes and are also mentioned.
* Added module/zfs/zvol.c, which is based on the original Solaris ZVOL
implementation but rewritten to integrate with the Linux block device
APIs. The basic design remains similar in Linux, with the major
change being request processing. Request processing is handled by
registering a request function which the elevator calls once all request
merges are finished and the elevator unplugs. This function is called
under a spin lock and the request structure is passed to the block driver
to be queued for IO. The elevator must be notified asynchronously once
the request completes or fails with an error. This gives the block
driver a chance to handle many requests concurrently. For the ZVOL we
maintain a taskq with a service thread per core. As requests are delivered
by the elevator each request is dispatched to the taskq. The task queue
handles each request with a write or read helper function which basically
copies the request data in to or out of the DMU object. Writes signal
completion as soon as the DMU has the data unless they are marked sync.
Reads are all handled synchronously, however the elevator will merge many
small reads in to a large read before submitting the request.
* Caching is worth specifically mentioning. Because the Linux VFS
and the ZFS ARC both want to fully manage the cache we unfortunately
end up with two caches. This means our memory footprint is larger
than otherwise expected, and it means we have an extra copy between
the caches, but it does not impact correctness. All syncs are barrier
requests, which I believe are handled correctly. Longer term there is lots
of room for improvement here but it will require fairly extensive changes
to either the Linux VFS and VM layer, or additional DMU interfaces to
handle managing buffers not directly allocated by the ARC.
* Added module/zfs/include/sys/blkdev.h which contains all the Linux
compatibility foo which is required to handle changes in the Linux block
APIs from 2.6.18 through 2.6.31 based kernels.
* The dmu_{read,write}_uio interfaces, which don't make sense on Linux,
have been modified to dmu_{read,write}_req functions which consume the
standard Linux IO request structure. Their function fundamentally
remains the same so this happily worked out pretty cleanly.
* The /dev/zfs character device is no longer created through the half
implemented Solaris driver DDI interfaces. It is now simply created
with its own major number as a Linux misc device which greatly simplifies
everything. It is only capable of handling ioctls() but this fits nicely
because that's all it ever has to do. The ZVOL devices, unlike in Solaris,
do not leverage the same major number as /dev/zfs but instead register
their own major. Because only one major is allocated and space is reserved
for 16 partitions per device there is a limit of 16384 concurrent ZVOL
devices. By using multiple majors like the scsi driver this limit could
be addressed if it becomes a problem.
* The {spa,zfs,zvol}_busy() functions have all been removed because they
are not required on a Linux system. Under Linux the registered module
exit function will not be called while there are still references to the
module. Once the exit function is called, however, it must succeed or
block; it may not fail, so returning an error on module unload makes no
sense under Linux.
* With the addition of ZVOL support all the HAVE_ZVOL defines were removed
for obvious reasons. However, the HAVE_ZPL defines have been relocated
in to the linux-{kernel,user}-disk topic branches and must remain until
the ZPL is implemented.
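The device limit quoted in the /dev/zfs item above can be checked with a little arithmetic. The sketch below is illustrative only: the names are hypothetical and are not taken from zvol.c, and the 2^18 minor space is an assumption inferred from the stated figures (16384 devices x 16 minors each = 262144), not a value read from the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 16 minors are reserved per ZVOL (the "16 partitions" above). */
#define	SKETCH_MINORS_PER_DEV	16u
/* Assumption: 2^18 usable minors under one major, implied by 16384 * 16. */
#define	SKETCH_MINOR_SPACE	(1u << 18)

/* Number of ZVOL devices that fit under a single major number. */
static inline uint32_t
sketch_max_devices(void)
{
	return (SKETCH_MINOR_SPACE / SKETCH_MINORS_PER_DEV);
}

/* First minor number in the block reserved for device index idx. */
static inline uint32_t
sketch_first_minor(uint32_t idx)
{
	assert(idx < sketch_max_devices());
	return (idx * SKETCH_MINORS_PER_DEV);
}

/* Device index that owns a given minor number. */
static inline uint32_t
sketch_minor_to_dev(uint32_t minor)
{
	assert(minor < SKETCH_MINOR_SPACE);
	return (minor / SKETCH_MINORS_PER_DEV);
}

/* Slot of a minor inside its device's block of 16. */
static inline uint32_t
sketch_minor_to_slot(uint32_t minor)
{
	return (minor % SKETCH_MINORS_PER_DEV);
}
```

Under these assumptions sketch_max_devices() evaluates to 16384, matching the limit stated above. Registering additional majors, as the text suggests, would multiply that figure by the number of majors used.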
#ifdef HAVE_ZPL
int zfsfstype;
vfsops_t *zfs_vfsops = NULL;
static major_t zfs_major;
static minor_t zfs_minor;
static kmutex_t zfs_dev_mtx;

extern int sys_shutdown;

static int zfs_mount(vfs_t *vfsp, vnode_t *mvp, struct mounta *uap, cred_t *cr);
static int zfs_umount(vfs_t *vfsp, int fflag, cred_t *cr);
static int zfs_mountroot(vfs_t *vfsp, enum whymountroot);
static int zfs_root(vfs_t *vfsp, vnode_t **vpp);
static int zfs_statvfs(vfs_t *vfsp, struct statvfs64 *statp);
static int zfs_vget(vfs_t *vfsp, vnode_t **vpp, fid_t *fidp);
static void zfs_freevfs(vfs_t *vfsp);

static const fs_operation_def_t zfs_vfsops_template[] = {
	VFSNAME_MOUNT, { .vfs_mount = zfs_mount },
	VFSNAME_MOUNTROOT, { .vfs_mountroot = zfs_mountroot },
	VFSNAME_UNMOUNT, { .vfs_unmount = zfs_umount },
	VFSNAME_ROOT, { .vfs_root = zfs_root },
	VFSNAME_STATVFS, { .vfs_statvfs = zfs_statvfs },
	VFSNAME_SYNC, { .vfs_sync = zfs_sync },
	VFSNAME_VGET, { .vfs_vget = zfs_vget },
	VFSNAME_FREEVFS, { .vfs_freevfs = zfs_freevfs },
	NULL, NULL
};

static const fs_operation_def_t zfs_vfsops_eio_template[] = {
	VFSNAME_FREEVFS, { .vfs_freevfs = zfs_freevfs },
	NULL, NULL
};

/*
 * We need to keep a count of active fs's.
 * This is necessary to prevent our module
 * from being unloaded after a umount -f
 */
static uint32_t zfs_active_fs_count = 0;

static char *noatime_cancel[] = { MNTOPT_ATIME, NULL };
static char *atime_cancel[] = { MNTOPT_NOATIME, NULL };
static char *noxattr_cancel[] = { MNTOPT_XATTR, NULL };
static char *xattr_cancel[] = { MNTOPT_NOXATTR, NULL };

/*
 * MO_DEFAULT is not used since the default value is determined
 * by the equivalent property.
 */
static mntopt_t mntopts[] = {
	{ MNTOPT_NOXATTR, noxattr_cancel, NULL, 0, NULL },
	{ MNTOPT_XATTR, xattr_cancel, NULL, 0, NULL },
	{ MNTOPT_NOATIME, noatime_cancel, NULL, 0, NULL },
	{ MNTOPT_ATIME, atime_cancel, NULL, 0, NULL }
};

static mntopts_t zfs_mntopts = {
	sizeof (mntopts) / sizeof (mntopt_t),
	mntopts
};

/*ARGSUSED*/
int
zfs_sync(vfs_t *vfsp, short flag, cred_t *cr)
{
	/*
	 * Data integrity is job one. We don't want a compromised kernel
	 * writing to the storage pool, so we never sync during panic.
	 */
	if (panicstr)
		return (0);

	/*
	 * SYNC_ATTR is used by fsflush() to force old filesystems like UFS
	 * to sync metadata, which they would otherwise cache indefinitely.
	 * Semantically, the only requirement is that the sync be initiated.
	 * The DMU syncs out txgs frequently, so there's nothing to do.
	 */
	if (flag & SYNC_ATTR)
		return (0);

	if (vfsp != NULL) {
		/*
		 * Sync a specific filesystem.
		 */
		zfsvfs_t *zfsvfs = vfsp->vfs_data;
		dsl_pool_t *dp;

		ZFS_ENTER(zfsvfs);
		dp = dmu_objset_pool(zfsvfs->z_os);

		/*
		 * If the system is shutting down, then skip any
		 * filesystems which may exist on a suspended pool.
		 */
		if (sys_shutdown && spa_suspended(dp->dp_spa)) {
			ZFS_EXIT(zfsvfs);
			return (0);
		}

		if (zfsvfs->z_log != NULL)
			zil_commit(zfsvfs->z_log, UINT64_MAX, 0);
		else
			txg_wait_synced(dp, 0);
		ZFS_EXIT(zfsvfs);
	} else {
		/*
		 * Sync all ZFS filesystems. This is what happens when you
		 * run sync(1M). Unlike other filesystems, ZFS honors the
		 * request by waiting for all pools to commit all dirty data.
		 */
		spa_sync_allpools();
	}

	return (0);
}

static int
zfs_create_unique_device(dev_t *dev)
{
	major_t new_major;

	do {
		ASSERT3U(zfs_minor, <=, MAXMIN32);
		minor_t start = zfs_minor;
		do {
			mutex_enter(&zfs_dev_mtx);
			if (zfs_minor >= MAXMIN32) {
				/*
				 * If we're still using the real major
				 * keep out of /dev/zfs and /dev/zvol minor
				 * number space. If we're using a getudev()'ed
				 * major number, we can use all of its minors.
				 */
				if (zfs_major == ddi_name_to_major(ZFS_DRIVER))
					zfs_minor = ZFS_MIN_MINOR;
				else
					zfs_minor = 0;
			} else {
				zfs_minor++;
			}
			*dev = makedevice(zfs_major, zfs_minor);
			mutex_exit(&zfs_dev_mtx);
		} while (vfs_devismounted(*dev) && zfs_minor != start);
		if (zfs_minor == start) {
			/*
			 * We are using all ~262,000 minor numbers for the
			 * current major number. Create a new major number.
			 */
			if ((new_major = getudev()) == (major_t)-1) {
				cmn_err(CE_WARN,
				    "zfs_mount: Can't get unique major "
				    "device number.");
				return (-1);
			}
			mutex_enter(&zfs_dev_mtx);
			zfs_major = new_major;
			zfs_minor = 0;

			mutex_exit(&zfs_dev_mtx);
		} else {
			break;
		}
		/* CONSTANTCONDITION */
	} while (1);

	return (0);
}

static void
atime_changed_cb(void *arg, uint64_t newval)
{
	zfsvfs_t *zfsvfs = arg;

	if (newval == TRUE) {
		zfsvfs->z_atime = TRUE;
		vfs_clearmntopt(zfsvfs->z_vfs, MNTOPT_NOATIME);
		vfs_setmntopt(zfsvfs->z_vfs, MNTOPT_ATIME, NULL, 0);
	} else {
		zfsvfs->z_atime = FALSE;
		vfs_clearmntopt(zfsvfs->z_vfs, MNTOPT_ATIME);
		vfs_setmntopt(zfsvfs->z_vfs, MNTOPT_NOATIME, NULL, 0);
	}
}

static void
xattr_changed_cb(void *arg, uint64_t newval)
{
	zfsvfs_t *zfsvfs = arg;

	if (newval == TRUE) {
		/* XXX locking on vfs_flag? */
		zfsvfs->z_vfs->vfs_flag |= VFS_XATTR;
		vfs_clearmntopt(zfsvfs->z_vfs, MNTOPT_NOXATTR);
		vfs_setmntopt(zfsvfs->z_vfs, MNTOPT_XATTR, NULL, 0);
	} else {
		/* XXX locking on vfs_flag? */
		zfsvfs->z_vfs->vfs_flag &= ~VFS_XATTR;
		vfs_clearmntopt(zfsvfs->z_vfs, MNTOPT_XATTR);
		vfs_setmntopt(zfsvfs->z_vfs, MNTOPT_NOXATTR, NULL, 0);
	}
}

static void
blksz_changed_cb(void *arg, uint64_t newval)
{
	zfsvfs_t *zfsvfs = arg;

	if (newval < SPA_MINBLOCKSIZE ||
	    newval > SPA_MAXBLOCKSIZE || !ISP2(newval))
		newval = SPA_MAXBLOCKSIZE;

	zfsvfs->z_max_blksz = newval;
	zfsvfs->z_vfs->vfs_bsize = newval;
}

static void
readonly_changed_cb(void *arg, uint64_t newval)
{
	zfsvfs_t *zfsvfs = arg;

	if (newval) {
		/* XXX locking on vfs_flag? */
		zfsvfs->z_vfs->vfs_flag |= VFS_RDONLY;
		vfs_clearmntopt(zfsvfs->z_vfs, MNTOPT_RW);
		vfs_setmntopt(zfsvfs->z_vfs, MNTOPT_RO, NULL, 0);
	} else {
		/* XXX locking on vfs_flag? */
		zfsvfs->z_vfs->vfs_flag &= ~VFS_RDONLY;
		vfs_clearmntopt(zfsvfs->z_vfs, MNTOPT_RO);
		vfs_setmntopt(zfsvfs->z_vfs, MNTOPT_RW, NULL, 0);
	}
}

static void
devices_changed_cb(void *arg, uint64_t newval)
{
	zfsvfs_t *zfsvfs = arg;

	if (newval == FALSE) {
		zfsvfs->z_vfs->vfs_flag |= VFS_NODEVICES;
		vfs_clearmntopt(zfsvfs->z_vfs, MNTOPT_DEVICES);
		vfs_setmntopt(zfsvfs->z_vfs, MNTOPT_NODEVICES, NULL, 0);
	} else {
		zfsvfs->z_vfs->vfs_flag &= ~VFS_NODEVICES;
		vfs_clearmntopt(zfsvfs->z_vfs, MNTOPT_NODEVICES);
		vfs_setmntopt(zfsvfs->z_vfs, MNTOPT_DEVICES, NULL, 0);
	}
}

static void
setuid_changed_cb(void *arg, uint64_t newval)
{
	zfsvfs_t *zfsvfs = arg;

	if (newval == FALSE) {
		zfsvfs->z_vfs->vfs_flag |= VFS_NOSETUID;
		vfs_clearmntopt(zfsvfs->z_vfs, MNTOPT_SETUID);
		vfs_setmntopt(zfsvfs->z_vfs, MNTOPT_NOSETUID, NULL, 0);
	} else {
		zfsvfs->z_vfs->vfs_flag &= ~VFS_NOSETUID;
		vfs_clearmntopt(zfsvfs->z_vfs, MNTOPT_NOSETUID);
		vfs_setmntopt(zfsvfs->z_vfs, MNTOPT_SETUID, NULL, 0);
	}
}

static void
exec_changed_cb(void *arg, uint64_t newval)
{
	zfsvfs_t *zfsvfs = arg;

	if (newval == FALSE) {
		zfsvfs->z_vfs->vfs_flag |= VFS_NOEXEC;
		vfs_clearmntopt(zfsvfs->z_vfs, MNTOPT_EXEC);
		vfs_setmntopt(zfsvfs->z_vfs, MNTOPT_NOEXEC, NULL, 0);
	} else {
		zfsvfs->z_vfs->vfs_flag &= ~VFS_NOEXEC;
		vfs_clearmntopt(zfsvfs->z_vfs, MNTOPT_NOEXEC);
		vfs_setmntopt(zfsvfs->z_vfs, MNTOPT_EXEC, NULL, 0);
	}
}

/*
 * The nbmand mount option can be changed at mount time.
 * We can't allow it to be toggled on live file systems or incorrect
 * behavior may be seen from cifs clients
 *
 * This property isn't registered via dsl_prop_register(), but this callback
 * will be called when a file system is first mounted
 */
static void
nbmand_changed_cb(void *arg, uint64_t newval)
{
	zfsvfs_t *zfsvfs = arg;
	if (newval == FALSE) {
		vfs_clearmntopt(zfsvfs->z_vfs, MNTOPT_NBMAND);
		vfs_setmntopt(zfsvfs->z_vfs, MNTOPT_NONBMAND, NULL, 0);
	} else {
		vfs_clearmntopt(zfsvfs->z_vfs, MNTOPT_NONBMAND);
		vfs_setmntopt(zfsvfs->z_vfs, MNTOPT_NBMAND, NULL, 0);
	}
}

static void
snapdir_changed_cb(void *arg, uint64_t newval)
{
	zfsvfs_t *zfsvfs = arg;

	zfsvfs->z_show_ctldir = newval;
}

static void
vscan_changed_cb(void *arg, uint64_t newval)
{
	zfsvfs_t *zfsvfs = arg;

	zfsvfs->z_vscan = newval;
}

static void
acl_mode_changed_cb(void *arg, uint64_t newval)
{
	zfsvfs_t *zfsvfs = arg;

	zfsvfs->z_acl_mode = newval;
}

static void
acl_inherit_changed_cb(void *arg, uint64_t newval)
{
	zfsvfs_t *zfsvfs = arg;

	zfsvfs->z_acl_inherit = newval;
}

static int
zfs_register_callbacks(vfs_t *vfsp)
{
	struct dsl_dataset *ds = NULL;
	objset_t *os = NULL;
	zfsvfs_t *zfsvfs = NULL;
	uint64_t nbmand;
	int readonly, do_readonly = B_FALSE;
	int setuid, do_setuid = B_FALSE;
	int exec, do_exec = B_FALSE;
	int devices, do_devices = B_FALSE;
	int xattr, do_xattr = B_FALSE;
	int atime, do_atime = B_FALSE;
	int error = 0;

	ASSERT(vfsp);
	zfsvfs = vfsp->vfs_data;
	ASSERT(zfsvfs);
	os = zfsvfs->z_os;

	/*
	 * The act of registering our callbacks will destroy any mount
	 * options we may have. In order to enable temporary overrides
	 * of mount options, we stash away the current values and
	 * restore them after we register the callbacks.
	 */
	if (vfs_optionisset(vfsp, MNTOPT_RO, NULL)) {
		readonly = B_TRUE;
		do_readonly = B_TRUE;
	} else if (vfs_optionisset(vfsp, MNTOPT_RW, NULL)) {
		readonly = B_FALSE;
		do_readonly = B_TRUE;
	}
	if (vfs_optionisset(vfsp, MNTOPT_NOSUID, NULL)) {
		devices = B_FALSE;
		setuid = B_FALSE;
		do_devices = B_TRUE;
		do_setuid = B_TRUE;
	} else {
		if (vfs_optionisset(vfsp, MNTOPT_NODEVICES, NULL)) {
			devices = B_FALSE;
			do_devices = B_TRUE;
		} else if (vfs_optionisset(vfsp, MNTOPT_DEVICES, NULL)) {
			devices = B_TRUE;
			do_devices = B_TRUE;
		}

		if (vfs_optionisset(vfsp, MNTOPT_NOSETUID, NULL)) {
			setuid = B_FALSE;
			do_setuid = B_TRUE;
		} else if (vfs_optionisset(vfsp, MNTOPT_SETUID, NULL)) {
			setuid = B_TRUE;
			do_setuid = B_TRUE;
		}
	}
	if (vfs_optionisset(vfsp, MNTOPT_NOEXEC, NULL)) {
		exec = B_FALSE;
		do_exec = B_TRUE;
	} else if (vfs_optionisset(vfsp, MNTOPT_EXEC, NULL)) {
		exec = B_TRUE;
		do_exec = B_TRUE;
	}
	if (vfs_optionisset(vfsp, MNTOPT_NOXATTR, NULL)) {
		xattr = B_FALSE;
		do_xattr = B_TRUE;
	} else if (vfs_optionisset(vfsp, MNTOPT_XATTR, NULL)) {
		xattr = B_TRUE;
		do_xattr = B_TRUE;
	}
	if (vfs_optionisset(vfsp, MNTOPT_NOATIME, NULL)) {
		atime = B_FALSE;
		do_atime = B_TRUE;
	} else if (vfs_optionisset(vfsp, MNTOPT_ATIME, NULL)) {
		atime = B_TRUE;
		do_atime = B_TRUE;
	}

	/*
	 * nbmand is a special property. It can only be changed at
	 * mount time.
	 *
	 * This is weird, but it is documented to only be changeable
	 * at mount time.
	 */
	if (vfs_optionisset(vfsp, MNTOPT_NONBMAND, NULL)) {
		nbmand = B_FALSE;
	} else if (vfs_optionisset(vfsp, MNTOPT_NBMAND, NULL)) {
		nbmand = B_TRUE;
	} else {
		char osname[MAXNAMELEN];

		dmu_objset_name(os, osname);
		if (error = dsl_prop_get_integer(osname, "nbmand", &nbmand,
		    NULL)) {
			return (error);
		}
	}

	/*
	 * Register property callbacks.
	 *
	 * It would probably be fine to just check for i/o error from
	 * the first prop_register(), but I guess I like to go
	 * overboard...
	 */
	ds = dmu_objset_ds(os);
	error = dsl_prop_register(ds, "atime", atime_changed_cb, zfsvfs);
	error = error ? error : dsl_prop_register(ds,
	    "xattr", xattr_changed_cb, zfsvfs);
	error = error ? error : dsl_prop_register(ds,
	    "recordsize", blksz_changed_cb, zfsvfs);
	error = error ? error : dsl_prop_register(ds,
	    "readonly", readonly_changed_cb, zfsvfs);
	error = error ? error : dsl_prop_register(ds,
	    "devices", devices_changed_cb, zfsvfs);
	error = error ? error : dsl_prop_register(ds,
	    "setuid", setuid_changed_cb, zfsvfs);
	error = error ? error : dsl_prop_register(ds,
	    "exec", exec_changed_cb, zfsvfs);
	error = error ? error : dsl_prop_register(ds,
	    "snapdir", snapdir_changed_cb, zfsvfs);
	error = error ? error : dsl_prop_register(ds,
	    "aclmode", acl_mode_changed_cb, zfsvfs);
	error = error ? error : dsl_prop_register(ds,
	    "aclinherit", acl_inherit_changed_cb, zfsvfs);
	error = error ? error : dsl_prop_register(ds,
	    "vscan", vscan_changed_cb, zfsvfs);
	if (error)
		goto unregister;

	/*
	 * Invoke our callbacks to restore temporary mount options.
	 */
	if (do_readonly)
		readonly_changed_cb(zfsvfs, readonly);
	if (do_setuid)
		setuid_changed_cb(zfsvfs, setuid);
	if (do_exec)
		exec_changed_cb(zfsvfs, exec);
	if (do_devices)
		devices_changed_cb(zfsvfs, devices);
	if (do_xattr)
		xattr_changed_cb(zfsvfs, xattr);
	if (do_atime)
		atime_changed_cb(zfsvfs, atime);

	nbmand_changed_cb(zfsvfs, nbmand);

	return (0);

unregister:
	/*
	 * We may attempt to unregister some callbacks that are not
	 * registered, but this is OK; it will simply return ENOMSG,
	 * which we will ignore.
	 */
	(void) dsl_prop_unregister(ds, "atime", atime_changed_cb, zfsvfs);
	(void) dsl_prop_unregister(ds, "xattr", xattr_changed_cb, zfsvfs);
	(void) dsl_prop_unregister(ds, "recordsize", blksz_changed_cb, zfsvfs);
	(void) dsl_prop_unregister(ds, "readonly", readonly_changed_cb, zfsvfs);
	(void) dsl_prop_unregister(ds, "devices", devices_changed_cb, zfsvfs);
	(void) dsl_prop_unregister(ds, "setuid", setuid_changed_cb, zfsvfs);
	(void) dsl_prop_unregister(ds, "exec", exec_changed_cb, zfsvfs);
	(void) dsl_prop_unregister(ds, "snapdir", snapdir_changed_cb, zfsvfs);
	(void) dsl_prop_unregister(ds, "aclmode", acl_mode_changed_cb, zfsvfs);
	(void) dsl_prop_unregister(ds, "aclinherit", acl_inherit_changed_cb,
	    zfsvfs);
	(void) dsl_prop_unregister(ds, "vscan", vscan_changed_cb, zfsvfs);
	return (error);

}

static void
uidacct(objset_t *os, boolean_t isgroup, uint64_t fuid,
    int64_t delta, dmu_tx_t *tx)
{
	uint64_t used = 0;
	char buf[32];
	int err;
	uint64_t obj = isgroup ? DMU_GROUPUSED_OBJECT : DMU_USERUSED_OBJECT;

	if (delta == 0)
		return;

	(void) snprintf(buf, sizeof (buf), "%llx", (longlong_t)fuid);
	err = zap_lookup(os, obj, buf, 8, 1, &used);
	ASSERT(err == 0 || err == ENOENT);
	/* no underflow/overflow */
	ASSERT(delta > 0 || used >= -delta);
	ASSERT(delta < 0 || used + delta > used);
	used += delta;
	if (used == 0)
		err = zap_remove(os, obj, buf, tx);
	else
		err = zap_update(os, obj, buf, 8, 1, &used, tx);
	ASSERT(err == 0);
}

static void
zfs_space_delta_cb(objset_t *os, dmu_object_type_t bonustype,
    void *oldbonus, void *newbonus,
    uint64_t oldused, uint64_t newused, dmu_tx_t *tx)
{
	znode_phys_t *oldznp = oldbonus;
	znode_phys_t *newznp = newbonus;

	if (bonustype != DMU_OT_ZNODE)
		return;

	/* We charge 512 for the dnode (if it's allocated). */
	if (oldznp->zp_gen != 0)
		oldused += DNODE_SIZE;
	if (newznp->zp_gen != 0)
		newused += DNODE_SIZE;

	if (oldznp->zp_uid == newznp->zp_uid) {
		uidacct(os, B_FALSE, oldznp->zp_uid, newused-oldused, tx);
	} else {
		uidacct(os, B_FALSE, oldznp->zp_uid, -oldused, tx);
		uidacct(os, B_FALSE, newznp->zp_uid, newused, tx);
	}

	if (oldznp->zp_gid == newznp->zp_gid) {
		uidacct(os, B_TRUE, oldznp->zp_gid, newused-oldused, tx);
	} else {
		uidacct(os, B_TRUE, oldznp->zp_gid, -oldused, tx);
		uidacct(os, B_TRUE, newznp->zp_gid, newused, tx);
	}
}

static void
fuidstr_to_sid(zfsvfs_t *zfsvfs, const char *fuidstr,
    char *domainbuf, int buflen, uid_t *ridp)
{
	extern uint64_t strtonum(const char *str, char **nptr);
	uint64_t fuid;
	const char *domain;

	fuid = strtonum(fuidstr, NULL);

	domain = zfs_fuid_find_by_idx(zfsvfs, FUID_INDEX(fuid));
	if (domain)
		(void) strlcpy(domainbuf, domain, buflen);
	else
		domainbuf[0] = '\0';
	*ridp = FUID_RID(fuid);
}

static uint64_t
zfs_userquota_prop_to_obj(zfsvfs_t *zfsvfs, zfs_userquota_prop_t type)
{
	switch (type) {
	case ZFS_PROP_USERUSED:
		return (DMU_USERUSED_OBJECT);
	case ZFS_PROP_GROUPUSED:
		return (DMU_GROUPUSED_OBJECT);
	case ZFS_PROP_USERQUOTA:
		return (zfsvfs->z_userquota_obj);
	case ZFS_PROP_GROUPQUOTA:
		return (zfsvfs->z_groupquota_obj);
	}
	return (0);
}

int
zfs_userspace_many(zfsvfs_t *zfsvfs, zfs_userquota_prop_t type,
    uint64_t *cookiep, void *vbuf, uint64_t *bufsizep)
{
	int error;
	zap_cursor_t zc;
	zap_attribute_t za;
	zfs_useracct_t *buf = vbuf;
	uint64_t obj;

	if (!dmu_objset_userspace_present(zfsvfs->z_os))
		return (ENOTSUP);

	obj = zfs_userquota_prop_to_obj(zfsvfs, type);
	if (obj == 0) {
		*bufsizep = 0;
		return (0);
	}

	for (zap_cursor_init_serialized(&zc, zfsvfs->z_os, obj, *cookiep);
	    (error = zap_cursor_retrieve(&zc, &za)) == 0;
	    zap_cursor_advance(&zc)) {
		if ((uintptr_t)buf - (uintptr_t)vbuf + sizeof (zfs_useracct_t) >
		    *bufsizep)
			break;

		fuidstr_to_sid(zfsvfs, za.za_name,
		    buf->zu_domain, sizeof (buf->zu_domain), &buf->zu_rid);

		buf->zu_space = za.za_first_integer;
		buf++;
	}
	if (error == ENOENT)
		error = 0;

	ASSERT3U((uintptr_t)buf - (uintptr_t)vbuf, <=, *bufsizep);
	*bufsizep = (uintptr_t)buf - (uintptr_t)vbuf;
	*cookiep = zap_cursor_serialize(&zc);
	zap_cursor_fini(&zc);
	return (error);
}

/*
 * buf must be big enough (eg, 32 bytes)
 */
static int
id_to_fuidstr(zfsvfs_t *zfsvfs, const char *domain, uid_t rid,
    char *buf, boolean_t addok)
{
	uint64_t fuid;
	int domainid = 0;

	if (domain && domain[0]) {
		domainid = zfs_fuid_find_by_domain(zfsvfs, domain, NULL, addok);
		if (domainid == -1)
			return (ENOENT);
	}
	fuid = FUID_ENCODE(domainid, rid);
	(void) sprintf(buf, "%llx", (longlong_t)fuid);
	return (0);
}

int
zfs_userspace_one(zfsvfs_t *zfsvfs, zfs_userquota_prop_t type,
    const char *domain, uint64_t rid, uint64_t *valp)
{
	char buf[32];
	int err;
	uint64_t obj;

	*valp = 0;

	if (!dmu_objset_userspace_present(zfsvfs->z_os))
		return (ENOTSUP);

	obj = zfs_userquota_prop_to_obj(zfsvfs, type);
	if (obj == 0)
		return (0);

	err = id_to_fuidstr(zfsvfs, domain, rid, buf, B_FALSE);
	if (err)
		return (err);

	err = zap_lookup(zfsvfs->z_os, obj, buf, 8, 1, valp);
	if (err == ENOENT)
		err = 0;
	return (err);
}

int
zfs_set_userquota(zfsvfs_t *zfsvfs, zfs_userquota_prop_t type,
    const char *domain, uint64_t rid, uint64_t quota)
{
	char buf[32];
	int err;
	dmu_tx_t *tx;
	uint64_t *objp;
	boolean_t fuid_dirtied;

	if (type != ZFS_PROP_USERQUOTA && type != ZFS_PROP_GROUPQUOTA)
		return (EINVAL);

	if (zfsvfs->z_version < ZPL_VERSION_USERSPACE)
		return (ENOTSUP);

	objp = (type == ZFS_PROP_USERQUOTA) ? &zfsvfs->z_userquota_obj :
	    &zfsvfs->z_groupquota_obj;

	err = id_to_fuidstr(zfsvfs, domain, rid, buf, B_TRUE);
	if (err)
		return (err);
	fuid_dirtied = zfsvfs->z_fuid_dirty;

	tx = dmu_tx_create(zfsvfs->z_os);
	dmu_tx_hold_zap(tx, *objp ? *objp : DMU_NEW_OBJECT, B_TRUE, NULL);
	if (*objp == 0) {
		dmu_tx_hold_zap(tx, MASTER_NODE_OBJ, B_TRUE,
		    zfs_userquota_prop_prefixes[type]);
	}
	if (fuid_dirtied)
		zfs_fuid_txhold(zfsvfs, tx);
	err = dmu_tx_assign(tx, TXG_WAIT);
	if (err) {
		dmu_tx_abort(tx);
		return (err);
	}

	mutex_enter(&zfsvfs->z_lock);
	if (*objp == 0) {
		*objp = zap_create(zfsvfs->z_os, DMU_OT_USERGROUP_QUOTA,
		    DMU_OT_NONE, 0, tx);
		VERIFY(0 == zap_add(zfsvfs->z_os, MASTER_NODE_OBJ,
		    zfs_userquota_prop_prefixes[type], 8, 1, objp, tx));
	}
	mutex_exit(&zfsvfs->z_lock);

	if (quota == 0) {
		err = zap_remove(zfsvfs->z_os, *objp, buf, tx);
		if (err == ENOENT)
			err = 0;
	} else {
		err = zap_update(zfsvfs->z_os, *objp, buf, 8, 1, &quota, tx);
	}
	ASSERT(err == 0);
	if (fuid_dirtied)
		zfs_fuid_sync(zfsvfs, tx);
	dmu_tx_commit(tx);
	return (err);
}

boolean_t
|
|
|
|
zfs_usergroup_overquota(zfsvfs_t *zfsvfs, boolean_t isgroup, uint64_t fuid)
|
|
|
|
{
|
|
|
|
char buf[32];
|
|
|
|
uint64_t used, quota, usedobj, quotaobj;
|
|
|
|
int err;
|
|
|
|
|
|
|
|
usedobj = isgroup ? DMU_GROUPUSED_OBJECT : DMU_USERUSED_OBJECT;
|
|
|
|
quotaobj = isgroup ? zfsvfs->z_groupquota_obj : zfsvfs->z_userquota_obj;
|
|
|
|
|
|
|
|
if (quotaobj == 0 || zfsvfs->z_replay)
|
|
|
|
return (B_FALSE);
|
|
|
|
|
|
|
|
(void) sprintf(buf, "%llx", (longlong_t)fuid);
|
|
|
|
err = zap_lookup(zfsvfs->z_os, quotaobj, buf, 8, 1, &quota);
|
|
|
|
if (err != 0)
|
|
|
|
return (B_FALSE);
|
|
|
|
|
|
|
|
err = zap_lookup(zfsvfs->z_os, usedobj, buf, 8, 1, &used);
|
|
|
|
if (err != 0)
|
|
|
|
return (B_FALSE);
|
|
|
|
return (used >= quota);
|
|
|
|
}
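/*
 * Illustrative sketch, not part of the original source: a userland
 * restatement of the over-quota decision above. The helper names
 * fuid_to_key() and over_quota() are hypothetical, and the two boolean
 * flags stand in for "zap_lookup() returned 0"; the real function reads
 * both values from ZAP objects. It shows that a missing quota or usage
 * entry means "not over quota", and that usage equal to the quota
 * already counts as over.
 */

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Build the lowercase-hex key for a FUID, like sprintf(buf, "%llx", ...). */
static void
fuid_to_key(uint64_t fuid, char *buf, size_t len)
{
	assert(len >= 17);	/* 16 hex digits plus the terminating NUL */
	(void) snprintf(buf, len, "%" PRIx64, fuid);
}

/*
 * have_quota and have_used are hypothetical stand-ins for a successful
 * lookup in the quota object and the used-space object respectively.
 */
static bool
over_quota(bool have_quota, uint64_t quota, bool have_used, uint64_t used)
{
	if (!have_quota || !have_used)
		return (false);
	return (used >= quota);
}
```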
|
|
|
|
|
|
|
|
int
|
|
|
|
zfsvfs_create(const char *osname, int mode, zfsvfs_t **zvp)
|
|
|
|
{
|
|
|
|
objset_t *os;
|
|
|
|
zfsvfs_t *zfsvfs;
|
|
|
|
uint64_t zval;
|
|
|
|
int i, error;
|
|
|
|
|
|
|
|
if (error = dsl_prop_get_integer(osname, "readonly", &zval, NULL))
|
|
|
|
return (error);
|
|
|
|
if (zval)
|
|
|
|
mode |= DS_MODE_READONLY;
|
|
|
|
|
|
|
|
error = dmu_objset_open(osname, DMU_OST_ZFS, mode, &os);
|
|
|
|
if (error == EROFS) {
|
|
|
|
mode |= DS_MODE_READONLY;
|
|
|
|
error = dmu_objset_open(osname, DMU_OST_ZFS, mode, &os);
|
|
|
|
}
|
|
|
|
if (error)
|
|
|
|
return (error);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Initialize the zfs-specific filesystem structure.
|
|
|
|
* Should probably make this a kmem cache, shuffle fields,
|
|
|
|
* and just bzero up to z_hold_mtx[].
|
|
|
|
*/
|
|
|
|
zfsvfs = kmem_zalloc(sizeof (zfsvfs_t), KM_SLEEP);
|
|
|
|
zfsvfs->z_vfs = NULL;
|
|
|
|
zfsvfs->z_parent = zfsvfs;
|
|
|
|
zfsvfs->z_max_blksz = SPA_MAXBLOCKSIZE;
|
|
|
|
zfsvfs->z_show_ctldir = ZFS_SNAPDIR_VISIBLE;
|
|
|
|
zfsvfs->z_os = os;
|
|
|
|
|
|
|
|
error = zfs_get_zplprop(os, ZFS_PROP_VERSION, &zfsvfs->z_version);
|
|
|
|
if (error) {
|
|
|
|
goto out;
|
|
|
|
} else if (zfsvfs->z_version > ZPL_VERSION) {
|
|
|
|
(void) printf("Mismatched versions: File system "
|
|
|
|
"is version %llu on-disk format, which is "
|
|
|
|
"incompatible with this software version %lld!",
|
|
|
|
(u_longlong_t)zfsvfs->z_version, ZPL_VERSION);
|
|
|
|
error = ENOTSUP;
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
|
|
|
if ((error = zfs_get_zplprop(os, ZFS_PROP_NORMALIZE, &zval)) != 0)
|
|
|
|
goto out;
|
|
|
|
zfsvfs->z_norm = (int)zval;
|
|
|
|
|
|
|
|
if ((error = zfs_get_zplprop(os, ZFS_PROP_UTF8ONLY, &zval)) != 0)
|
|
|
|
goto out;
|
|
|
|
zfsvfs->z_utf8 = (zval != 0);
|
|
|
|
|
|
|
|
if ((error = zfs_get_zplprop(os, ZFS_PROP_CASE, &zval)) != 0)
|
|
|
|
goto out;
|
|
|
|
zfsvfs->z_case = (uint_t)zval;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Fold case on file systems that are always or sometimes case
|
|
|
|
* insensitive.
|
|
|
|
*/
|
|
|
|
if (zfsvfs->z_case == ZFS_CASE_INSENSITIVE ||
|
|
|
|
zfsvfs->z_case == ZFS_CASE_MIXED)
|
|
|
|
zfsvfs->z_norm |= U8_TEXTPREP_TOUPPER;
|
|
|
|
|
|
|
|
zfsvfs->z_use_fuids = USE_FUIDS(zfsvfs->z_version, zfsvfs->z_os);
|
|
|
|
|
|
|
|
error = zap_lookup(os, MASTER_NODE_OBJ, ZFS_ROOT_OBJ, 8, 1,
|
|
|
|
&zfsvfs->z_root);
|
|
|
|
if (error)
|
|
|
|
goto out;
|
|
|
|
ASSERT(zfsvfs->z_root != 0);
|
|
|
|
|
|
|
|
error = zap_lookup(os, MASTER_NODE_OBJ, ZFS_UNLINKED_SET, 8, 1,
|
|
|
|
&zfsvfs->z_unlinkedobj);
|
|
|
|
if (error)
|
|
|
|
goto out;
|
|
|
|
|
|
|
|
error = zap_lookup(os, MASTER_NODE_OBJ,
|
|
|
|
zfs_userquota_prop_prefixes[ZFS_PROP_USERQUOTA],
|
|
|
|
8, 1, &zfsvfs->z_userquota_obj);
|
|
|
|
if (error && error != ENOENT)
|
|
|
|
goto out;
|
|
|
|
|
|
|
|
error = zap_lookup(os, MASTER_NODE_OBJ,
|
|
|
|
zfs_userquota_prop_prefixes[ZFS_PROP_GROUPQUOTA],
|
|
|
|
8, 1, &zfsvfs->z_groupquota_obj);
|
|
|
|
if (error && error != ENOENT)
|
|
|
|
goto out;
|
|
|
|
|
|
|
|
error = zap_lookup(os, MASTER_NODE_OBJ, ZFS_FUID_TABLES, 8, 1,
|
|
|
|
&zfsvfs->z_fuid_obj);
|
|
|
|
if (error && error != ENOENT)
|
|
|
|
goto out;
|
|
|
|
|
|
|
|
error = zap_lookup(os, MASTER_NODE_OBJ, ZFS_SHARES_DIR, 8, 1,
|
|
|
|
&zfsvfs->z_shares_dir);
|
|
|
|
if (error && error != ENOENT)
|
|
|
|
goto out;
|
|
|
|
|
|
|
|
mutex_init(&zfsvfs->z_znodes_lock, NULL, MUTEX_DEFAULT, NULL);
|
|
|
|
mutex_init(&zfsvfs->z_lock, NULL, MUTEX_DEFAULT, NULL);
|
|
|
|
list_create(&zfsvfs->z_all_znodes, sizeof (znode_t),
|
|
|
|
offsetof(znode_t, z_link_node));
|
|
|
|
rrw_init(&zfsvfs->z_teardown_lock);
|
|
|
|
rw_init(&zfsvfs->z_teardown_inactive_lock, NULL, RW_DEFAULT, NULL);
|
|
|
|
rw_init(&zfsvfs->z_fuid_lock, NULL, RW_DEFAULT, NULL);
|
|
|
|
for (i = 0; i != ZFS_OBJ_MTX_SZ; i++)
|
|
|
|
mutex_init(&zfsvfs->z_hold_mtx[i], NULL, MUTEX_DEFAULT, NULL);
|
|
|
|
|
|
|
|
*zvp = zfsvfs;
|
|
|
|
return (0);
|
|
|
|
|
|
|
|
out:
|
|
|
|
dmu_objset_close(os);
|
|
|
|
*zvp = NULL;
|
|
|
|
kmem_free(zfsvfs, sizeof (zfsvfs_t));
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
2008-11-20 20:01:55 +00:00
|
|
|
static int
|
|
|
|
zfsvfs_setup(zfsvfs_t *zfsvfs, boolean_t mounting)
|
|
|
|
{
|
|
|
|
int error;
|
|
|
|
|
|
|
|
error = zfs_register_callbacks(zfsvfs->z_vfs);
|
|
|
|
if (error)
|
|
|
|
return (error);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Set the objset user_ptr to track its zfsvfs.
|
|
|
|
*/
|
|
|
|
mutex_enter(&zfsvfs->z_os->os->os_user_ptr_lock);
|
|
|
|
dmu_objset_set_user(zfsvfs->z_os, zfsvfs);
|
|
|
|
mutex_exit(&zfsvfs->z_os->os->os_user_ptr_lock);
|
|
|
|
|
2009-07-02 22:44:48 +00:00
|
|
|
zfsvfs->z_log = zil_open(zfsvfs->z_os, zfs_get_data);
|
|
|
|
if (zil_disable) {
|
|
|
|
zil_destroy(zfsvfs->z_log, 0);
|
|
|
|
zfsvfs->z_log = NULL;
|
|
|
|
}
|
|
|
|
|
2008-11-20 20:01:55 +00:00
|
|
|
/*
|
|
|
|
* If we are not mounting (i.e., online recv), then we don't
|
|
|
|
* have to worry about replaying the log as we blocked all
|
|
|
|
* operations out since we closed the ZIL.
|
|
|
|
*/
|
|
|
|
if (mounting) {
|
2008-12-03 20:09:06 +00:00
|
|
|
boolean_t readonly;
|
|
|
|
|
2008-11-20 20:01:55 +00:00
|
|
|
/*
|
|
|
|
* During replay we remove the read only flag to
|
|
|
|
* allow replays to succeed.
|
|
|
|
*/
|
|
|
|
readonly = zfsvfs->z_vfs->vfs_flag & VFS_RDONLY;
|
2009-01-15 21:59:39 +00:00
|
|
|
if (readonly != 0)
|
|
|
|
zfsvfs->z_vfs->vfs_flag &= ~VFS_RDONLY;
|
|
|
|
else
|
|
|
|
zfs_unlinked_drain(zfsvfs);
|
2008-11-20 20:01:55 +00:00
|
|
|
|
2009-07-02 22:44:48 +00:00
|
|
|
if (zfsvfs->z_log) {
|
2009-01-15 21:59:39 +00:00
|
|
|
/*
|
|
|
|
* Parse and replay the intent log.
|
|
|
|
*
|
|
|
|
* Because of ziltest, this must be done after
|
|
|
|
* zfs_unlinked_drain(). (Further note: ziltest
|
|
|
|
* doesn't use readonly mounts, where
|
|
|
|
* zfs_unlinked_drain() isn't called.) This is because
|
|
|
|
* ziltest causes spa_sync() to think it's committed,
|
|
|
|
* but actually it is not, so the intent log contains
|
|
|
|
* many txg's worth of changes.
|
|
|
|
*
|
|
|
|
* In particular, if object N is in the unlinked set in
|
|
|
|
* the last txg to actually sync, then it could be
|
|
|
|
* actually freed in a later txg and then reallocated
|
|
|
|
* in a yet later txg. This would write a "create
|
|
|
|
* object N" record to the intent log. Normally, this
|
|
|
|
* would be fine because the spa_sync() would have
|
|
|
|
* written out the fact that object N is free, before
|
|
|
|
* we could write the "create object N" intent log
|
|
|
|
* record.
|
|
|
|
*
|
|
|
|
* But when we are in ziltest mode, we advance the "open
|
|
|
|
* txg" without actually spa_sync()-ing the changes to
|
|
|
|
* disk. So we would see that object N is still
|
|
|
|
* allocated and in the unlinked set, and there is an
|
|
|
|
* intent log record saying to allocate it.
|
|
|
|
*/
|
|
|
|
zfsvfs->z_replay = B_TRUE;
|
|
|
|
zil_replay(zfsvfs->z_os, zfsvfs, zfs_replay_vector);
|
|
|
|
zfsvfs->z_replay = B_FALSE;
|
|
|
|
}
|
2008-11-20 20:01:55 +00:00
|
|
|
zfsvfs->z_vfs->vfs_flag |= readonly; /* restore readonly bit */
|
|
|
|
}
|
|
|
|
|
|
|
|
return (0);
|
|
|
|
}
|
|
|
|
|
2009-07-02 22:44:48 +00:00
|
|
|
void
|
|
|
|
zfsvfs_free(zfsvfs_t *zfsvfs)
|
2008-11-20 20:01:55 +00:00
|
|
|
{
|
2009-07-02 22:44:48 +00:00
|
|
|
int i;
|
|
|
|
extern krwlock_t zfsvfs_lock; /* in zfs_znode.c */
|
|
|
|
|
|
|
|
/*
|
|
|
|
* This is a barrier to prevent the filesystem from going away in
|
|
|
|
* zfs_znode_move() until we can safely ensure that the filesystem is
|
|
|
|
* not unmounted. We consider the filesystem valid before the barrier
|
|
|
|
* and invalid after the barrier.
|
|
|
|
*/
|
|
|
|
rw_enter(&zfsvfs_lock, RW_READER);
|
|
|
|
rw_exit(&zfsvfs_lock);
|
|
|
|
|
|
|
|
zfs_fuid_destroy(zfsvfs);
|
|
|
|
|
2008-11-20 20:01:55 +00:00
|
|
|
mutex_destroy(&zfsvfs->z_znodes_lock);
|
2009-07-02 22:44:48 +00:00
|
|
|
mutex_destroy(&zfsvfs->z_lock);
|
2008-11-20 20:01:55 +00:00
|
|
|
list_destroy(&zfsvfs->z_all_znodes);
|
|
|
|
rrw_destroy(&zfsvfs->z_teardown_lock);
|
|
|
|
rw_destroy(&zfsvfs->z_teardown_inactive_lock);
|
|
|
|
rw_destroy(&zfsvfs->z_fuid_lock);
|
2009-07-02 22:44:48 +00:00
|
|
|
for (i = 0; i != ZFS_OBJ_MTX_SZ; i++)
|
|
|
|
mutex_destroy(&zfsvfs->z_hold_mtx[i]);
|
2008-11-20 20:01:55 +00:00
|
|
|
kmem_free(zfsvfs, sizeof (zfsvfs_t));
|
|
|
|
}
|
|
|
|
|
2009-07-02 22:44:48 +00:00
|
|
|
static void
|
|
|
|
zfs_set_fuid_feature(zfsvfs_t *zfsvfs)
|
|
|
|
{
|
|
|
|
zfsvfs->z_use_fuids = USE_FUIDS(zfsvfs->z_version, zfsvfs->z_os);
|
|
|
|
if (zfsvfs->z_use_fuids && zfsvfs->z_vfs) {
|
|
|
|
vfs_set_feature(zfsvfs->z_vfs, VFSFT_XVATTR);
|
|
|
|
vfs_set_feature(zfsvfs->z_vfs, VFSFT_SYSATTR_VIEWS);
|
|
|
|
vfs_set_feature(zfsvfs->z_vfs, VFSFT_ACEMASKONACCESS);
|
|
|
|
vfs_set_feature(zfsvfs->z_vfs, VFSFT_ACLONCREATE);
|
|
|
|
vfs_set_feature(zfsvfs->z_vfs, VFSFT_ACCESS_FILTER);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2008-11-20 20:01:55 +00:00
|
|
|
static int
|
2008-12-03 20:09:06 +00:00
|
|
|
zfs_domount(vfs_t *vfsp, char *osname)
|
2008-11-20 20:01:55 +00:00
|
|
|
{
|
|
|
|
dev_t mount_dev;
|
2009-07-02 22:44:48 +00:00
|
|
|
uint64_t recordsize, fsid_guid;
|
2008-11-20 20:01:55 +00:00
|
|
|
int error = 0;
|
|
|
|
zfsvfs_t *zfsvfs;
|
|
|
|
|
|
|
|
ASSERT(vfsp);
|
|
|
|
ASSERT(osname);
|
|
|
|
|
2009-07-02 22:44:48 +00:00
|
|
|
error = zfsvfs_create(osname, DS_MODE_OWNER, &zfsvfs);
|
|
|
|
if (error)
|
|
|
|
return (error);
|
2008-11-20 20:01:55 +00:00
|
|
|
zfsvfs->z_vfs = vfsp;
|
|
|
|
|
|
|
|
/* Initialize the generic filesystem structure. */
|
|
|
|
vfsp->vfs_bcount = 0;
|
|
|
|
vfsp->vfs_data = NULL;
|
|
|
|
|
|
|
|
if (zfs_create_unique_device(&mount_dev) == -1) {
|
|
|
|
error = ENODEV;
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
ASSERT(vfs_devismounted(mount_dev) == 0);
|
|
|
|
|
|
|
|
if (error = dsl_prop_get_integer(osname, "recordsize", &recordsize,
|
|
|
|
NULL))
|
|
|
|
goto out;
|
|
|
|
|
|
|
|
vfsp->vfs_dev = mount_dev;
|
|
|
|
vfsp->vfs_fstype = zfsfstype;
|
|
|
|
vfsp->vfs_bsize = recordsize;
|
|
|
|
vfsp->vfs_flag |= VFS_NOTRUNC;
|
|
|
|
vfsp->vfs_data = zfsvfs;
|
|
|
|
|
2009-07-02 22:44:48 +00:00
|
|
|
/*
|
|
|
|
* The fsid is 64 bits, composed of an 8-bit fs type, which
|
|
|
|
* separates our fsid from any other filesystem types, and a
|
|
|
|
* 56-bit objset unique ID. The objset unique ID is unique to
|
|
|
|
* all objsets open on this system, provided by unique_create().
|
|
|
|
* The 8-bit fs type must be put in the low bits of fsid[1]
|
|
|
|
* because that's where other Solaris filesystems put it.
|
|
|
|
*/
|
|
|
|
fsid_guid = dmu_objset_fsid_guid(zfsvfs->z_os);
|
|
|
|
ASSERT((fsid_guid & ~((1ULL<<56)-1)) == 0);
|
|
|
|
vfsp->vfs_fsid.val[0] = fsid_guid;
|
|
|
|
vfsp->vfs_fsid.val[1] = ((fsid_guid>>32) << 8) |
|
|
|
|
zfsfstype & 0xFF;
|
2008-11-20 20:01:55 +00:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Set features for file system.
|
|
|
|
*/
|
2009-07-02 22:44:48 +00:00
|
|
|
zfs_set_fuid_feature(zfsvfs);
|
2008-11-20 20:01:55 +00:00
|
|
|
if (zfsvfs->z_case == ZFS_CASE_INSENSITIVE) {
|
|
|
|
vfs_set_feature(vfsp, VFSFT_DIRENTFLAGS);
|
|
|
|
vfs_set_feature(vfsp, VFSFT_CASEINSENSITIVE);
|
|
|
|
vfs_set_feature(vfsp, VFSFT_NOCASESENSITIVE);
|
|
|
|
} else if (zfsvfs->z_case == ZFS_CASE_MIXED) {
|
|
|
|
vfs_set_feature(vfsp, VFSFT_DIRENTFLAGS);
|
|
|
|
vfs_set_feature(vfsp, VFSFT_CASEINSENSITIVE);
|
|
|
|
}
|
|
|
|
|
|
|
|
if (dmu_objset_is_snapshot(zfsvfs->z_os)) {
|
|
|
|
uint64_t pval;
|
|
|
|
|
|
|
|
atime_changed_cb(zfsvfs, B_FALSE);
|
|
|
|
readonly_changed_cb(zfsvfs, B_TRUE);
|
|
|
|
if (error = dsl_prop_get_integer(osname, "xattr", &pval, NULL))
|
|
|
|
goto out;
|
|
|
|
xattr_changed_cb(zfsvfs, pval);
|
|
|
|
zfsvfs->z_issnap = B_TRUE;
|
2009-07-02 22:44:48 +00:00
|
|
|
|
|
|
|
mutex_enter(&zfsvfs->z_os->os->os_user_ptr_lock);
|
|
|
|
dmu_objset_set_user(zfsvfs->z_os, zfsvfs);
|
|
|
|
mutex_exit(&zfsvfs->z_os->os->os_user_ptr_lock);
|
2008-11-20 20:01:55 +00:00
|
|
|
} else {
|
|
|
|
error = zfsvfs_setup(zfsvfs, B_TRUE);
|
|
|
|
}
|
|
|
|
|
|
|
|
if (!zfsvfs->z_issnap)
|
|
|
|
zfsctl_create(zfsvfs);
|
|
|
|
out:
|
|
|
|
if (error) {
|
2009-07-02 22:44:48 +00:00
|
|
|
dmu_objset_close(zfsvfs->z_os);
|
|
|
|
zfsvfs_free(zfsvfs);
|
2008-11-20 20:01:55 +00:00
|
|
|
} else {
|
|
|
|
atomic_add_32(&zfs_active_fs_count, 1);
|
|
|
|
}
|
|
|
|
|
|
|
|
return (error);
|
|
|
|
}
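/*
 * Illustrative sketch, not part of the original source: the fsid
 * packing described in zfs_domount() above, restated as a standalone
 * function. pack_fsid() is a hypothetical name, and the two fsid words
 * are modelled as uint32_t here as an assumption about their width.
 * The low 32 bits of the 56-bit objset ID go in val[0]; the remaining
 * 24 bits are shifted above the 8-bit fs type in val[1].
 */

```c
#include <assert.h>
#include <stdint.h>

static void
pack_fsid(uint64_t fsid_guid, uint32_t fstype, uint32_t val[2])
{
	/* Same precondition the original asserts: the ID fits in 56 bits. */
	assert((fsid_guid & ~((1ULL << 56) - 1)) == 0);

	val[0] = (uint32_t)fsid_guid;
	/* Parentheses make the precedence of '&' over '|' explicit. */
	val[1] = (uint32_t)(((fsid_guid >> 32) << 8) | (fstype & 0xFF));
}
```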
|
|
|
|
|
|
|
|
void
|
|
|
|
zfs_unregister_callbacks(zfsvfs_t *zfsvfs)
|
|
|
|
{
|
|
|
|
objset_t *os = zfsvfs->z_os;
|
|
|
|
struct dsl_dataset *ds;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Unregister properties.
|
|
|
|
*/
|
|
|
|
if (!dmu_objset_is_snapshot(os)) {
|
|
|
|
ds = dmu_objset_ds(os);
|
|
|
|
VERIFY(dsl_prop_unregister(ds, "atime", atime_changed_cb,
|
|
|
|
zfsvfs) == 0);
|
|
|
|
|
|
|
|
VERIFY(dsl_prop_unregister(ds, "xattr", xattr_changed_cb,
|
|
|
|
zfsvfs) == 0);
|
|
|
|
|
|
|
|
VERIFY(dsl_prop_unregister(ds, "recordsize", blksz_changed_cb,
|
|
|
|
zfsvfs) == 0);
|
|
|
|
|
|
|
|
VERIFY(dsl_prop_unregister(ds, "readonly", readonly_changed_cb,
|
|
|
|
zfsvfs) == 0);
|
|
|
|
|
|
|
|
VERIFY(dsl_prop_unregister(ds, "devices", devices_changed_cb,
|
|
|
|
zfsvfs) == 0);
|
|
|
|
|
|
|
|
VERIFY(dsl_prop_unregister(ds, "setuid", setuid_changed_cb,
|
|
|
|
zfsvfs) == 0);
|
|
|
|
|
|
|
|
VERIFY(dsl_prop_unregister(ds, "exec", exec_changed_cb,
|
|
|
|
zfsvfs) == 0);
|
|
|
|
|
|
|
|
VERIFY(dsl_prop_unregister(ds, "snapdir", snapdir_changed_cb,
|
|
|
|
zfsvfs) == 0);
|
|
|
|
|
|
|
|
VERIFY(dsl_prop_unregister(ds, "aclmode", acl_mode_changed_cb,
|
|
|
|
zfsvfs) == 0);
|
|
|
|
|
|
|
|
VERIFY(dsl_prop_unregister(ds, "aclinherit",
|
|
|
|
acl_inherit_changed_cb, zfsvfs) == 0);
|
|
|
|
|
|
|
|
VERIFY(dsl_prop_unregister(ds, "vscan",
|
|
|
|
vscan_changed_cb, zfsvfs) == 0);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Convert a decimal digit string to a uint64_t integer.
|
|
|
|
*/
|
|
|
|
static int
|
|
|
|
str_to_uint64(char *str, uint64_t *objnum)
|
|
|
|
{
|
|
|
|
uint64_t num = 0;
|
|
|
|
|
|
|
|
while (*str) {
|
|
|
|
if (*str < '0' || *str > '9')
|
|
|
|
return (EINVAL);
|
|
|
|
|
|
|
|
num = num*10 + *str++ - '0';
|
|
|
|
}
|
|
|
|
|
|
|
|
*objnum = num;
|
|
|
|
return (0);
|
|
|
|
}
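/*
 * Illustrative sketch, not part of the original source: a hardened
 * variant of str_to_uint64(). The original accepts an empty string
 * (yielding 0) and does not detect overflow; this hypothetical
 * str_to_uint64_checked() rejects both. Whether callers want that
 * stricter behavior is an assumption, so treat it as an example of the
 * technique rather than a drop-in replacement.
 */

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

static int
str_to_uint64_checked(const char *str, uint64_t *out)
{
	uint64_t num = 0;

	assert(str != NULL && out != NULL);

	if (*str == '\0')
		return (EINVAL);	/* unlike the original, reject "" */

	while (*str) {
		unsigned int digit;

		if (*str < '0' || *str > '9')
			return (EINVAL);
		digit = (unsigned int)(*str++ - '0');

		/* num * 10 + digit would exceed UINT64_MAX */
		if (num > (UINT64_MAX - digit) / 10)
			return (ERANGE);
		num = num * 10 + digit;
	}

	*out = num;
	return (0);
}
```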
|
|
|
|
|
|
|
|
/*
|
|
|
|
* The boot path passed from the boot loader is in the form of
|
|
|
|
* "rootpool-name/root-filesystem-object-number". Convert this
|
|
|
|
* string to a dataset name: "rootpool-name/root-filesystem-name".
|
|
|
|
*/
|
|
|
|
static int
|
|
|
|
zfs_parse_bootfs(char *bpath, char *outpath)
|
|
|
|
{
|
|
|
|
char *slashp;
|
|
|
|
uint64_t objnum;
|
|
|
|
int error;
|
|
|
|
|
|
|
|
if (*bpath == 0 || *bpath == '/')
|
|
|
|
return (EINVAL);
|
|
|
|
|
2008-12-03 20:09:06 +00:00
|
|
|
(void) strcpy(outpath, bpath);
|
|
|
|
|
2008-11-20 20:01:55 +00:00
|
|
|
slashp = strchr(bpath, '/');
|
|
|
|
|
|
|
|
/* if no '/', just return the pool name */
|
|
|
|
if (slashp == NULL) {
|
|
|
|
return (0);
|
|
|
|
}
|
|
|
|
|
2008-12-03 20:09:06 +00:00
|
|
|
/* if not a number, just return the root dataset name */
|
|
|
|
if (str_to_uint64(slashp+1, &objnum)) {
|
|
|
|
return (0);
|
|
|
|
}
|
2008-11-20 20:01:55 +00:00
|
|
|
|
|
|
|
*slashp = '\0';
|
|
|
|
error = dsl_dsobj_to_dsname(bpath, objnum, outpath);
|
|
|
|
*slashp = '/';
|
|
|
|
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
|
|
|
zfs_mountroot(vfs_t *vfsp, enum whymountroot why)
|
|
|
|
{
|
|
|
|
int error = 0;
|
|
|
|
static int zfsrootdone = 0;
|
|
|
|
zfsvfs_t *zfsvfs = NULL;
|
|
|
|
znode_t *zp = NULL;
|
|
|
|
vnode_t *vp = NULL;
|
|
|
|
char *zfs_bootfs;
|
2008-12-03 20:09:06 +00:00
|
|
|
char *zfs_devid;
|
2008-11-20 20:01:55 +00:00
|
|
|
|
|
|
|
ASSERT(vfsp);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* The filesystem that we mount as root is defined in the
|
|
|
|
* boot property "zfs-bootfs" with a format of
|
|
|
|
* "poolname/root-dataset-objnum".
|
|
|
|
*/
|
|
|
|
if (why == ROOT_INIT) {
|
|
|
|
if (zfsrootdone++)
|
|
|
|
return (EBUSY);
|
|
|
|
/*
|
|
|
|
* the process of doing a spa_load will require the
|
|
|
|
* clock to be set before we could (for example) do
|
|
|
|
* something better by looking at the timestamp on
|
|
|
|
* an uberblock, so just set it to -1.
|
|
|
|
*/
|
|
|
|
clkset(-1);
|
|
|
|
|
2008-12-03 20:09:06 +00:00
|
|
|
if ((zfs_bootfs = spa_get_bootprop("zfs-bootfs")) == NULL) {
|
|
|
|
cmn_err(CE_NOTE, "spa_get_bootfs: can not get "
|
|
|
|
"bootfs name");
|
2008-11-20 20:01:55 +00:00
|
|
|
return (EINVAL);
|
|
|
|
}
|
2008-12-03 20:09:06 +00:00
|
|
|
zfs_devid = spa_get_bootprop("diskdevid");
|
|
|
|
error = spa_import_rootpool(rootfs.bo_name, zfs_devid);
|
|
|
|
if (zfs_devid)
|
|
|
|
spa_free_bootprop(zfs_devid);
|
|
|
|
if (error) {
|
|
|
|
spa_free_bootprop(zfs_bootfs);
|
|
|
|
cmn_err(CE_NOTE, "spa_import_rootpool: error %d",
|
2008-11-20 20:01:55 +00:00
|
|
|
error);
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
if (error = zfs_parse_bootfs(zfs_bootfs, rootfs.bo_name)) {
|
2008-12-03 20:09:06 +00:00
|
|
|
spa_free_bootprop(zfs_bootfs);
|
|
|
|
cmn_err(CE_NOTE, "zfs_parse_bootfs: error %d",
|
2008-11-20 20:01:55 +00:00
|
|
|
error);
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
2008-12-03 20:09:06 +00:00
|
|
|
spa_free_bootprop(zfs_bootfs);
|
2008-11-20 20:01:55 +00:00
|
|
|
|
|
|
|
if (error = vfs_lock(vfsp))
|
|
|
|
return (error);
|
|
|
|
|
2008-12-03 20:09:06 +00:00
|
|
|
if (error = zfs_domount(vfsp, rootfs.bo_name)) {
|
|
|
|
cmn_err(CE_NOTE, "zfs_domount: error %d", error);
|
2008-11-20 20:01:55 +00:00
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
|
|
|
zfsvfs = (zfsvfs_t *)vfsp->vfs_data;
|
|
|
|
ASSERT(zfsvfs);
|
|
|
|
if (error = zfs_zget(zfsvfs, zfsvfs->z_root, &zp)) {
|
2008-12-03 20:09:06 +00:00
|
|
|
cmn_err(CE_NOTE, "zfs_zget: error %d", error);
|
2008-11-20 20:01:55 +00:00
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
|
|
|
vp = ZTOV(zp);
|
|
|
|
mutex_enter(&vp->v_lock);
|
|
|
|
vp->v_flag |= VROOT;
|
|
|
|
mutex_exit(&vp->v_lock);
|
|
|
|
rootvp = vp;
|
|
|
|
|
|
|
|
/*
|
2008-12-03 20:09:06 +00:00
|
|
|
* Leave rootvp held. The root file system is never unmounted.
|
2008-11-20 20:01:55 +00:00
|
|
|
*/
|
|
|
|
|
|
|
|
vfs_add((struct vnode *)0, vfsp,
|
|
|
|
(vfsp->vfs_flag & VFS_RDONLY) ? MS_RDONLY : 0);
|
|
|
|
out:
|
|
|
|
vfs_unlock(vfsp);
|
|
|
|
return (error);
|
|
|
|
} else if (why == ROOT_REMOUNT) {
|
|
|
|
readonly_changed_cb(vfsp->vfs_data, B_FALSE);
|
|
|
|
vfsp->vfs_flag |= VFS_REMOUNT;
|
|
|
|
|
|
|
|
/* refresh mount options */
|
|
|
|
zfs_unregister_callbacks(vfsp->vfs_data);
|
|
|
|
return (zfs_register_callbacks(vfsp));
|
|
|
|
|
|
|
|
} else if (why == ROOT_UNMOUNT) {
|
|
|
|
zfs_unregister_callbacks((zfsvfs_t *)vfsp->vfs_data);
|
|
|
|
(void) zfs_sync(vfsp, 0, 0);
|
|
|
|
return (0);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* If "why" is anything other than ROOT_INIT,
|
|
|
|
* ROOT_REMOUNT, or ROOT_UNMOUNT, we do not support it.
|
|
|
|
*/
|
|
|
|
return (ENOTSUP);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*ARGSUSED*/
|
|
|
|
static int
|
|
|
|
zfs_mount(vfs_t *vfsp, vnode_t *mvp, struct mounta *uap, cred_t *cr)
|
|
|
|
{
|
|
|
|
char *osname;
|
|
|
|
pathname_t spn;
|
|
|
|
int error = 0;
|
|
|
|
uio_seg_t fromspace = (uap->flags & MS_SYSSPACE) ?
|
|
|
|
UIO_SYSSPACE : UIO_USERSPACE;
|
|
|
|
int canwrite;
|
|
|
|
|
|
|
|
if (mvp->v_type != VDIR)
|
|
|
|
return (ENOTDIR);
|
|
|
|
|
|
|
|
mutex_enter(&mvp->v_lock);
|
|
|
|
if ((uap->flags & MS_REMOUNT) == 0 &&
|
|
|
|
(uap->flags & MS_OVERLAY) == 0 &&
|
|
|
|
(mvp->v_count != 1 || (mvp->v_flag & VROOT))) {
|
|
|
|
mutex_exit(&mvp->v_lock);
|
|
|
|
return (EBUSY);
|
|
|
|
}
|
|
|
|
mutex_exit(&mvp->v_lock);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* ZFS does not support passing unparsed data in via MS_DATA.
|
|
|
|
* Users should use the MS_OPTIONSTR interface; this means
|
|
|
|
* that all option parsing is already done and the options struct
|
|
|
|
* can be interrogated.
|
|
|
|
*/
|
|
|
|
if ((uap->flags & MS_DATA) && uap->datalen > 0)
|
|
|
|
return (EINVAL);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Get the objset name (the "special" mount argument).
|
|
|
|
*/
|
|
|
|
if (error = pn_get(uap->spec, fromspace, &spn))
|
|
|
|
return (error);
|
|
|
|
|
|
|
|
osname = spn.pn_path;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Check for mount privilege?
|
|
|
|
*
|
|
|
|
* If we don't have privilege then see if
|
|
|
|
* we have local permission to allow it
|
|
|
|
*/
|
|
|
|
error = secpolicy_fs_mount(cr, mvp, vfsp);
|
|
|
|
if (error) {
|
|
|
|
error = dsl_deleg_access(osname, ZFS_DELEG_PERM_MOUNT, cr);
|
|
|
|
if (error == 0) {
|
|
|
|
vattr_t vattr;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Make sure user is the owner of the mount point
|
|
|
|
* or has sufficient privileges.
|
|
|
|
*/
|
|
|
|
|
|
|
|
vattr.va_mask = AT_UID;
|
|
|
|
|
|
|
|
if (error = VOP_GETATTR(mvp, &vattr, 0, cr, NULL)) {
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (secpolicy_vnode_owner(cr, vattr.va_uid) != 0 &&
|
|
|
|
VOP_ACCESS(mvp, VWRITE, 0, cr, NULL) != 0) {
|
|
|
|
error = EPERM;
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
|
|
|
secpolicy_fs_mount_clearopts(cr, vfsp);
|
|
|
|
} else {
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Refuse to mount a filesystem if we are in a local zone and the
|
|
|
|
* dataset is not visible.
|
|
|
|
*/
|
|
|
|
if (!INGLOBALZONE(curproc) &&
|
|
|
|
(!zone_dataset_visible(osname, &canwrite) || !canwrite)) {
|
|
|
|
error = EPERM;
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* When doing a remount, we simply refresh our temporary properties
|
|
|
|
* according to those options set in the current VFS options.
|
|
|
|
*/
|
|
|
|
if (uap->flags & MS_REMOUNT) {
|
|
|
|
/* refresh mount options */
|
|
|
|
zfs_unregister_callbacks(vfsp->vfs_data);
|
|
|
|
error = zfs_register_callbacks(vfsp);
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
2008-12-03 20:09:06 +00:00
|
|
|
error = zfs_domount(vfsp, osname);
|
2008-11-20 20:01:55 +00:00
|
|
|
|
2009-07-02 22:44:48 +00:00
|
|
|
/*
|
|
|
|
* Add an extra VFS_HOLD on our parent vfs so that it can't
|
|
|
|
* disappear due to a forced unmount.
|
|
|
|
*/
|
|
|
|
if (error == 0 && ((zfsvfs_t *)vfsp->vfs_data)->z_issnap)
|
|
|
|
VFS_HOLD(mvp->v_vfsp);
|
|
|
|
|
2008-11-20 20:01:55 +00:00
|
|
|
out:
|
|
|
|
pn_free(&spn);
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
|
|
|
zfs_statvfs(vfs_t *vfsp, struct statvfs64 *statp)
|
|
|
|
{
|
|
|
|
zfsvfs_t *zfsvfs = vfsp->vfs_data;
|
|
|
|
dev32_t d32;
|
|
|
|
uint64_t refdbytes, availbytes, usedobjs, availobjs;
|
|
|
|
|
|
|
|
ZFS_ENTER(zfsvfs);
|
|
|
|
|
|
|
|
dmu_objset_space(zfsvfs->z_os,
|
|
|
|
&refdbytes, &availbytes, &usedobjs, &availobjs);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* The underlying storage pool actually uses multiple block sizes.
|
|
|
|
* We report the fragsize as the smallest block size we support,
|
|
|
|
* and we report our blocksize as the filesystem's maximum blocksize.
|
|
|
|
*/
|
|
|
|
statp->f_frsize = 1UL << SPA_MINBLOCKSHIFT;
|
|
|
|
statp->f_bsize = zfsvfs->z_max_blksz;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* The following report "total" blocks of various kinds in the
|
|
|
|
* file system, but reported in terms of f_frsize - the
|
|
|
|
* "fragment" size.
|
|
|
|
*/
|
|
|
|
|
|
|
|
statp->f_blocks = (refdbytes + availbytes) >> SPA_MINBLOCKSHIFT;
|
|
|
|
statp->f_bfree = availbytes >> SPA_MINBLOCKSHIFT;
|
|
|
|
statp->f_bavail = statp->f_bfree; /* no root reservation */
|
|
|
|
|
|
|
|
/*
|
|
|
|
* statvfs() should really be called statufs(), because it assumes
|
|
|
|
* static metadata. ZFS doesn't preallocate files, so the best
|
|
|
|
* we can do is report the max that could possibly fit in f_files,
|
|
|
|
* and that minus the number actually used in f_ffree.
|
|
|
|
* For f_ffree, report the smaller of the number of objects available
|
|
|
|
* and the number of blocks (each object will take at least a block).
|
|
|
|
*/
|
|
|
|
statp->f_ffree = MIN(availobjs, statp->f_bfree);
|
|
|
|
statp->f_favail = statp->f_ffree; /* no "root reservation" */
|
|
|
|
statp->f_files = statp->f_ffree + usedobjs;
|
|
|
|
|
|
|
|
(void) cmpldev(&d32, vfsp->vfs_dev);
|
|
|
|
statp->f_fsid = d32;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* We're a zfs filesystem.
|
|
|
|
*/
|
|
|
|
(void) strcpy(statp->f_basetype, vfssw[vfsp->vfs_fstype].vsw_name);
|
|
|
|
|
|
|
|
statp->f_flag = vf_to_stf(vfsp->vfs_flag);
|
|
|
|
|
|
|
|
statp->f_namemax = ZFS_MAXNAMELEN;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* We have all of 32 characters to stuff a string here.
|
|
|
|
* Is there anything useful we could/should provide?
|
|
|
|
*/
|
|
|
|
bzero(statp->f_fstr, sizeof (statp->f_fstr));
|
|
|
|
|
|
|
|
ZFS_EXIT(zfsvfs);
|
|
|
|
return (0);
|
|
|
|
}
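/*
 * Illustrative sketch, not part of the original source: the block and
 * file-count arithmetic from zfs_statvfs() above, isolated so it can be
 * checked with plain numbers. statvfs_counts() and struct fs_counts are
 * hypothetical names, and MINBLOCKSHIFT is assumed to match
 * SPA_MINBLOCKSHIFT (512-byte fragments).
 */

```c
#include <assert.h>
#include <stdint.h>

#define	MINBLOCKSHIFT	9	/* assumption: 512-byte fragment size */

struct fs_counts {
	uint64_t blocks;	/* f_blocks: total, in fragments */
	uint64_t bfree;		/* f_bfree: available, in fragments */
	uint64_t ffree;		/* f_ffree: objects that could still be created */
	uint64_t files;		/* f_files: ffree plus objects in use */
};

static struct fs_counts
statvfs_counts(uint64_t refdbytes, uint64_t availbytes,
    uint64_t usedobjs, uint64_t availobjs)
{
	struct fs_counts c;

	/* Guard the sum against wraparound in this sketch. */
	assert(refdbytes + availbytes >= refdbytes);

	c.blocks = (refdbytes + availbytes) >> MINBLOCKSHIFT;
	c.bfree = availbytes >> MINBLOCKSHIFT;
	/* Each object needs at least one block, so take the smaller bound. */
	c.ffree = (availobjs < c.bfree) ? availobjs : c.bfree;
	c.files = c.ffree + usedobjs;
	return (c);
}
```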
|
|
|
|
|
|
|
|
static int
|
|
|
|
zfs_root(vfs_t *vfsp, vnode_t **vpp)
|
|
|
|
{
|
|
|
|
zfsvfs_t *zfsvfs = vfsp->vfs_data;
|
|
|
|
znode_t *rootzp;
|
|
|
|
int error;
|
|
|
|
|
|
|
|
ZFS_ENTER(zfsvfs);
|
|
|
|
|
|
|
|
error = zfs_zget(zfsvfs, zfsvfs->z_root, &rootzp);
|
|
|
|
if (error == 0)
|
|
|
|
*vpp = ZTOV(rootzp);
|
|
|
|
|
|
|
|
ZFS_EXIT(zfsvfs);
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Tear down the zfsvfs::z_os.
|
|
|
|
*
|
|
|
|
* Note, if 'unmounting' is FALSE, we return with the 'z_teardown_lock'
|
|
|
|
* and 'z_teardown_inactive_lock' held.
|
|
|
|
*/
|
|
|
|
static int
|
|
|
|
zfsvfs_teardown(zfsvfs_t *zfsvfs, boolean_t unmounting)
|
|
|
|
{
|
|
|
|
znode_t *zp;
|
|
|
|
|
|
|
|
rrw_enter(&zfsvfs->z_teardown_lock, RW_WRITER, FTAG);
|
|
|
|
|
|
|
|
if (!unmounting) {
|
|
|
|
/*
|
|
|
|
* We purge the parent filesystem's vfsp as the parent
|
|
|
|
* filesystem and all of its snapshots have their vnode's
|
|
|
|
* v_vfsp set to the parent's filesystem's vfsp. Note,
|
|
|
|
* 'z_parent' is self referential for non-snapshots.
|
|
|
|
*/
|
|
|
|
(void) dnlc_purge_vfsp(zfsvfs->z_parent->z_vfs, 0);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Close the zil. NB: Can't close the zil while zfs_inactive
|
|
|
|
* threads are blocked as zil_close can call zfs_inactive.
|
|
|
|
*/
|
|
|
|
if (zfsvfs->z_log) {
|
|
|
|
zil_close(zfsvfs->z_log);
|
|
|
|
zfsvfs->z_log = NULL;
|
|
|
|
}
|
|
|
|
|
|
|
|
rw_enter(&zfsvfs->z_teardown_inactive_lock, RW_WRITER);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* If we are not unmounting (i.e., online recv) and someone already
|
|
|
|
* unmounted this file system while we were doing the switcheroo,
|
|
|
|
* or a reopen of z_os failed then just bail out now.
|
|
|
|
*/
|
|
|
|
if (!unmounting && (zfsvfs->z_unmounted || zfsvfs->z_os == NULL)) {
|
|
|
|
rw_exit(&zfsvfs->z_teardown_inactive_lock);
|
|
|
|
rrw_exit(&zfsvfs->z_teardown_lock, FTAG);
|
|
|
|
return (EIO);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* At this point there are no vops active, and any new vops will
|
|
|
|
* fail with EIO since we have z_teardown_lock for writer (only
|
|
|
|
* relevant for forced unmount).
|
|
|
|
*
|
|
|
|
* Release all holds on dbufs.
|
|
|
|
*/
|
|
|
|
mutex_enter(&zfsvfs->z_znodes_lock);
|
|
|
|
for (zp = list_head(&zfsvfs->z_all_znodes); zp != NULL;
|
|
|
|
zp = list_next(&zfsvfs->z_all_znodes, zp))
|
|
|
|
if (zp->z_dbuf) {
|
|
|
|
ASSERT(ZTOV(zp)->v_count > 0);
|
|
|
|
zfs_znode_dmu_fini(zp);
|
|
|
|
}
|
|
|
|
mutex_exit(&zfsvfs->z_znodes_lock);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* If we are unmounting, set the unmounted flag and let new vops
|
|
|
|
* unblock. zfs_inactive will have the unmounted behavior, and all
|
|
|
|
* other vops will fail with EIO.
|
|
|
|
*/
|
|
|
|
if (unmounting) {
|
|
|
|
zfsvfs->z_unmounted = B_TRUE;
|
|
|
|
rrw_exit(&zfsvfs->z_teardown_lock, FTAG);
|
|
|
|
rw_exit(&zfsvfs->z_teardown_inactive_lock);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* z_os will be NULL if there was an error in attempting to reopen
|
|
|
|
* zfsvfs, so just return as the properties had already been
|
|
|
|
* unregistered and cached data had been evicted before.
|
|
|
|
*/
|
|
|
|
if (zfsvfs->z_os == NULL)
|
|
|
|
return (0);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Unregister properties.
|
|
|
|
*/
|
|
|
|
zfs_unregister_callbacks(zfsvfs);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Evict cached data
|
|
|
|
*/
|
|
|
|
if (dmu_objset_evict_dbufs(zfsvfs->z_os)) {
|
|
|
|
txg_wait_synced(dmu_objset_pool(zfsvfs->z_os), 0);
|
|
|
|
(void) dmu_objset_evict_dbufs(zfsvfs->z_os);
|
|
|
|
}
|
|
|
|
|
|
|
|
return (0);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*ARGSUSED*/
|
|
|
|
static int
|
|
|
|
zfs_umount(vfs_t *vfsp, int fflag, cred_t *cr)
|
|
|
|
{
|
|
|
|
zfsvfs_t *zfsvfs = vfsp->vfs_data;
|
|
|
|
objset_t *os;
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
ret = secpolicy_fs_unmount(cr, vfsp);
|
|
|
|
if (ret) {
|
|
|
|
ret = dsl_deleg_access((char *)refstr_value(vfsp->vfs_resource),
|
|
|
|
ZFS_DELEG_PERM_MOUNT, cr);
|
|
|
|
if (ret)
|
|
|
|
return (ret);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* We purge the parent filesystem's vfsp as the parent filesystem
|
|
|
|
* and all of its snapshots have their vnode's v_vfsp set to the
|
|
|
|
* parent's filesystem's vfsp. Note, 'z_parent' is self
|
|
|
|
* referential for non-snapshots.
|
|
|
|
*/
|
|
|
|
(void) dnlc_purge_vfsp(zfsvfs->z_parent->z_vfs, 0);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Unmount any snapshots mounted under .zfs before unmounting the
|
|
|
|
* dataset itself.
|
|
|
|
*/
|
|
|
|
if (zfsvfs->z_ctldir != NULL &&
|
|
|
|
(ret = zfsctl_umount_snapshots(vfsp, fflag, cr)) != 0) {
|
|
|
|
return (ret);
|
|
|
|
}
|
|
|
|
|
|
|
|
if (!(fflag & MS_FORCE)) {
|
|
|
|
/*
|
|
|
|
* Check the number of active vnodes in the file system.
|
|
|
|
* Our count is maintained in the vfs structure, but the
|
|
|
|
* number is off by 1 to indicate a hold on the vfs
|
|
|
|
* structure itself.
|
|
|
|
*
|
|
|
|
* The '.zfs' directory maintains a reference of its
|
|
|
|
* own, and any active references underneath are
|
|
|
|
* reflected in the vnode count.
|
|
|
|
*/
|
|
|
|
if (zfsvfs->z_ctldir == NULL) {
|
|
|
|
if (vfsp->vfs_count > 1)
|
|
|
|
return (EBUSY);
|
|
|
|
} else {
|
|
|
|
if (vfsp->vfs_count > 2 ||
|
|
|
|
zfsvfs->z_ctldir->v_count > 1)
|
|
|
|
return (EBUSY);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
vfsp->vfs_flag |= VFS_UNMOUNTED;
|
|
|
|
|
|
|
|
VERIFY(zfsvfs_teardown(zfsvfs, B_TRUE) == 0);
|
|
|
|
os = zfsvfs->z_os;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* z_os will be NULL if there was an error in
|
|
|
|
* attempting to reopen zfsvfs.
|
|
|
|
*/
|
|
|
|
if (os != NULL) {
|
|
|
|
/*
|
|
|
|
* Unset the objset user_ptr.
|
|
|
|
*/
|
|
|
|
mutex_enter(&os->os->os_user_ptr_lock);
|
|
|
|
dmu_objset_set_user(os, NULL);
|
|
|
|
mutex_exit(&os->os->os_user_ptr_lock);
|
|
|
|
|
|
|
|
/*
|
2008-12-03 20:09:06 +00:00
|
|
|
* Finally release the objset
|
2008-11-20 20:01:55 +00:00
|
|
|
*/
|
|
|
|
dmu_objset_close(os);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* We can now safely destroy the '.zfs' directory node.
|
|
|
|
*/
|
|
|
|
if (zfsvfs->z_ctldir != NULL)
|
|
|
|
zfsctl_destroy(zfsvfs);
|
|
|
|
|
|
|
|
return (0);
|
|
|
|
}

static int
zfs_vget(vfs_t *vfsp, vnode_t **vpp, fid_t *fidp)
{
	zfsvfs_t *zfsvfs = vfsp->vfs_data;
	znode_t *zp;
	uint64_t object = 0;
	uint64_t fid_gen = 0;
	uint64_t gen_mask;
	uint64_t zp_gen;
	int i, err;

	*vpp = NULL;

	ZFS_ENTER(zfsvfs);

	if (fidp->fid_len == LONG_FID_LEN) {
		zfid_long_t *zlfid = (zfid_long_t *)fidp;
		uint64_t objsetid = 0;
		uint64_t setgen = 0;

		/* Assemble the little-endian byte arrays into integers. */
		for (i = 0; i < sizeof (zlfid->zf_setid); i++)
			objsetid |= ((uint64_t)zlfid->zf_setid[i]) << (8 * i);

		for (i = 0; i < sizeof (zlfid->zf_setgen); i++)
			setgen |= ((uint64_t)zlfid->zf_setgen[i]) << (8 * i);

		ZFS_EXIT(zfsvfs);

		err = zfsctl_lookup_objset(vfsp, objsetid, &zfsvfs);
		if (err)
			return (EINVAL);
		ZFS_ENTER(zfsvfs);
	}

	if (fidp->fid_len == SHORT_FID_LEN || fidp->fid_len == LONG_FID_LEN) {
		zfid_short_t *zfid = (zfid_short_t *)fidp;

		for (i = 0; i < sizeof (zfid->zf_object); i++)
			object |= ((uint64_t)zfid->zf_object[i]) << (8 * i);

		for (i = 0; i < sizeof (zfid->zf_gen); i++)
			fid_gen |= ((uint64_t)zfid->zf_gen[i]) << (8 * i);
	} else {
		ZFS_EXIT(zfsvfs);
		return (EINVAL);
	}

	/* A zero fid_gen means we are in the .zfs control directories */
	if (fid_gen == 0 &&
	    (object == ZFSCTL_INO_ROOT || object == ZFSCTL_INO_SNAPDIR)) {
		*vpp = zfsvfs->z_ctldir;
		ASSERT(*vpp != NULL);
		if (object == ZFSCTL_INO_SNAPDIR) {
			VERIFY(zfsctl_root_lookup(*vpp, "snapshot", vpp, NULL,
			    0, NULL, NULL, NULL, NULL, NULL) == 0);
		} else {
			VN_HOLD(*vpp);
		}
		ZFS_EXIT(zfsvfs);
		return (0);
	}

	/* i still holds sizeof (zfid->zf_gen) from the loop above. */
	gen_mask = -1ULL >> (64 - 8 * i);

	dprintf("getting %llu [%llu mask %llx]\n", (u_longlong_t)object,
	    (u_longlong_t)fid_gen, (u_longlong_t)gen_mask);
	if ((err = zfs_zget(zfsvfs, object, &zp)) != 0) {
		ZFS_EXIT(zfsvfs);
		return (err);
	}
	zp_gen = zp->z_phys->zp_gen & gen_mask;
	if (zp_gen == 0)
		zp_gen = 1;
	if (zp->z_unlinked || zp_gen != fid_gen) {
		dprintf("znode gen (%llu) != fid gen (%llu)\n",
		    (u_longlong_t)zp_gen, (u_longlong_t)fid_gen);
		VN_RELE(ZTOV(zp));
		ZFS_EXIT(zfsvfs);
		return (EINVAL);
	}

	*vpp = ZTOV(zp);
	ZFS_EXIT(zfsvfs);
	return (0);
}
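
The byte-wise fid decoding used by zfs_vget() can be looked at in isolation. What follows is a minimal user-space sketch, not the kernel code: the helper names `le_bytes_to_u64` and `low_bytes_mask` are hypothetical and exist only for illustration. They mirror the `|= ... << (8 * i)` loops and the `gen_mask` computation, assuming the fid fields are stored least-significant byte first, as those loops imply.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: assemble `len` little-endian bytes into a
 * 64-bit integer, the same way the fid loops build object and fid_gen.
 */
static uint64_t
le_bytes_to_u64(const uint8_t *bytes, size_t len)
{
	uint64_t value = 0;
	size_t i;

	assert(len <= sizeof (value));
	for (i = 0; i < len; i++)
		value |= ((uint64_t)bytes[i]) << (8 * i);
	return (value);
}

/*
 * Hypothetical helper: mask covering the low `len` bytes of a 64-bit
 * value. len must be 1..8; a shift by 64 would be undefined in C.
 */
static uint64_t
low_bytes_mask(size_t len)
{
	assert(len >= 1 && len <= 8);
	return (UINT64_MAX >> (64 - 8 * len));
}
```

Assuming a 4-byte generation field, `low_bytes_mask(4)` covers only the low 32 bits, which is why the znode generation is masked before it is compared with the generation taken from the fid.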

/*
 * Block out VOPs and close zfsvfs_t::z_os
 *
 * Note, if successful, then we return with the 'z_teardown_lock' and
 * 'z_teardown_inactive_lock' write held.
 */
int
zfs_suspend_fs(zfsvfs_t *zfsvfs, char *name, int *modep)
{
	int error;

	if ((error = zfsvfs_teardown(zfsvfs, B_FALSE)) != 0)
		return (error);

	*modep = zfsvfs->z_os->os_mode;
	if (name)
		dmu_objset_name(zfsvfs->z_os, name);
	dmu_objset_close(zfsvfs->z_os);

	return (0);
}

/*
 * Reopen zfsvfs_t::z_os and release VOPs.
 */
int
zfs_resume_fs(zfsvfs_t *zfsvfs, const char *osname, int mode)
{
	int err;

	ASSERT(RRW_WRITE_HELD(&zfsvfs->z_teardown_lock));
	ASSERT(RW_WRITE_HELD(&zfsvfs->z_teardown_inactive_lock));

	err = dmu_objset_open(osname, DMU_OST_ZFS, mode, &zfsvfs->z_os);
	if (err) {
		zfsvfs->z_os = NULL;
	} else {
		znode_t *zp;

		VERIFY(zfsvfs_setup(zfsvfs, B_FALSE) == 0);

		/*
		 * Attempt to re-establish all the active znodes with
		 * their dbufs.  If a zfs_rezget() fails, then we'll let
		 * any potential callers discover that via ZFS_ENTER_VERIFY_VP
		 * when they try to use their znode.
		 */
		mutex_enter(&zfsvfs->z_znodes_lock);
		for (zp = list_head(&zfsvfs->z_all_znodes); zp;
		    zp = list_next(&zfsvfs->z_all_znodes, zp)) {
			(void) zfs_rezget(zp);
		}
		mutex_exit(&zfsvfs->z_znodes_lock);

	}

	/* release the VOPs */
	rw_exit(&zfsvfs->z_teardown_inactive_lock);
	rrw_exit(&zfsvfs->z_teardown_lock, FTAG);

	if (err) {
		/*
		 * Since we couldn't reopen zfsvfs::z_os, force
		 * unmount this file system.
		 */
		if (vn_vfswlock(zfsvfs->z_vfs->vfs_vnodecovered) == 0)
			(void) dounmount(zfsvfs->z_vfs, MS_FORCE, CRED());
	}
	return (err);
}

static void
zfs_freevfs(vfs_t *vfsp)
{
	zfsvfs_t *zfsvfs = vfsp->vfs_data;

	/*
	 * If this is a snapshot, we have an extra VFS_HOLD on our parent
	 * from zfs_mount().  Release it here.
	 */
	if (zfsvfs->z_issnap)
		VFS_RELE(zfsvfs->z_parent->z_vfs);

	zfsvfs_free(zfsvfs);

	atomic_add_32(&zfs_active_fs_count, -1);
}

/*
 * VFS_INIT() initialization.  Note that there is no VFS_FINI(),
 * so we can't safely do any non-idempotent initialization here.
 * Leave that to zfs_init() and zfs_fini(), which are called
 * from the module's _init() and _fini() entry points.
 */
/*ARGSUSED*/
static int
zfs_vfsinit(int fstype, char *name)
{
	int error;

	zfsfstype = fstype;

	/*
	 * Setup vfsops and vnodeops tables.
	 */
	error = vfs_setfsops(fstype, zfs_vfsops_template, &zfs_vfsops);
	if (error != 0) {
		cmn_err(CE_WARN, "zfs: bad vfs ops template");
	}

	error = zfs_create_op_tables();
	if (error) {
		zfs_remove_op_tables();
		cmn_err(CE_WARN, "zfs: bad vnode ops template");
		(void) vfs_freevfsops_by_type(zfsfstype);
		return (error);
	}

	mutex_init(&zfs_dev_mtx, NULL, MUTEX_DEFAULT, NULL);

	/*
	 * Unique major number for all zfs mounts.
	 * If we run out of 32-bit minors, we'll getudev() another major.
	 */
	zfs_major = ddi_name_to_major(ZFS_DRIVER);
	zfs_minor = ZFS_MIN_MINOR;

	return (0);
}
Linux ZVOL implementation; kernel-side changes
At last a useful user space interface for the Linux ZFS port arrives.
With the addition of the ZVOL, real ZFS based block devices are available
and can be compared head to head with Linux's MD and LVM block drivers.
The Linux ZVOL has not yet had any performance work done, but from a user
perspective it should be functionally complete and behave like any other
Linux block device.
The ZVOL has so far been tested using zconfig.sh on the following x86_64
based platforms: FC11, CHAOS4, RHEL5, RHEL6, and SLES11. However, more
testing is required to ensure everything is working as designed.
What follows is a somewhat detailed list of changes included in this
commit to make ZVOLs possible. A few other issues were addressed in
the context of these changes which will also be mentioned.
* Added module/zfs/zvol.c which is based off the original Solaris ZVOL
implementation but rewritten to integrate with the Linux block device
APIs. The basic design remains similar in Linux, with the major
change being request processing. Request processing is handled by
registering a request function which the elevator calls once all request
merges are finished and the elevator unplugs. This function is called
under a spin lock and the request structure is passed to the block driver
to be queued for IO. The elevator must be notified asynchronously once
the request completes or fails with an error. This gives the block
driver a chance to handle many requests concurrently. For the ZVOL we
maintain a taskq with a service thread per core. As requests are delivered
by the elevator each request is dispatched to the taskq. The task queue
handles each request with a write or read helper function which basically
copies the request data into or out of the DMU object. Writes signal
completion as soon as the DMU has the data unless they are marked sync.
Reads are all handled synchronously; however, the elevator will merge many
small reads into a large read before submitting the request.
* Caching is worth specifically mentioning. Because the Linux VFS
and the ZFS ARC both want to fully manage the cache we unfortunately
end up with two caches. This means our memory footprint is larger
than otherwise expected, and it means we have an extra copy between
the caches, but it does not impact correctness. All syncs are barrier
requests, which I believe are handled correctly. Longer term there is lots
of room for improvement here but it will require fairly extensive changes
to either the Linux VFS and VM layer, or additional DMU interfaces to
handle managing buffers not directly allocated by the ARC.
* Added module/zfs/include/sys/blkdev.h which contains all the Linux
compatibility code which is required to handle changes in the Linux block
APIs from 2.6.18 through 2.6.31 based kernels.
* The dmu_{read,write}_uio interfaces which don't make sense on Linux
have been modified to dmu_{read,write}_req functions which consume the
standard Linux IO request structure. Their function fundamentally
remains the same so this happily worked out pretty cleanly.
* The /dev/zfs character device is no longer created through the half
implemented Solaris driver DDI interfaces. It is now simply created
with its own major number as a Linux misc device which greatly simplifies
everything. It is only capable of handling ioctls() but this fits nicely
because that's all it ever has to do. The ZVOL devices, unlike in Solaris,
do not leverage the same major number as /dev/zfs but instead register
their own major. Because only one major is allocated and space is reserved
for 16 partitions per device, there is a limit of 16384 concurrent ZVOL
devices. By using multiple majors like the scsi driver this limit could
be addressed if it becomes a problem.
* The {spa,zfs,zvol}_busy() functions have all been removed because they
are not required on a Linux system. Under Linux the registered module
exit function will not be called while there are still references to the
module. Once the exit function is called, however, it must succeed or
block; it may not fail, so returning an error on module unload makes no
sense under Linux.
* With the addition of ZVOL support all the HAVE_ZVOL defines were removed
for obvious reasons. However, the HAVE_ZPL defines have been relocated
into the linux-{kernel,user}-disk topic branches and must remain until
the ZPL is implemented.
2009-11-20 19:06:59 +00:00
#endif /* HAVE_ZPL */

void
zfs_init(void)
{
#ifdef HAVE_ZPL
	/*
	 * Initialize .zfs directory structures
	 */
	zfsctl_init();

	/*
	 * Initialize znode cache, vnode ops, etc...
	 */
	zfs_znode_init();

	dmu_objset_register_type(DMU_OST_ZFS, zfs_space_delta_cb);
#endif /* HAVE_ZPL */
}

void
zfs_fini(void)
{
#ifdef HAVE_ZPL
	zfsctl_fini();
	zfs_znode_fini();
#endif /* HAVE_ZPL */
}

|
|
|
#ifdef HAVE_ZPL
|
2008-11-20 20:01:55 +00:00
|
|
|
int
|
2009-07-02 22:44:48 +00:00
|
|
|
zfs_set_version(zfsvfs_t *zfsvfs, uint64_t newvers)
|
2008-11-20 20:01:55 +00:00
|
|
|
{
|
|
|
|
int error;
|
2009-07-02 22:44:48 +00:00
|
|
|
objset_t *os = zfsvfs->z_os;
|
2008-11-20 20:01:55 +00:00
|
|
|
dmu_tx_t *tx;
|
|
|
|
|
|
|
|
if (newvers < ZPL_VERSION_INITIAL || newvers > ZPL_VERSION)
|
|
|
|
return (EINVAL);
|
|
|
|
|
2009-07-02 22:44:48 +00:00
|
|
|
if (newvers < zfsvfs->z_version)
|
|
|
|
return (EINVAL);
|
2008-11-20 20:01:55 +00:00
|
|
|
|
|
|
|
tx = dmu_tx_create(os);
|
2009-07-02 22:44:48 +00:00
|
|
|
dmu_tx_hold_zap(tx, MASTER_NODE_OBJ, B_FALSE, ZPL_VERSION_STR);
|
2008-11-20 20:01:55 +00:00
|
|
|
error = dmu_tx_assign(tx, TXG_WAIT);
|
|
|
|
if (error) {
|
|
|
|
dmu_tx_abort(tx);
|
2009-07-02 22:44:48 +00:00
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
error = zap_update(os, MASTER_NODE_OBJ, ZPL_VERSION_STR,
|
|
|
|
8, 1, &newvers, tx);
|
|
|
|
|
|
|
|
if (error) {
|
|
|
|
dmu_tx_commit(tx);
|
|
|
|
return (error);
|
2008-11-20 20:01:55 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
spa_history_internal_log(LOG_DS_UPGRADE,
|
|
|
|
dmu_objset_spa(os), tx, CRED(),
|
2009-07-02 22:44:48 +00:00
|
|
|
"oldver=%llu newver=%llu dataset = %llu",
|
|
|
|
zfsvfs->z_version, newvers, dmu_objset_id(os));
|
|
|
|
|
2008-11-20 20:01:55 +00:00
|
|
|
dmu_tx_commit(tx);
|
|
|
|
|
2009-07-02 22:44:48 +00:00
|
|
|
zfsvfs->z_version = newvers;
|
|
|
|
|
|
|
|
if (zfsvfs->z_version >= ZPL_VERSION_FUID)
|
|
|
|
zfs_set_fuid_feature(zfsvfs);
|
|
|
|
|
|
|
|
return (0);
|
2008-11-20 20:01:55 +00:00
|
|
|
}
|
Linux ZVOL implementation; kernel-side changes
At last a useful user space interface for the Linux ZFS port arrives.
With the addition of the ZVOL, real ZFS-based block devices are available
and can be compared head to head with Linux's MD and LVM block drivers.
The Linux ZVOL has not yet had any performance work done, but from a user
perspective it should be functionally complete and behave like any other
Linux block device.
The ZVOL has so far been tested using zconfig.sh on the following x86_64
based platforms: FC11, CHAOS4, RHEL5, RHEL6, and SLES11. However, more
testing is required to ensure everything is working as designed.
What follows is a somewhat detailed list of the changes included in this
commit to make ZVOLs possible. A few other issues were addressed in
the context of these changes and are also mentioned.
* Added module/zfs/zvol.c which is based on the original Solaris ZVOL
implementation but rewritten to integrate with the Linux block device
APIs. The basic design remains similar in Linux, with the major
change being request processing. Request processing is handled by
registering a request function which the elevator calls once all request
merges are finished and the elevator unplugs. This function is called
under a spin lock and the request structure is passed to the block driver
to be queued for IO. The elevator must be notified asynchronously once
the request completes or fails with an error. This gives the block
driver a chance to handle many requests concurrently. For the ZVOL we
maintain a taskq with a service thread per core. As requests are delivered
by the elevator each request is dispatched to the taskq. The task queue
handles each request with a write or read helper function which basically
copies the request data into or out of the DMU object. Writes signal
completion as soon as the DMU has the data unless they are marked sync.
Reads are all handled synchronously, however the elevator will merge many
small reads into a large read before submitting the request.
* Caching is worth specifically mentioning. Because the Linux VFS and
the ZFS ARC both want to fully manage the cache, we unfortunately
end up with two caches. This means our memory footprint is larger
than otherwise expected, and it means we have an extra copy between
the caches, but it does not impact correctness. All syncs are barrier
requests, which I believe are handled correctly. Longer term there is
lots of room for improvement here, but it will require fairly extensive
changes to either the Linux VFS and VM layer, or additional DMU
interfaces to handle managing buffers not directly allocated by the ARC.
* Added module/zfs/include/sys/blkdev.h which contains all the Linux
compatibility code required to handle changes in the Linux block
APIs from 2.6.18 through 2.6.31 based kernels.
* The dmu_{read,write}_uio interfaces, which don't make sense on Linux,
have been modified to dmu_{read,write}_req functions which consume the
standard Linux IO request structure. Their function fundamentally
remains the same, so this happily worked out pretty cleanly.
* The /dev/zfs character device is no longer created through the half
implemented Solaris driver DDI interfaces. It is now simply created
with its own major number as a Linux misc device, which greatly simplifies
everything. It is only capable of handling ioctls(), but this fits nicely
because that's all it ever has to do. The ZVOL devices, unlike in Solaris,
do not share the same major number as /dev/zfs but instead register
their own major. Because only one major is allocated and space is reserved
for 16 partitions per device, there is a limit of 16384 concurrent ZVOL
devices. By using multiple majors like the scsi driver this limit could
be addressed if it becomes a problem.
* The {spa,zfs,zvol}_busy() functions have all been removed because they
are not required on a Linux system. Under Linux the registered module
exit function will not be called while there are still references to the
module. Once the exit function is called, however, it must succeed or
block. It may not fail, so returning an error on module unload makes no
sense under Linux.
* With the addition of ZVOL support all the HAVE_ZVOL defines were removed
for obvious reasons. However, the HAVE_ZPL defines have been relocated
into the linux-{kernel,user}-disk topic branches and must remain until
the ZPL is implemented.
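The request handling described in the first bullet can be illustrated with a small user-space sketch. It is not the code in module/zfs/zvol.c: every name below (sketch_volume, sketch_request, sketch_handle_request) is hypothetical, a plain byte array stands in for the DMU object, and the handler is called directly instead of being dispatched to a taskq thread. It only shows the shape of the work: pick the read or write helper, copy the data, then record completion for the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define	SKETCH_VOL_SIZE	4096	/* hypothetical volume size in bytes */

enum sketch_dir { SKETCH_READ, SKETCH_WRITE };

/* Hypothetical stand-in for a block layer request. */
struct sketch_request {
	enum sketch_dir dir;	/* direction of the transfer */
	size_t offset;		/* byte offset into the volume */
	size_t len;		/* number of bytes to transfer */
	unsigned char *buf;	/* caller's data buffer */
	int sync;		/* nonzero: write must be stable first */
	int error;		/* completion status, 0 on success */
	int done;		/* set once completion is signalled */
};

/* Hypothetical stand-in for the DMU object backing the volume. */
struct sketch_volume {
	unsigned char data[SKETCH_VOL_SIZE];
	unsigned long syncs;	/* how many sync writes were flushed */
};

/* Read helper: copy data out of the backing object. */
static int
sketch_read(struct sketch_volume *vol, struct sketch_request *req)
{
	memcpy(req->buf, vol->data + req->offset, req->len);
	return (0);
}

/* Write helper: copy data in; only sync writes wait for a flush. */
static int
sketch_write(struct sketch_volume *vol, struct sketch_request *req)
{
	memcpy(vol->data + req->offset, req->buf, req->len);
	if (req->sync)
		vol->syncs++;	/* placeholder for a real flush */
	return (0);
}

/*
 * Per-request work item. In a real driver this would run in a worker
 * thread; here it is called directly to keep the sketch deterministic.
 */
void
sketch_handle_request(struct sketch_volume *vol, struct sketch_request *req)
{
	assert(vol != NULL && req != NULL && req->buf != NULL);

	if (req->offset > SKETCH_VOL_SIZE ||
	    req->len > SKETCH_VOL_SIZE - req->offset)
		req->error = EIO;	/* request falls outside the volume */
	else if (req->dir == SKETCH_WRITE)
		req->error = sketch_write(vol, req);
	else
		req->error = sketch_read(vol, req);

	req->done = 1;	/* stands in for notifying the elevator */
}
```

A caller fills in a sketch_request, passes it to sketch_handle_request(), and then inspects the done and error fields, much as the elevator is told about completion or failure after the fact.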
2009-11-20 19:06:59 +00:00
|
|
|
#endif /* HAVE_ZPL */
|
2008-11-20 20:01:55 +00:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Read a property stored within the master node.
|
|
|
|
*/
|
|
|
|
int
|
|
|
|
zfs_get_zplprop(objset_t *os, zfs_prop_t prop, uint64_t *value)
|
|
|
|
{
|
|
|
|
const char *pname;
|
2008-12-03 20:09:06 +00:00
|
|
|
int error = ENOENT;
|
2008-11-20 20:01:55 +00:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Look up the file system's value for the property. For the
|
|
|
|
* version property, we look up a slightly different string.
|
|
|
|
*/
|
|
|
|
if (prop == ZFS_PROP_VERSION)
|
|
|
|
pname = ZPL_VERSION_STR;
|
|
|
|
else
|
|
|
|
pname = zfs_prop_to_name(prop);
|
|
|
|
|
2008-12-03 20:09:06 +00:00
|
|
|
if (os != NULL)
|
|
|
|
error = zap_lookup(os, MASTER_NODE_OBJ, pname, 8, 1, value);
|
2008-11-20 20:01:55 +00:00
|
|
|
|
|
|
|
if (error == ENOENT) {
|
|
|
|
/* No value set, use the default value */
|
|
|
|
switch (prop) {
|
|
|
|
case ZFS_PROP_VERSION:
|
|
|
|
*value = ZPL_VERSION;
|
|
|
|
break;
|
|
|
|
case ZFS_PROP_NORMALIZE:
|
|
|
|
case ZFS_PROP_UTF8ONLY:
|
|
|
|
*value = 0;
|
|
|
|
break;
|
|
|
|
case ZFS_PROP_CASE:
|
|
|
|
*value = ZFS_CASE_SENSITIVE;
|
|
|
|
break;
|
|
|
|
default:
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
error = 0;
|
|
|
|
}
|
|
|
|
return (error);
|
|
|
|
}
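The lookup-then-default pattern used by zfs_get_zplprop() above can be tried outside the kernel with a minimal sketch. Everything here is an assumption made for illustration: the sketch_* names, the tiny in-memory store standing in for the master node ZAP, and the placeholder constant values, none of which are the real ZFS definitions. The control flow mirrors the function: start from ENOENT, attempt the lookup only when a store is available, and fall back to a built-in default only for properties that have one.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the real property identifiers. */
enum sketch_prop {
	SKETCH_PROP_VERSION,
	SKETCH_PROP_NORMALIZE,
	SKETCH_PROP_UTF8ONLY,
	SKETCH_PROP_CASE,
	SKETCH_PROP_OTHER,	/* a property with no built-in default */
	SKETCH_PROP_COUNT
};

#define	SKETCH_ZPL_VERSION	4	/* placeholder, not the real ZPL_VERSION */
#define	SKETCH_CASE_SENSITIVE	0	/* placeholder value */

/* Hypothetical store: stored[i] is valid only when present[i] is set. */
struct sketch_store {
	int present[SKETCH_PROP_COUNT];
	uint64_t stored[SKETCH_PROP_COUNT];
};

/* Stand-in for the ZAP lookup: ENOENT when nothing is stored. */
static int
sketch_lookup(const struct sketch_store *st, enum sketch_prop prop,
    uint64_t *value)
{
	if (!st->present[prop])
		return (ENOENT);
	*value = st->stored[prop];
	return (0);
}

int
sketch_get_prop(const struct sketch_store *st, enum sketch_prop prop,
    uint64_t *value)
{
	int error = ENOENT;

	assert(value != NULL);

	/* With no store at all, behave as if the value were missing. */
	if (st != NULL)
		error = sketch_lookup(st, prop, value);

	if (error == ENOENT) {
		/* No value set, use the default where one exists. */
		switch (prop) {
		case SKETCH_PROP_VERSION:
			*value = SKETCH_ZPL_VERSION;
			break;
		case SKETCH_PROP_NORMALIZE:
		case SKETCH_PROP_UTF8ONLY:
			*value = 0;
			break;
		case SKETCH_PROP_CASE:
			*value = SKETCH_CASE_SENSITIVE;
			break;
		default:
			return (error);
		}
		error = 0;
	}
	return (error);
}
```

Initializing error to ENOENT is what lets a NULL store take the same default path as a missing entry, which matches the os != NULL check in the function above.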
|
|
|
|
|
2009-11-20 19:06:59 +00:00
|
|
|
#ifdef HAVE_ZPL
|
2008-11-20 20:01:55 +00:00
|
|
|
static vfsdef_t vfw = {
|
|
|
|
VFSDEF_VERSION,
|
|
|
|
MNTTYPE_ZFS,
|
|
|
|
zfs_vfsinit,
|
|
|
|
VSW_HASPROTO|VSW_CANRWRO|VSW_CANREMOUNT|VSW_VOLATILEDEV|VSW_STATS|
|
|
|
|
VSW_XID,
|
|
|
|
&zfs_mntopts
|
|
|
|
};
|
|
|
|
|
|
|
|
struct modlfs zfs_modlfs = {
|
|
|
|
&mod_fsops, "ZFS filesystem version " SPA_VERSION_STRING, &vfw
|
|
|
|
};
|
2009-11-20 19:06:59 +00:00
|
|
|
#endif /* HAVE_ZPL */
|