/*
 * CDDL HEADER START
 *
 * The contents of this file are subject to the terms of the
 * Common Development and Distribution License (the "License").
 * You may not use this file except in compliance with the License.
 *
 * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
 * or http://www.opensolaris.org/os/licensing.
 * See the License for the specific language governing permissions
 * and limitations under the License.
 *
 * When distributing Covered Code, include this CDDL HEADER in each
 * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
 * If applicable, add the following below this CDDL HEADER, with the
 * fields enclosed by brackets "[]" replaced with your own identifying
 * information: Portions Copyright [yyyy] [name of copyright owner]
 *
 * CDDL HEADER END
 */

/*
 * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
 */

/*
 * ZFS control directory (a.k.a. ".zfs")
 *
 * This directory provides a common location for all ZFS meta-objects.
 * Currently, this is only the 'snapshot' directory, but this may expand in
 * the future.  The elements are built using the GFS primitives, as the
 * hierarchy does not actually exist on disk.
 *
 * For 'snapshot', we don't want to have all snapshots always mounted, because
 * this would take up a huge amount of space in /etc/mnttab.  We have three
 * types of objects:
 *
 *	ctldir ------> snapshotdir -------> snapshot
 *                                             |
 *                                             |
 *                                             V
 *                                         mounted fs
 *
 * The 'snapshot' node contains just enough information to look up '..' and
 * act as a mountpoint for the snapshot.  Whenever we look up a specific
 * snapshot, we perform an automount of the underlying filesystem and return
 * the corresponding vnode.
 *
 * All mounts are handled automatically by the kernel, but unmounts are
 * (currently) handled from user land.  The main reason is that there is no
 * reliable way to auto-unmount the filesystem when it's "no longer in use".
 * When the user unmounts a filesystem, we call zfsctl_unmount(), which
 * unmounts any snapshots within the snapshot directory.
 *
 * The '.zfs', '.zfs/snapshot', and all directories created under
 * '.zfs/snapshot' (ie: '.zfs/snapshot/<snapname>') are all GFS nodes and
 * share the same vfs_t as the head filesystem (what '.zfs' lives under).
 *
 * File systems mounted on top of the GFS nodes '.zfs/snapshot/<snapname>'
 * (ie: snapshots) are ZFS nodes and have their own unique vfs_t.
 * However, vnodes within these mounted-on file systems have their v_vfsp
 * fields set to the head filesystem to make NFS happy (see
 * zfsctl_snapdir_lookup()).  We VFS_HOLD the head filesystem's vfs_t
 * so that it cannot be freed until all snapshots have been unmounted.
 */
#ifdef HAVE_ZPL

#include <fs/fs_subr.h>
#include <sys/zfs_ctldir.h>
#include <sys/zfs_ioctl.h>
#include <sys/zfs_vfsops.h>
#include <sys/vfs_opreg.h>
#include <sys/gfs.h>
#include <sys/stat.h>
#include <sys/dmu.h>
#include <sys/dsl_deleg.h>
#include <sys/mount.h>
#include <sys/sunddi.h>

#include "zfs_namecheck.h"

typedef struct zfsctl_node {
	gfs_dir_t	zc_gfs_private;
	uint64_t	zc_id;
	timestruc_t	zc_cmtime;	/* ctime and mtime, always the same */
} zfsctl_node_t;

typedef struct zfsctl_snapdir {
	zfsctl_node_t	sd_node;
	kmutex_t	sd_lock;
	avl_tree_t	sd_snaps;
} zfsctl_snapdir_t;

typedef struct {
	char		*se_name;
	vnode_t		*se_root;
	avl_node_t	se_node;
} zfs_snapentry_t;
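
/*
 * AVL comparator for snapshot entries: order zfs_snapentry_t nodes
 * by snapshot name so they can be located with avl_find().
 */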
static int
snapentry_compare(const void *a, const void *b)
{
	const zfs_snapentry_t *sa = a;
	const zfs_snapentry_t *sb = b;
	int ret = strcmp(sa->se_name, sb->se_name);

	if (ret < 0)
		return (-1);
	else if (ret > 0)
		return (1);
	else
		return (0);
}

vnodeops_t *zfsctl_ops_root;
vnodeops_t *zfsctl_ops_snapdir;
vnodeops_t *zfsctl_ops_snapshot;
vnodeops_t *zfsctl_ops_shares;
vnodeops_t *zfsctl_ops_shares_dir;

static const fs_operation_def_t zfsctl_tops_root[];
static const fs_operation_def_t zfsctl_tops_snapdir[];
static const fs_operation_def_t zfsctl_tops_snapshot[];
static const fs_operation_def_t zfsctl_tops_shares[];

static vnode_t *zfsctl_mknode_snapdir(vnode_t *);
static vnode_t *zfsctl_mknode_shares(vnode_t *);
static vnode_t *zfsctl_snapshot_mknode(vnode_t *, uint64_t objset);
static int zfsctl_unmount_snap(zfs_snapentry_t *, int, cred_t *);

static gfs_opsvec_t zfsctl_opsvec[] = {
	{ ".zfs", zfsctl_tops_root, &zfsctl_ops_root },
	{ ".zfs/snapshot", zfsctl_tops_snapdir, &zfsctl_ops_snapdir },
	{ ".zfs/snapshot/vnode", zfsctl_tops_snapshot, &zfsctl_ops_snapshot },
	{ ".zfs/shares", zfsctl_tops_shares, &zfsctl_ops_shares_dir },
	{ ".zfs/shares/vnode", zfsctl_tops_shares, &zfsctl_ops_shares },
	{ NULL }
};

/*
 * Root directory elements.  We have only two entries:
 * snapshot and shares.
 */
static gfs_dirent_t zfsctl_root_entries[] = {
	{ "snapshot", zfsctl_mknode_snapdir, GFS_CACHE_VNODE },
	{ "shares", zfsctl_mknode_shares, GFS_CACHE_VNODE },
	{ NULL }
};

/* include . and .. in the calculation */
#define	NROOT_ENTRIES	((sizeof (zfsctl_root_entries) / \
    sizeof (gfs_dirent_t)) + 1)

/*
 * Initialize the various GFS pieces we'll need to create and manipulate .zfs
 * directories.  This is called from the ZFS init routine, and initializes the
 * vnode ops vectors that we'll be using.
 */
void
zfsctl_init(void)
{
	VERIFY(gfs_make_opsvec(zfsctl_opsvec) == 0);
}

void
zfsctl_fini(void)
{
	/*
	 * Remove vfsctl vnode ops
	 */
	if (zfsctl_ops_root)
		vn_freevnodeops(zfsctl_ops_root);
	if (zfsctl_ops_snapdir)
		vn_freevnodeops(zfsctl_ops_snapdir);
	if (zfsctl_ops_snapshot)
		vn_freevnodeops(zfsctl_ops_snapshot);
	if (zfsctl_ops_shares)
		vn_freevnodeops(zfsctl_ops_shares);
	if (zfsctl_ops_shares_dir)
		vn_freevnodeops(zfsctl_ops_shares_dir);

	zfsctl_ops_root = NULL;
	zfsctl_ops_snapdir = NULL;
	zfsctl_ops_snapshot = NULL;
	zfsctl_ops_shares = NULL;
	zfsctl_ops_shares_dir = NULL;
}

boolean_t
zfsctl_is_node(vnode_t *vp)
{
	return (vn_matchops(vp, zfsctl_ops_root) ||
	    vn_matchops(vp, zfsctl_ops_snapdir) ||
	    vn_matchops(vp, zfsctl_ops_snapshot) ||
	    vn_matchops(vp, zfsctl_ops_shares) ||
	    vn_matchops(vp, zfsctl_ops_shares_dir));
}

/*
 * Return the inode number associated with the 'snapshot' or
 * 'shares' directory.
 */
/* ARGSUSED */
static ino64_t
zfsctl_root_inode_cb(vnode_t *vp, int index)
{
	zfsvfs_t *zfsvfs = vp->v_vfsp->vfs_data;

	ASSERT(index <= 2);

	if (index == 0)
		return (ZFSCTL_INO_SNAPDIR);

	return (zfsvfs->z_shares_dir);
}

/*
 * Create the '.zfs' directory.  This directory is cached as part of the VFS
 * structure.  This results in a hold on the vfs_t.  The code in zfs_umount()
 * therefore checks against a vfs_count of 2 instead of 1.  This reference
 * is removed when the ctldir is destroyed in the unmount.
 */
void
zfsctl_create(zfsvfs_t *zfsvfs)
{
	vnode_t *vp, *rvp;
	zfsctl_node_t *zcp;
	uint64_t crtime[2];

	ASSERT(zfsvfs->z_ctldir == NULL);

	vp = gfs_root_create(sizeof (zfsctl_node_t), zfsvfs->z_vfs,
	    zfsctl_ops_root, ZFSCTL_INO_ROOT, zfsctl_root_entries,
	    zfsctl_root_inode_cb, MAXNAMELEN, NULL, NULL);
	zcp = vp->v_data;
	zcp->zc_id = ZFSCTL_INO_ROOT;

	VERIFY(VFS_ROOT(zfsvfs->z_vfs, &rvp) == 0);
	VERIFY(0 == sa_lookup(VTOZ(rvp)->z_sa_hdl, SA_ZPL_CRTIME(zfsvfs),
	    &crtime, sizeof (crtime)));
	ZFS_TIME_DECODE(&zcp->zc_cmtime, crtime);
	VN_RELE(rvp);

	/*
	 * We're only faking the fact that we have a root of a filesystem for
	 * the sake of the GFS interfaces.  Undo the flag manipulation it did
	 * for us.
	 */
	vp->v_flag &= ~(VROOT | VNOCACHE | VNOMAP | VNOSWAP | VNOMOUNT);

	zfsvfs->z_ctldir = vp;
}

/*
 * Destroy the '.zfs' directory.  Only called when the filesystem is
 * unmounted.  There might still be more references if we were force
 * unmounted, but only new zfs_inactive() calls can occur and they don't
 * reference .zfs.
 */
void
zfsctl_destroy(zfsvfs_t *zfsvfs)
{
	VN_RELE(zfsvfs->z_ctldir);
	zfsvfs->z_ctldir = NULL;
}

/*
 * Given a root znode, retrieve the associated .zfs directory.
 * Add a hold to the vnode and return it.
 */
vnode_t *
zfsctl_root(znode_t *zp)
{
	ASSERT(zfs_has_ctldir(zp));
	VN_HOLD(zp->z_zfsvfs->z_ctldir);
	return (zp->z_zfsvfs->z_ctldir);
}

/*
 * Common open routine.  Disallow any write access.
 */
/* ARGSUSED */
static int
zfsctl_common_open(vnode_t **vpp, int flags, cred_t *cr, caller_context_t *ct)
{
	if (flags & FWRITE)
		return (EACCES);

	return (0);
}

/*
 * Common close routine.  Nothing to do here.
 */
/* ARGSUSED */
static int
zfsctl_common_close(vnode_t *vpp, int flags, int count, offset_t off,
    cred_t *cr, caller_context_t *ct)
{
	return (0);
}

/*
 * Common access routine.  Disallow writes.
 */
/* ARGSUSED */
static int
zfsctl_common_access(vnode_t *vp, int mode, int flags, cred_t *cr,
    caller_context_t *ct)
{
	if (flags & V_ACE_MASK) {
		if (mode & ACE_ALL_WRITE_PERMS)
			return (EACCES);
	} else {
		if (mode & VWRITE)
			return (EACCES);
	}

	return (0);
}

/*
 * Common getattr function.  Fill in basic information.
 */
static void
zfsctl_common_getattr(vnode_t *vp, vattr_t *vap)
{
	timestruc_t now;

	vap->va_uid = 0;
	vap->va_gid = 0;
	vap->va_rdev = 0;
	/*
	 * We are a purely virtual object, so we have no
	 * blocksize or allocated blocks.
	 */
	vap->va_blksize = 0;
	vap->va_nblocks = 0;
	vap->va_seq = 0;
	vap->va_fsid = vp->v_vfsp->vfs_dev;
	vap->va_mode = S_IRUSR | S_IXUSR | S_IRGRP | S_IXGRP |
	    S_IROTH | S_IXOTH;
	vap->va_type = VDIR;
	/*
	 * We live in the now (for atime).
	 */
	gethrestime(&now);
	vap->va_atime = now;
}

/*ARGSUSED*/
static int
zfsctl_common_fid(vnode_t *vp, fid_t *fidp, caller_context_t *ct)
{
	zfsvfs_t *zfsvfs = vp->v_vfsp->vfs_data;
	zfsctl_node_t *zcp = vp->v_data;
	uint64_t object = zcp->zc_id;
	zfid_short_t *zfid;
	int i;

	ZFS_ENTER(zfsvfs);

	if (fidp->fid_len < SHORT_FID_LEN) {
		fidp->fid_len = SHORT_FID_LEN;
		ZFS_EXIT(zfsvfs);
		return (ENOSPC);
	}

	zfid = (zfid_short_t *)fidp;

	zfid->zf_len = SHORT_FID_LEN;

	for (i = 0; i < sizeof (zfid->zf_object); i++)
		zfid->zf_object[i] = (uint8_t)(object >> (8 * i));

	/* .zfs znodes always have a generation number of 0 */
	for (i = 0; i < sizeof (zfid->zf_gen); i++)
		zfid->zf_gen[i] = 0;

	ZFS_EXIT(zfsvfs);
	return (0);
}
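
/*
 * FID entry point for '.zfs/shares': return the FID of the underlying
 * shares directory znode rather than synthesizing one for the GFS node.
 */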
/*ARGSUSED*/
static int
zfsctl_shares_fid(vnode_t *vp, fid_t *fidp, caller_context_t *ct)
{
	zfsvfs_t *zfsvfs = vp->v_vfsp->vfs_data;
	znode_t *dzp;
	int error;

	ZFS_ENTER(zfsvfs);

	if (zfsvfs->z_shares_dir == 0) {
		ZFS_EXIT(zfsvfs);
		return (ENOTSUP);
	}

	if ((error = zfs_zget(zfsvfs, zfsvfs->z_shares_dir, &dzp)) == 0) {
		error = VOP_FID(ZTOV(dzp), fidp, ct);
		VN_RELE(ZTOV(dzp));
	}

	ZFS_EXIT(zfsvfs);
	return (error);
}

/*
 * .zfs inode namespace
 *
 * We need to generate unique inode numbers for all files and directories
 * within the .zfs pseudo-filesystem.  We use the following scheme:
 *
 *	ENTRY			ZFSCTL_INODE
 *	.zfs			1
 *	.zfs/snapshot		2
 *	.zfs/snapshot/<snap>	objectid(snap)
 */

#define	ZFSCTL_INO_SNAP(id)	(id)

/*
 * Get root directory attributes.
 */
/* ARGSUSED */
static int
zfsctl_root_getattr(vnode_t *vp, vattr_t *vap, int flags, cred_t *cr,
    caller_context_t *ct)
{
	zfsvfs_t *zfsvfs = vp->v_vfsp->vfs_data;
	zfsctl_node_t *zcp = vp->v_data;

	ZFS_ENTER(zfsvfs);
	vap->va_nodeid = ZFSCTL_INO_ROOT;
	vap->va_nlink = vap->va_size = NROOT_ENTRIES;
	vap->va_mtime = vap->va_ctime = zcp->zc_cmtime;

	zfsctl_common_getattr(vp, vap);
	ZFS_EXIT(zfsvfs);

	return (0);
}

/*
 * Special case the handling of "..".
 */
/* ARGSUSED */
int
zfsctl_root_lookup(vnode_t *dvp, char *nm, vnode_t **vpp, pathname_t *pnp,
    int flags, vnode_t *rdir, cred_t *cr, caller_context_t *ct,
    int *direntflags, pathname_t *realpnp)
{
	zfsvfs_t *zfsvfs = dvp->v_vfsp->vfs_data;
	int err;

	/*
	 * No extended attributes allowed under .zfs
	 */
	if (flags & LOOKUP_XATTR)
		return (EINVAL);

	ZFS_ENTER(zfsvfs);

	if (strcmp(nm, "..") == 0) {
		err = VFS_ROOT(dvp->v_vfsp, vpp);
	} else {
		err = gfs_vop_lookup(dvp, nm, vpp, pnp, flags, rdir,
		    cr, ct, direntflags, realpnp);
	}

	ZFS_EXIT(zfsvfs);

	return (err);
}

static int
zfsctl_pathconf(vnode_t *vp, int cmd, ulong_t *valp, cred_t *cr,
    caller_context_t *ct)
{
	/*
	 * We only care about ACL_ENABLED so that libsec can
	 * display ACLs correctly and not default to POSIX draft.
	 */
	if (cmd == _PC_ACL_ENABLED) {
		*valp = _ACL_ACE_ENABLED;
		return (0);
	}

	return (fs_pathconf(vp, cmd, valp, cr, ct));
}

static const fs_operation_def_t zfsctl_tops_root[] = {
	{ VOPNAME_OPEN,		{ .vop_open = zfsctl_common_open } },
	{ VOPNAME_CLOSE,	{ .vop_close = zfsctl_common_close } },
	{ VOPNAME_IOCTL,	{ .error = fs_inval } },
	{ VOPNAME_GETATTR,	{ .vop_getattr = zfsctl_root_getattr } },
	{ VOPNAME_ACCESS,	{ .vop_access = zfsctl_common_access } },
	{ VOPNAME_READDIR,	{ .vop_readdir = gfs_vop_readdir } },
	{ VOPNAME_LOOKUP,	{ .vop_lookup = zfsctl_root_lookup } },
	{ VOPNAME_SEEK,		{ .vop_seek = fs_seek } },
	{ VOPNAME_INACTIVE,	{ .vop_inactive = gfs_vop_inactive } },
	{ VOPNAME_PATHCONF,	{ .vop_pathconf = zfsctl_pathconf } },
	{ VOPNAME_FID,		{ .vop_fid = zfsctl_common_fid } },
	{ NULL }
};
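
/*
 * Construct the full dataset name of a snapshot, "<fs>@<name>", into
 * 'zname'.  Returns EILSEQ for an invalid snapshot name and
 * ENAMETOOLONG if the result does not fit in 'len' bytes.
 */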
static int
zfsctl_snapshot_zname(vnode_t *vp, const char *name, int len, char *zname)
{
	objset_t *os = ((zfsvfs_t *)((vp)->v_vfsp->vfs_data))->z_os;

	if (snapshot_namecheck(name, NULL, NULL) != 0)
		return (EILSEQ);
	dmu_objset_name(os, zname);
	if (strlen(zname) + 1 + strlen(name) >= len)
		return (ENAMETOOLONG);
	(void) strcat(zname, "@");
	(void) strcat(zname, name);
	return (0);
}
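
/*
 * Unmount the snapshot mounted on this entry's root vnode and free the
 * entry.  Called with the parent snapdir's sd_lock held, which is why
 * the final vnode teardown uses gfs_vop_inactive() instead of VN_RELE().
 */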
static int
zfsctl_unmount_snap(zfs_snapentry_t *sep, int fflags, cred_t *cr)
{
	vnode_t *svp = sep->se_root;
	int error;

	ASSERT(vn_ismntpt(svp));

	/* this will be dropped by dounmount() */
	if ((error = vn_vfswlock(svp)) != 0)
		return (error);

	VN_HOLD(svp);
	error = dounmount(vn_mountedvfs(svp), fflags, cr);
	if (error) {
		VN_RELE(svp);
		return (error);
	}

	/*
	 * We can't use VN_RELE(), as that will try to invoke
	 * zfsctl_snapdir_inactive(), which would cause us to destroy
	 * the sd_lock mutex held by our caller.
	 */
	ASSERT(svp->v_count == 1);
	gfs_vop_inactive(svp, cr, NULL);

	kmem_free(sep->se_name, strlen(sep->se_name) + 1);
	kmem_free(sep, sizeof (zfs_snapentry_t));

	return (0);
}
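
/*
 * Rename a mounted snapshot entry in place: re-key it in the sd_snaps
 * AVL tree, then rewrite the tails of the vfs mountpoint and resource
 * strings to match the new snapshot name.
 */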
static void
zfsctl_rename_snap(zfsctl_snapdir_t *sdp, zfs_snapentry_t *sep, const char *nm)
{
	avl_index_t where;
	vfs_t *vfsp;
	refstr_t *pathref;
	char newpath[MAXNAMELEN];
	char *tail;

	ASSERT(MUTEX_HELD(&sdp->sd_lock));
	ASSERT(sep != NULL);

	vfsp = vn_mountedvfs(sep->se_root);
	ASSERT(vfsp != NULL);

	vfs_lock_wait(vfsp);

	/*
	 * Change the name in the AVL tree.
	 */
	avl_remove(&sdp->sd_snaps, sep);
	kmem_free(sep->se_name, strlen(sep->se_name) + 1);
	sep->se_name = kmem_alloc(strlen(nm) + 1, KM_SLEEP);
	(void) strcpy(sep->se_name, nm);
	VERIFY(avl_find(&sdp->sd_snaps, sep, &where) == NULL);
	avl_insert(&sdp->sd_snaps, sep, where);

	/*
	 * Change the current mountpoint info:
	 *	- update the tail of the mntpoint path
	 *	- update the tail of the resource path
	 */
	pathref = vfs_getmntpoint(vfsp);
	(void) strncpy(newpath, refstr_value(pathref), sizeof (newpath));
	VERIFY((tail = strrchr(newpath, '/')) != NULL);
	*(tail+1) = '\0';
	ASSERT3U(strlen(newpath) + strlen(nm), <, sizeof (newpath));
	(void) strcat(newpath, nm);
	refstr_rele(pathref);
	vfs_setmntpoint(vfsp, newpath);

	pathref = vfs_getresource(vfsp);
	(void) strncpy(newpath, refstr_value(pathref), sizeof (newpath));
	VERIFY((tail = strrchr(newpath, '@')) != NULL);
	*(tail+1) = '\0';
	ASSERT3U(strlen(newpath) + strlen(nm), <, sizeof (newpath));
	(void) strcat(newpath, nm);
	refstr_rele(pathref);
	vfs_setresource(vfsp, newpath);

	vfs_unlock(vfsp);
}
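
/*
 * VOP_RENAME entry point for '.zfs/snapshot': after checking rename
 * permissions, rename the underlying dataset and update the mounted
 * snapshot's entry in the sd_snaps tree.
 */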
/*ARGSUSED*/
static int
zfsctl_snapdir_rename(vnode_t *sdvp, char *snm, vnode_t *tdvp, char *tnm,
    cred_t *cr, caller_context_t *ct, int flags)
{
	zfsctl_snapdir_t *sdp = sdvp->v_data;
	zfs_snapentry_t search, *sep;
	zfsvfs_t *zfsvfs;
	avl_index_t where;
	char from[MAXNAMELEN], to[MAXNAMELEN];
	char real[MAXNAMELEN];
	int err;

	zfsvfs = sdvp->v_vfsp->vfs_data;
	ZFS_ENTER(zfsvfs);

	if ((flags & FIGNORECASE) || zfsvfs->z_case == ZFS_CASE_INSENSITIVE) {
		err = dmu_snapshot_realname(zfsvfs->z_os, snm, real,
		    MAXNAMELEN, NULL);
		if (err == 0) {
			snm = real;
		} else if (err != ENOTSUP) {
			ZFS_EXIT(zfsvfs);
			return (err);
		}
	}

	ZFS_EXIT(zfsvfs);

	err = zfsctl_snapshot_zname(sdvp, snm, MAXNAMELEN, from);
	if (!err)
		err = zfsctl_snapshot_zname(tdvp, tnm, MAXNAMELEN, to);
	if (!err)
		err = zfs_secpolicy_rename_perms(from, to, cr);
	if (err)
		return (err);

	/*
	 * Cannot move snapshots out of the snapdir.
	 */
	if (sdvp != tdvp)
		return (EINVAL);

	if (strcmp(snm, tnm) == 0)
		return (0);

	mutex_enter(&sdp->sd_lock);

	search.se_name = (char *)snm;
	if ((sep = avl_find(&sdp->sd_snaps, &search, &where)) == NULL) {
		mutex_exit(&sdp->sd_lock);
		return (ENOENT);
	}

	err = dmu_objset_rename(from, to, B_FALSE);
	if (err == 0)
		zfsctl_rename_snap(sdp, sep, tnm);

	mutex_exit(&sdp->sd_lock);

	return (err);
}
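
/*
 * VOP_RMDIR entry point for '.zfs/snapshot': after checking destroy
 * permissions, force-unmount the snapshot if it is mounted and destroy
 * the underlying dataset.
 */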
/* ARGSUSED */
static int
zfsctl_snapdir_remove(vnode_t *dvp, char *name, vnode_t *cwd, cred_t *cr,
    caller_context_t *ct, int flags)
{
	zfsctl_snapdir_t *sdp = dvp->v_data;
	zfs_snapentry_t *sep;
	zfs_snapentry_t search;
	zfsvfs_t *zfsvfs;
	char snapname[MAXNAMELEN];
	char real[MAXNAMELEN];
	int err;

	zfsvfs = dvp->v_vfsp->vfs_data;
	ZFS_ENTER(zfsvfs);

	if ((flags & FIGNORECASE) || zfsvfs->z_case == ZFS_CASE_INSENSITIVE) {
		err = dmu_snapshot_realname(zfsvfs->z_os, name, real,
		    MAXNAMELEN, NULL);
		if (err == 0) {
			name = real;
		} else if (err != ENOTSUP) {
			ZFS_EXIT(zfsvfs);
			return (err);
		}
	}

	ZFS_EXIT(zfsvfs);

	err = zfsctl_snapshot_zname(dvp, name, MAXNAMELEN, snapname);
	if (!err)
		err = zfs_secpolicy_destroy_perms(snapname, cr);
	if (err)
		return (err);

	mutex_enter(&sdp->sd_lock);

	search.se_name = name;
	sep = avl_find(&sdp->sd_snaps, &search, NULL);
	if (sep) {
		avl_remove(&sdp->sd_snaps, sep);
		err = zfsctl_unmount_snap(sep, MS_FORCE, cr);
		if (err)
			avl_add(&sdp->sd_snaps, sep);
		else
			err = dmu_objset_destroy(snapname, B_FALSE);
	} else {
		err = ENOENT;
	}

	mutex_exit(&sdp->sd_lock);

	return (err);
}

/*
 * This creates a snapshot under '.zfs/snapshot'.
 */
/* ARGSUSED */
static int
zfsctl_snapdir_mkdir(vnode_t *dvp, char *dirname, vattr_t *vap, vnode_t **vpp,
    cred_t *cr, caller_context_t *cc, int flags, vsecattr_t *vsecp)
{
	zfsvfs_t *zfsvfs = dvp->v_vfsp->vfs_data;
	char name[MAXNAMELEN];
	int err;
	static enum symfollow follow = NO_FOLLOW;
	static enum uio_seg seg = UIO_SYSSPACE;

	if (snapshot_namecheck(dirname, NULL, NULL) != 0)
		return (EILSEQ);

	dmu_objset_name(zfsvfs->z_os, name);

	*vpp = NULL;

	err = zfs_secpolicy_snapshot_perms(name, cr);
	if (err)
		return (err);

	if (err == 0) {
		err = dmu_objset_snapshot(name, dirname, NULL, B_FALSE);
		if (err)
			return (err);
		err = lookupnameat(dirname, seg, follow, NULL, vpp, dvp);
	}

	return (err);
}

/*
 * Lookup entry point for the 'snapshot' directory.  Try to open the
 * snapshot if it exists, creating the pseudo filesystem vnode as necessary.
 * Perform a mount of the associated dataset on top of the vnode.
 */
/* ARGSUSED */
static int
zfsctl_snapdir_lookup(vnode_t *dvp, char *nm, vnode_t **vpp, pathname_t *pnp,
    int flags, vnode_t *rdir, cred_t *cr, caller_context_t *ct,
    int *direntflags, pathname_t *realpnp)
{
	zfsctl_snapdir_t *sdp = dvp->v_data;
	objset_t *snap;
	char snapname[MAXNAMELEN];
	char real[MAXNAMELEN];
	char *mountpoint;
	zfs_snapentry_t *sep, search;
	struct mounta margs;
	vfs_t *vfsp;
	size_t mountpoint_len;
	avl_index_t where;
	zfsvfs_t *zfsvfs = dvp->v_vfsp->vfs_data;
	int err;

	/*
	 * No extended attributes allowed under .zfs
	 */
	if (flags & LOOKUP_XATTR)
		return (EINVAL);

	ASSERT(dvp->v_type == VDIR);

	/*
	 * If we get a recursive call, that means we got called
	 * from the domount() code while it was trying to look up the
	 * spec (which looks like a local path for zfs).  We need to
	 * add some flag to domount() to tell it not to do this lookup.
	 */
	if (MUTEX_HELD(&sdp->sd_lock))
		return (ENOENT);

	ZFS_ENTER(zfsvfs);

	if (gfs_lookup_dot(vpp, dvp, zfsvfs->z_ctldir, nm) == 0) {
		ZFS_EXIT(zfsvfs);
		return (0);
	}

	if (flags & FIGNORECASE) {
		boolean_t conflict = B_FALSE;

		err = dmu_snapshot_realname(zfsvfs->z_os, nm, real,
		    MAXNAMELEN, &conflict);
		if (err == 0) {
			nm = real;
		} else if (err != ENOTSUP) {
			ZFS_EXIT(zfsvfs);
			return (err);
		}
		if (realpnp)
			(void) strlcpy(realpnp->pn_buf, nm,
			    realpnp->pn_bufsize);
		if (conflict && direntflags)
			*direntflags = ED_CASE_CONFLICT;
	}

	mutex_enter(&sdp->sd_lock);
	search.se_name = (char *)nm;
	if ((sep = avl_find(&sdp->sd_snaps, &search, &where)) != NULL) {
		*vpp = sep->se_root;
		VN_HOLD(*vpp);
		err = traverse(vpp);
		if (err) {
			VN_RELE(*vpp);
			*vpp = NULL;
		} else if (*vpp == sep->se_root) {
			/*
			 * The snapshot was unmounted behind our backs,
			 * try to remount it.
			 */
			goto domount;
		} else {
			/*
			 * VROOT was set during the traverse call.  We need
			 * to clear it since we're pretending to be part
			 * of our parent's vfs.
			 */
			(*vpp)->v_flag &= ~VROOT;
		}
		mutex_exit(&sdp->sd_lock);
		ZFS_EXIT(zfsvfs);
		return (err);
	}

	/*
	 * The requested snapshot is not currently mounted, look it up.
	 */
	err = zfsctl_snapshot_zname(dvp, nm, MAXNAMELEN, snapname);
	if (err) {
		mutex_exit(&sdp->sd_lock);
		ZFS_EXIT(zfsvfs);
		/*
		 * Handle "ls *" or "?" gracefully by forcing EILSEQ
		 * to ENOENT, since the shell ultimately passes "*" or
		 * "?" to lookup as the literal name.
		 */
		return (err == EILSEQ ? ENOENT : err);
	}
	if (dmu_objset_hold(snapname, FTAG, &snap) != 0) {
		mutex_exit(&sdp->sd_lock);
		ZFS_EXIT(zfsvfs);
		return (ENOENT);
	}

	sep = kmem_alloc(sizeof (zfs_snapentry_t), KM_SLEEP);
	sep->se_name = kmem_alloc(strlen(nm) + 1, KM_SLEEP);
	(void) strcpy(sep->se_name, nm);
	*vpp = sep->se_root = zfsctl_snapshot_mknode(dvp, dmu_objset_id(snap));
	avl_insert(&sdp->sd_snaps, sep, where);

	dmu_objset_rele(snap, FTAG);
domount:
	mountpoint_len = strlen(refstr_value(dvp->v_vfsp->vfs_mntpt)) +
	    strlen("/.zfs/snapshot/") + strlen(nm) + 1;
	mountpoint = kmem_alloc(mountpoint_len, KM_SLEEP);
	(void) snprintf(mountpoint, mountpoint_len, "%s/.zfs/snapshot/%s",
	    refstr_value(dvp->v_vfsp->vfs_mntpt), nm);

	margs.spec = snapname;
	margs.dir = mountpoint;
	margs.flags = MS_SYSSPACE | MS_NOMNTTAB;
	margs.fstype = "zfs";
	margs.dataptr = NULL;
	margs.datalen = 0;
	margs.optptr = NULL;
	margs.optlen = 0;

	err = domount("zfs", &margs, *vpp, kcred, &vfsp);
	kmem_free(mountpoint, mountpoint_len);

	if (err == 0) {
		/*
		 * Return the mounted root rather than the covered mount
		 * point.  Takes the GFS vnode at .zfs/snapshot/<snapname>
		 * and returns the ZFS vnode mounted on top of the GFS node.
		 * This ZFS vnode is the root of the newly created vfsp.
		 */
		VFS_RELE(vfsp);
		err = traverse(vpp);
	}

	if (err == 0) {
		/*
		 * Fix up the root vnode mounted on .zfs/snapshot/<snapname>.
		 *
		 * This is where we lie about our v_vfsp in order to
		 * make .zfs/snapshot/<snapname> accessible over NFS
		 * without requiring manual mounts of <snapname>.
		 */
		ASSERT(VTOZ(*vpp)->z_zfsvfs != zfsvfs);
		VTOZ(*vpp)->z_zfsvfs->z_parent = zfsvfs;
		(*vpp)->v_vfsp = zfsvfs->z_vfs;
		(*vpp)->v_flag &= ~VROOT;
	}
	mutex_exit(&sdp->sd_lock);
	ZFS_EXIT(zfsvfs);

	/*
	 * If we had an error, drop our hold on the vnode and
	 * zfsctl_snapshot_inactive() will clean up.
	 */
	if (err) {
		VN_RELE(*vpp);
		*vpp = NULL;
	}
	return (err);
}
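
/*
 * Lookup entry point for '.zfs/shares': pass everything other than
 * "." and ".." through to the real shares directory znode.
 */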
/* ARGSUSED */
static int
zfsctl_shares_lookup(vnode_t *dvp, char *nm, vnode_t **vpp, pathname_t *pnp,
    int flags, vnode_t *rdir, cred_t *cr, caller_context_t *ct,
    int *direntflags, pathname_t *realpnp)
{
	zfsvfs_t *zfsvfs = dvp->v_vfsp->vfs_data;
	znode_t *dzp;
	int error;

	ZFS_ENTER(zfsvfs);

	if (gfs_lookup_dot(vpp, dvp, zfsvfs->z_ctldir, nm) == 0) {
		ZFS_EXIT(zfsvfs);
		return (0);
	}

	if (zfsvfs->z_shares_dir == 0) {
		ZFS_EXIT(zfsvfs);
		return (ENOTSUP);
	}
	if ((error = zfs_zget(zfsvfs, zfsvfs->z_shares_dir, &dzp)) == 0)
		error = VOP_LOOKUP(ZTOV(dzp), nm, vpp, pnp,
		    flags, rdir, cr, ct, direntflags, realpnp);

	VN_RELE(ZTOV(dzp));
	ZFS_EXIT(zfsvfs);

	return (error);
}
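
/*
 * Readdir callback for '.zfs/snapshot': return the next snapshot name
 * from the DMU, using the directory offset as the iteration cookie.
 */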
/* ARGSUSED */
static int
zfsctl_snapdir_readdir_cb(vnode_t *vp, void *dp, int *eofp,
    offset_t *offp, offset_t *nextp, void *data, int flags)
{
	zfsvfs_t *zfsvfs = vp->v_vfsp->vfs_data;
	char snapname[MAXNAMELEN];
	uint64_t id, cookie;
	boolean_t case_conflict;
	int error;

	ZFS_ENTER(zfsvfs);

	cookie = *offp;
	error = dmu_snapshot_list_next(zfsvfs->z_os, MAXNAMELEN, snapname, &id,
	    &cookie, &case_conflict);
	if (error) {
		ZFS_EXIT(zfsvfs);
		if (error == ENOENT) {
			*eofp = 1;
			return (0);
		}
		return (error);
	}

	if (flags & V_RDDIR_ENTFLAGS) {
		edirent_t *eodp = dp;

		(void) strcpy(eodp->ed_name, snapname);
		eodp->ed_ino = ZFSCTL_INO_SNAP(id);
		eodp->ed_eflags = case_conflict ? ED_CASE_CONFLICT : 0;
	} else {
		struct dirent64 *odp = dp;

		(void) strcpy(odp->d_name, snapname);
		odp->d_ino = ZFSCTL_INO_SNAP(id);
	}
	*nextp = cookie;

	ZFS_EXIT(zfsvfs);

	return (0);
}
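
/*
 * Readdir entry point for '.zfs/shares': delegate to the real shares
 * directory znode.
 */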
/* ARGSUSED */
static int
zfsctl_shares_readdir(vnode_t *vp, uio_t *uiop, cred_t *cr, int *eofp,
    caller_context_t *ct, int flags)
{
	zfsvfs_t *zfsvfs = vp->v_vfsp->vfs_data;
	znode_t *dzp;
	int error;

	ZFS_ENTER(zfsvfs);

	if (zfsvfs->z_shares_dir == 0) {
		ZFS_EXIT(zfsvfs);
		return (ENOTSUP);
	}
	if ((error = zfs_zget(zfsvfs, zfsvfs->z_shares_dir, &dzp)) == 0) {
		error = VOP_READDIR(ZTOV(dzp), uiop, cr, eofp, ct, flags);
		VN_RELE(ZTOV(dzp));
	} else {
		*eofp = 1;
		error = ENOENT;
	}

	ZFS_EXIT(zfsvfs);
	return (error);
}

/*
 * pvp is the '.zfs' directory (zfsctl_node_t).
 * Creates vp, which is '.zfs/snapshot' (zfsctl_snapdir_t).
 *
 * This function is the callback to create a GFS vnode for '.zfs/snapshot'
 * when a lookup is performed on .zfs for "snapshot".
 */
vnode_t *
zfsctl_mknode_snapdir(vnode_t *pvp)
{
	vnode_t *vp;
	zfsctl_snapdir_t *sdp;

	vp = gfs_dir_create(sizeof (zfsctl_snapdir_t), pvp,
	    zfsctl_ops_snapdir, NULL, NULL, MAXNAMELEN,
	    zfsctl_snapdir_readdir_cb, NULL);
	sdp = vp->v_data;
	sdp->sd_node.zc_id = ZFSCTL_INO_SNAPDIR;
	sdp->sd_node.zc_cmtime = ((zfsctl_node_t *)pvp->v_data)->zc_cmtime;
	mutex_init(&sdp->sd_lock, NULL, MUTEX_DEFAULT, NULL);
	avl_create(&sdp->sd_snaps, snapentry_compare,
	    sizeof (zfs_snapentry_t), offsetof(zfs_snapentry_t, se_node));
	return (vp);
}
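
/*
 * pvp is the '.zfs' directory (zfsctl_node_t).
 * Creates vp, which is the '.zfs/shares' GFS directory node.
 */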
vnode_t *
zfsctl_mknode_shares(vnode_t *pvp)
{
	vnode_t *vp;
	zfsctl_node_t *sdp;

	vp = gfs_dir_create(sizeof (zfsctl_node_t), pvp,
	    zfsctl_ops_shares, NULL, NULL, MAXNAMELEN,
	    NULL, NULL);
	sdp = vp->v_data;
	sdp->zc_cmtime = ((zfsctl_node_t *)pvp->v_data)->zc_cmtime;
	return (vp);
}
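
/*
 * Getattr entry point for '.zfs/shares': report the attributes of the
 * underlying shares directory znode.
 */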
/* ARGSUSED */
static int
zfsctl_shares_getattr(vnode_t *vp, vattr_t *vap, int flags, cred_t *cr,
    caller_context_t *ct)
{
	zfsvfs_t *zfsvfs = vp->v_vfsp->vfs_data;
	znode_t *dzp;
	int error;

	ZFS_ENTER(zfsvfs);
	if (zfsvfs->z_shares_dir == 0) {
		ZFS_EXIT(zfsvfs);
		return (ENOTSUP);
	}
	if ((error = zfs_zget(zfsvfs, zfsvfs->z_shares_dir, &dzp)) == 0) {
		error = VOP_GETATTR(ZTOV(dzp), vap, flags, cr, ct);
		VN_RELE(ZTOV(dzp));
	}
	ZFS_EXIT(zfsvfs);
	return (error);
}

/* ARGSUSED */
static int
zfsctl_snapdir_getattr(vnode_t *vp, vattr_t *vap, int flags, cred_t *cr,
    caller_context_t *ct)
{
	zfsvfs_t *zfsvfs = vp->v_vfsp->vfs_data;
	zfsctl_snapdir_t *sdp = vp->v_data;

	ZFS_ENTER(zfsvfs);
	zfsctl_common_getattr(vp, vap);
	vap->va_nodeid = gfs_file_inode(vp);
	vap->va_nlink = vap->va_size = avl_numnodes(&sdp->sd_snaps) + 2;
	vap->va_ctime = vap->va_mtime = dmu_objset_snap_cmtime(zfsvfs->z_os);
	ZFS_EXIT(zfsvfs);

	return (0);
}

/* ARGSUSED */
static void
zfsctl_snapdir_inactive(vnode_t *vp, cred_t *cr, caller_context_t *ct)
{
	zfsctl_snapdir_t *sdp = vp->v_data;
	void *private;

	private = gfs_dir_inactive(vp);
	if (private != NULL) {
		ASSERT(avl_numnodes(&sdp->sd_snaps) == 0);
		mutex_destroy(&sdp->sd_lock);
		avl_destroy(&sdp->sd_snaps);
		kmem_free(private, sizeof (zfsctl_snapdir_t));
	}
}

static const fs_operation_def_t zfsctl_tops_snapdir[] = {
	{ VOPNAME_OPEN,		{ .vop_open = zfsctl_common_open } },
	{ VOPNAME_CLOSE,	{ .vop_close = zfsctl_common_close } },
	{ VOPNAME_IOCTL,	{ .error = fs_inval } },
	{ VOPNAME_GETATTR,	{ .vop_getattr = zfsctl_snapdir_getattr } },
	{ VOPNAME_ACCESS,	{ .vop_access = zfsctl_common_access } },
	{ VOPNAME_RENAME,	{ .vop_rename = zfsctl_snapdir_rename } },
	{ VOPNAME_RMDIR,	{ .vop_rmdir = zfsctl_snapdir_remove } },
	{ VOPNAME_MKDIR,	{ .vop_mkdir = zfsctl_snapdir_mkdir } },
	{ VOPNAME_READDIR,	{ .vop_readdir = gfs_vop_readdir } },
	{ VOPNAME_LOOKUP,	{ .vop_lookup = zfsctl_snapdir_lookup } },
	{ VOPNAME_SEEK,		{ .vop_seek = fs_seek } },
	{ VOPNAME_INACTIVE,	{ .vop_inactive = zfsctl_snapdir_inactive } },
	{ VOPNAME_FID,		{ .vop_fid = zfsctl_common_fid } },
	{ NULL }
};

static const fs_operation_def_t zfsctl_tops_shares[] = {
	{ VOPNAME_OPEN,		{ .vop_open = zfsctl_common_open } },
	{ VOPNAME_CLOSE,	{ .vop_close = zfsctl_common_close } },
	{ VOPNAME_IOCTL,	{ .error = fs_inval } },
	{ VOPNAME_GETATTR,	{ .vop_getattr = zfsctl_shares_getattr } },
	{ VOPNAME_ACCESS,	{ .vop_access = zfsctl_common_access } },
	{ VOPNAME_READDIR,	{ .vop_readdir = zfsctl_shares_readdir } },
	{ VOPNAME_LOOKUP,	{ .vop_lookup = zfsctl_shares_lookup } },
	{ VOPNAME_SEEK,		{ .vop_seek = fs_seek } },
	{ VOPNAME_INACTIVE,	{ .vop_inactive = gfs_vop_inactive } },
	{ VOPNAME_FID,		{ .vop_fid = zfsctl_shares_fid } },
	{ NULL }
};

/*
 * pvp is the GFS vnode '.zfs/snapshot'.
 *
 * This creates a GFS node under '.zfs/snapshot' representing each
 * snapshot.  This newly created GFS node is what we mount snapshot
 * vfs_t's on top of.
 */
static vnode_t *
zfsctl_snapshot_mknode(vnode_t *pvp, uint64_t objset)
{
	vnode_t *vp;
	zfsctl_node_t *zcp;

	vp = gfs_dir_create(sizeof (zfsctl_node_t), pvp,
	    zfsctl_ops_snapshot, NULL, NULL, MAXNAMELEN, NULL, NULL);
	zcp = vp->v_data;
	zcp->zc_id = objset;

	return (vp);
}
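
/*
 * Inactive entry point for a snapshot's mount point vnode: once the
 * last hold is dropped, remove the entry from the parent snapdir's
 * AVL tree and dispose of the GFS vnode.
 */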
static void
zfsctl_snapshot_inactive(vnode_t *vp, cred_t *cr, caller_context_t *ct)
{
	zfsctl_snapdir_t *sdp;
	zfs_snapentry_t *sep, *next;
	vnode_t *dvp;

	VERIFY(gfs_dir_lookup(vp, "..", &dvp, cr, 0, NULL, NULL) == 0);
	sdp = dvp->v_data;

	mutex_enter(&sdp->sd_lock);

	if (vp->v_count > 1) {
		mutex_exit(&sdp->sd_lock);
		return;
	}
	ASSERT(!vn_ismntpt(vp));

	sep = avl_first(&sdp->sd_snaps);
	while (sep != NULL) {
		next = AVL_NEXT(&sdp->sd_snaps, sep);

		if (sep->se_root == vp) {
			avl_remove(&sdp->sd_snaps, sep);
			kmem_free(sep->se_name, strlen(sep->se_name) + 1);
			kmem_free(sep, sizeof (zfs_snapentry_t));
			break;
		}
		sep = next;
	}
	ASSERT(sep != NULL);

	mutex_exit(&sdp->sd_lock);
	VN_RELE(dvp);

	/*
	 * Dispose of the vnode for the snapshot mount point.
	 * This is safe to do because once this entry has been removed
	 * from the AVL tree, it can't be found again, so cannot become
	 * "active".  If we lookup the same name again we will end up
	 * creating a new vnode.
	 */
	gfs_vop_inactive(vp, cr, ct);
}

/*
 * These VP's should never see the light of day.  They should always
 * be covered.
 */
static const fs_operation_def_t zfsctl_tops_snapshot[] = {
	VOPNAME_INACTIVE, { .vop_inactive = zfsctl_snapshot_inactive },
	NULL, NULL
};
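
/*
 * Given the objset id of a mounted snapshot, find the snapshot's root
 * vnode under '.zfs/snapshot' and return the zfsvfs_t of the vfs
 * mounted there.
 */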
int
zfsctl_lookup_objset(vfs_t *vfsp, uint64_t objsetid, zfsvfs_t **zfsvfsp)
{
	zfsvfs_t *zfsvfs = vfsp->vfs_data;
	vnode_t *dvp, *vp;
	zfsctl_snapdir_t *sdp;
	zfsctl_node_t *zcp;
	zfs_snapentry_t *sep;
	int error;

	ASSERT(zfsvfs->z_ctldir != NULL);
	error = zfsctl_root_lookup(zfsvfs->z_ctldir, "snapshot", &dvp,
	    NULL, 0, NULL, kcred, NULL, NULL, NULL);
	if (error != 0)
		return (error);
	sdp = dvp->v_data;

	mutex_enter(&sdp->sd_lock);
	sep = avl_first(&sdp->sd_snaps);
	while (sep != NULL) {
		vp = sep->se_root;
		zcp = vp->v_data;
		if (zcp->zc_id == objsetid)
			break;

		sep = AVL_NEXT(&sdp->sd_snaps, sep);
	}

	if (sep != NULL) {
		VN_HOLD(vp);
		/*
		 * Return the mounted root rather than the covered mount
		 * point.  Takes the GFS vnode at .zfs/snapshot/<snapshot
		 * objsetid> and returns the ZFS vnode mounted on top of
		 * the GFS node.  This ZFS vnode is the root of the vfs
		 * for objset 'objsetid'.
		 */
		error = traverse(&vp);
		if (error == 0) {
			if (vp == sep->se_root)
				error = EINVAL;
			else
				*zfsvfsp = VTOZ(vp)->z_zfsvfs;
		}
		mutex_exit(&sdp->sd_lock);
		VN_RELE(vp);
	} else {
		error = EINVAL;
		mutex_exit(&sdp->sd_lock);
	}

	VN_RELE(dvp);

	return (error);
}

/*
 * Unmount any snapshots for the given filesystem.  This is called from
 * zfs_umount() - if we have a ctldir, then go through and unmount all the
 * snapshots.
 */
int
zfsctl_umount_snapshots(vfs_t *vfsp, int fflags, cred_t *cr)
{
	zfsvfs_t *zfsvfs = vfsp->vfs_data;
	vnode_t *dvp;
	zfsctl_snapdir_t *sdp;
	zfs_snapentry_t *sep, *next;
	int error;

	ASSERT(zfsvfs->z_ctldir != NULL);
	error = zfsctl_root_lookup(zfsvfs->z_ctldir, "snapshot", &dvp,
	    NULL, 0, NULL, cr, NULL, NULL, NULL);
	if (error != 0)
		return (error);
	sdp = dvp->v_data;

	mutex_enter(&sdp->sd_lock);

	sep = avl_first(&sdp->sd_snaps);
	while (sep != NULL) {
		next = AVL_NEXT(&sdp->sd_snaps, sep);

		/*
		 * If this snapshot is not mounted, then it must
		 * have just been unmounted by somebody else, and
		 * will be cleaned up by zfsctl_snapdir_inactive().
		 */
		if (vn_ismntpt(sep->se_root)) {
			avl_remove(&sdp->sd_snaps, sep);
			error = zfsctl_unmount_snap(sep, fflags, cr);
			if (error) {
				avl_add(&sdp->sd_snaps, sep);
				break;
			}
		}
		sep = next;
	}

	mutex_exit(&sdp->sd_lock);
	VN_RELE(dvp);

	return (error);
}
#endif /* HAVE_ZPL */