Linux ZVOL implementation; kernel-side changes

At last a useful user space interface for the Linux ZFS port arrives. With the addition of the ZVOL, real ZFS-based block devices are available and can be compared head to head with Linux's MD and LVM block drivers. The Linux ZVOL has not yet had any performance work done, but from a user perspective it should be functionally complete and behave like any other Linux block device.

The ZVOL has so far been tested using zconfig.sh on the following x86_64-based platforms: FC11, CHAOS4, RHEL5, RHEL6, and SLES11. However, more testing is required to ensure everything is working as designed.

What follows is a somewhat detailed list of the changes included in this commit to make ZVOLs possible. A few other issues were addressed in the context of these changes and are also mentioned.

* Added module/zfs/zvol.c, which is based on the original Solaris ZVOL implementation but rewritten to integrate with the Linux block device APIs. The basic design remains similar on Linux, with the major change being request processing. Request processing is handled by registering a request function which the elevator calls once all request merges are finished and the elevator unplugs. This function is called under a spin lock, and the request structure is passed to the block driver to be queued for IO. The elevator must be notified asynchronously once the request completes or fails with an error. This gives the block driver a chance to handle many requests concurrently. For the ZVOL we maintain a taskq with a service thread per core. As requests are delivered by the elevator, each request is dispatched to the taskq. The task queue handles each request with a write or read helper function which basically copies the request data in to or out of the DMU object. Writes signal completion as soon as the DMU has the data unless they are marked sync. Reads are all handled synchronously; however, the elevator will merge many small reads in to a large read before submitting the request.

* Caching is worth specifically mentioning. Because the Linux VFS and the ZFS ARC both want to fully manage the cache, we unfortunately end up with two caches. This means our memory footprint is larger than otherwise expected, and it means we have an extra copy between the caches, but it does not impact correctness. I believe all sync and barrier requests are handled correctly. Longer term there is lots of room for improvement here, but it will require fairly extensive changes to either the Linux VFS and VM layer, or additional DMU interfaces to handle managing buffers not directly allocated by the ARC.

* Added module/zfs/include/sys/blkdev.h, which contains all the Linux compatibility foo required to handle changes in the Linux block APIs from 2.6.18 through 2.6.31 based kernels.

* The dmu_{read,write}_uio interfaces, which don't make sense on Linux, have been modified to dmu_{read,write}_req functions which consume the standard Linux IO request structure. Their function fundamentally remains the same, so this happily worked out pretty cleanly.

* The /dev/zfs character device is no longer created through the half-implemented Solaris driver DDI interfaces. It is now simply created with its own major number as a Linux misc device, which greatly simplifies everything. It is only capable of handling ioctls(), but this fits nicely because that's all it ever has to do. The ZVOL devices, unlike in Solaris, do not leverage the same major number as /dev/zfs but instead register their own major. Because only one major is allocated and space is reserved for 16 partitions per device, there is a limit of 16384 concurrent ZVOL devices. By using multiple majors like the scsi driver, this limit could be addressed if it becomes a problem.

* The {spa,zfs,zvol}_busy() functions have all been removed because they are not required on a Linux system. Under Linux the registered module exit function will not be called while there are still references to the module. Once the exit function is called, however, it must succeed or block; it may not fail, so returning an error on module unload makes no sense under Linux.

* With the addition of ZVOL support all the HAVE_ZVOL defines were removed for obvious reasons. However, the HAVE_ZPL defines have been relocated in to the linux-{kernel,user}-disk topic branches and must remain until the ZPL is implemented.
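
The note about reserving 16 partitions per device corresponds to the ZVOL_MINOR_BITS, ZVOL_MINOR_MASK and ZVOL_MINORS defines added to zfs.h. A minimal user space sketch of how such a layout splits a minor number follows. It assumes the partition number occupies the low ZVOL_MINOR_BITS bits, as is conventional for Linux block drivers. The three helper functions are hypothetical names used only for illustration and are not part of this commit.

```c
#include <assert.h>

/* Values copied from the zfs.h hunk in this commit. */
#define ZVOL_MINOR_BITS		4
#define ZVOL_MINOR_MASK		((1U << ZVOL_MINOR_BITS) - 1)
#define ZVOL_MINORS		(1U << ZVOL_MINOR_BITS)

/* Hypothetical helper: first minor reserved for the Nth ZVOL device. */
static unsigned int
zvol_first_minor(unsigned int dev_index)
{
	return (dev_index << ZVOL_MINOR_BITS);
}

/* Hypothetical helper: which ZVOL device a minor belongs to. */
static unsigned int
zvol_dev_index(unsigned int minor)
{
	return (minor >> ZVOL_MINOR_BITS);
}

/* Hypothetical helper: partition slot within that device (0..15). */
static unsigned int
zvol_partition(unsigned int minor)
{
	return (minor & ZVOL_MINOR_MASK);
}

/* Round-trip check: every slot of a device maps back to that device. */
static void
zvol_minor_selfcheck(unsigned int dev_index)
{
	unsigned int part;

	for (part = 0; part < ZVOL_MINORS; part++) {
		unsigned int minor = zvol_first_minor(dev_index) + part;

		assert(zvol_dev_index(minor) == dev_index);
		assert(zvol_partition(minor) == part);
	}
}
```

Under these assumptions device index 3 owns minors 48 through 63, so minor 50 decodes to device 3, partition slot 2. Calling zvol_minor_selfcheck(3) from a test program exercises all 16 slots of that device.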
2009-11-20 11:06:59 -08:00 · 2009-11-20 11:06:59 -08:00 · fb1b00e9f4
parent 49fdb13bc8
commit fb1b00e9f4
23 changed files with 1702 additions and 234 deletions
--- a/module/zcommon/include/sys/fs/zfs.h
+++ b/module/zcommon/include/sys/fs/zfs.h
@ -536,20 +536,10 @@ typedef struct vdev_stat {
 #define	ZVOL_DRIVER		"zvol"
 #define	ZFS_DRIVER		"zfs"
 #define	ZFS_DEV			"/dev/zfs"
-
-/*
- * zvol paths.  Irritatingly, the devfsadm interfaces want all these
- * paths without the /dev prefix, but for some things, we want the
- * /dev prefix.  Below are the names without /dev.
- */
-#define	ZVOL_DEV_DIR	"zvol/dsk"
-#define	ZVOL_RDEV_DIR	"zvol/rdsk"
-
-/*
- * And here are the things we need with /dev, etc. in front of them.
- */
-#define	ZVOL_PSEUDO_DEV		"/devices/pseudo/zfs@0:"
-#define	ZVOL_FULL_DEV_DIR	"/dev/" ZVOL_DEV_DIR "/"
+#define ZVOL_MAJOR		230
+#define ZVOL_MINOR_BITS		4
+#define ZVOL_MINOR_MASK		((1U << ZVOL_MINOR_BITS) - 1)
+#define ZVOL_MINORS		(1 << 4)

 #define	ZVOL_PROP_NAME		"name"

--- a/module/zfs/dmu.c
+++ b/module/zfs/dmu.c
@ -660,9 +660,58 @@ dmu_prealloc(objset_t *os, uint64_t object, uint64_t offset, uint64_t size,
 }

 #ifdef _KERNEL
-int
-dmu_read_uio(objset_t *os, uint64_t object, uio_t *uio, uint64_t size)
+
+/*
+ * Copy up to size bytes between arg_buf and req based on the data direction
+ * described by the req.  If an entire req's data cannot be transferred, the
+ * req is updated such that its current index and bv offsets correctly
+ * reference any residual data which could not be copied.  The number of
+ * bytes copied is stored in *offset; the return value is always zero.
+ */
+static int
+dmu_req_copy(void *arg_buf, int size, int *offset, struct request *req)
 {
+	struct bio_vec *bv;
+	struct req_iterator iter;
+	char *bv_buf;
+	int tocpy;
+
+	*offset = 0;
+	rq_for_each_segment(bv, req, iter) {
+
+		/* Fully consumed the passed arg_buf */
+		ASSERT3S(*offset, <=, size);
+		if (size == *offset)
+			break;
+
+		/* Skip fully consumed bv's */
+		if (bv->bv_len == 0)
+			continue;
+
+		tocpy = MIN(bv->bv_len, size - *offset);
+		ASSERT3S(tocpy, >=, 0);
+
+		bv_buf = page_address(bv->bv_page) + bv->bv_offset;
+		ASSERT3P(bv_buf, !=, NULL);
+
+		if (rq_data_dir(req) == WRITE)
+			memcpy(arg_buf + *offset, bv_buf, tocpy);
+		else
+			memcpy(bv_buf, arg_buf + *offset, tocpy);
+
+		*offset += tocpy;
+		bv->bv_offset += tocpy;
+		bv->bv_len -= tocpy;
+	}
+
+	return 0;
+}
+
+int
+dmu_read_req(objset_t *os, uint64_t object, struct request *req)
+{
+	uint64_t size = blk_rq_bytes(req);
+	uint64_t offset = blk_rq_pos(req) << 9;
 	dmu_buf_t **dbp;
 	int numbufs, i, err;

@ -670,27 +719,33 @@ dmu_read_uio(objset_t *os, uint64_t object, uio_t *uio, uint64_t size)
 	 * NB: we could do this block-at-a-time, but it's nice
 	 * to be reading in parallel.
 	 */
-	err = dmu_buf_hold_array(os, object, uio->uio_loffset, size, TRUE, FTAG,
+	err = dmu_buf_hold_array(os, object, offset, size, TRUE, FTAG,
 				 &numbufs, &dbp);
 	if (err)
 		return (err);

 	for (i = 0; i < numbufs; i++) {
-		int tocpy;
-		int bufoff;
+		int tocpy, didcpy, bufoff;
 		dmu_buf_t *db = dbp[i];

-		ASSERT(size > 0);
+		bufoff = offset - db->db_offset;
+		ASSERT3S(bufoff, >=, 0);

-		bufoff = uio->uio_loffset - db->db_offset;
 		tocpy = (int)MIN(db->db_size - bufoff, size);
+		if (tocpy == 0)
+			break;
+
+		err = dmu_req_copy(db->db_data + bufoff, tocpy, &didcpy, req);
+
+		if (didcpy < tocpy)
+			err = EIO;

-		err = uiomove((char *)db->db_data + bufoff, tocpy,
-		    UIO_READ, uio);
 		if (err)
 			break;

 		size -= tocpy;
+		offset += didcpy;
+		err = 0;
 	}
 	dmu_buf_rele_array(dbp, numbufs, FTAG);

@ -698,30 +753,31 @@ dmu_read_uio(objset_t *os, uint64_t object, uio_t *uio, uint64_t size)
 }

 int
-dmu_write_uio(objset_t *os, uint64_t object, uio_t *uio, uint64_t size,
-    dmu_tx_t *tx)
+dmu_write_req(objset_t *os, uint64_t object, struct request *req, dmu_tx_t *tx)
 {
+	uint64_t size = blk_rq_bytes(req);
+	uint64_t offset = blk_rq_pos(req) << 9;
 	dmu_buf_t **dbp;
-	int numbufs, i;
-	int err = 0;
+	int numbufs, i, err;

 	if (size == 0)
 		return (0);

-	err = dmu_buf_hold_array(os, object, uio->uio_loffset, size,
-	    FALSE, FTAG, &numbufs, &dbp);
+	err = dmu_buf_hold_array(os, object, offset, size, FALSE, FTAG,
+				 &numbufs, &dbp);
 	if (err)
 		return (err);

 	for (i = 0; i < numbufs; i++) {
-		int tocpy;
-		int bufoff;
+		int tocpy, didcpy, bufoff;
 		dmu_buf_t *db = dbp[i];

-		ASSERT(size > 0);
+		bufoff = offset - db->db_offset;
+		ASSERT3S(bufoff, >=, 0);

-		bufoff = uio->uio_loffset - db->db_offset;
 		tocpy = (int)MIN(db->db_size - bufoff, size);
+		if (tocpy == 0)
+			break;

 		ASSERT(i == 0 || i == numbufs-1 || tocpy == db->db_size);

@ -730,27 +786,27 @@ dmu_write_uio(objset_t *os, uint64_t object, uio_t *uio, uint64_t size,
 		else
 			dmu_buf_will_dirty(db, tx);

-		/*
-		 * XXX uiomove could block forever (eg. nfs-backed
-		 * pages).  There needs to be a uiolockdown() function
-		 * to lock the pages in memory, so that uiomove won't
-		 * block.
-		 */
-		err = uiomove((char *)db->db_data + bufoff, tocpy,
-		    UIO_WRITE, uio);
+		err = dmu_req_copy(db->db_data + bufoff, tocpy, &didcpy, req);

 		if (tocpy == db->db_size)
 			dmu_buf_fill_done(db, tx);

+		if (didcpy < tocpy)
+			err = EIO;
+
 		if (err)
 			break;

 		size -= tocpy;
+		offset += didcpy;
+		err = 0;
 	}
 	dmu_buf_rele_array(dbp, numbufs, FTAG);
 	return (err);
 }
+#endif

+#ifdef HAVE_ZPL
 int
 dmu_write_pages(objset_t *os, uint64_t object, uint64_t offset, uint64_t size,
    page_t *pp, dmu_tx_t *tx)
--- a/module/zfs/dsl_dataset.c
+++ b/module/zfs/dsl_dataset.c
@ -246,7 +246,13 @@ dsl_dataset_evict(dmu_buf_t *db, void *dsv)

 	ASSERT(ds->ds_owner == NULL || DSL_DATASET_IS_DESTROYED(ds));

-	dprintf_ds(ds, "evicting %s\n", "");
+	/*
+	 * XXX: Commented out because dsl_dataset_name() is called
+	 * which references ds->ds_dir which it seems may be NULL.
+	 * This is easily triggered with 'zfs destroy <pool>/<ds>'.
+	 *
+	 * dprintf_ds(ds, "evicting %s\n", "");
+	 */

 	unique_remove(ds->ds_fsid_guid);

--- a/module/zfs/include/sys/blkdev.h
+++ b/module/zfs/include/sys/blkdev.h
@ -0,0 +1,164 @@
+#ifndef	_SYS_BLKDEV_H
+#define	_SYS_BLKDEV_H
+
+#ifdef _KERNEL
+
+#include <linux/blkdev.h>
+#include <linux/elevator.h>
+#include "zfs_config.h"
+
+#ifndef HAVE_BLK_FETCH_REQUEST
+static inline struct request *
+blk_fetch_request(struct request_queue *q)
+{
+	struct request *req;
+
+	req = elv_next_request(q);
+	if (req)
+		blkdev_dequeue_request(req);
+
+	return req;
+}
+#endif /* HAVE_BLK_FETCH_REQUEST */
+
+#ifndef HAVE_BLK_REQUEUE_REQUEST
+static inline void
+blk_requeue_request(request_queue_t *q, struct request *req)
+{
+	elv_requeue_request(q, req);
+}
+#endif /* HAVE_BLK_REQUEUE_REQUEST */
+
+#ifndef HAVE_BLK_END_REQUEST
+static inline bool
+blk_end_request(struct request *req, int error, unsigned int nr_bytes)
+{
+	struct request_queue *q = req->q;
+	LIST_HEAD(list);
+
+	/*
+	 * Request has already been dequeued but 2.6.18 version of
+	 * end_request() unconditionally dequeues the request so we
+	 * add it to a local list to prevent hitting the BUG_ON.
+	 */
+	list_add(&req->queuelist, &list);
+
+	/*
+	 * The old API required the driver to end each segment and not
+	 * the entire request.  In our case we always need to end the
+	 * entire request; partial requests are not supported.
+	 */
+	req->hard_cur_sectors = nr_bytes >> 9;
+
+
+	spin_lock_irq(q->queue_lock);
+	end_request(req, ((error == 0) ? 1 : error));
+	spin_unlock_irq(q->queue_lock);
+
+	return 0;
+}
+#else
+# ifdef HAVE_BLK_END_REQUEST_GPL_ONLY
+/*
+ * Define required to avoid conflicting 2.6.29 non-static prototype for a
+ * GPL-only version of the helper.  As of 2.6.31 the helper is available
+ * to non-GPL modules and is not explicitly exported GPL-only.
+ */
+# define blk_end_request ___blk_end_request
+static inline bool
+___blk_end_request(struct request *req, int error, unsigned int nr_bytes)
+{
+	struct request_queue *q = req->q;
+
+	/*
+	 * The old API required the driver to end each segment and not
+	 * the entire request.  In our case we always need to end the
+	 * entire request; partial requests are not supported.
+	 */
+	req->hard_cur_sectors = nr_bytes >> 9;
+
+	spin_lock_irq(q->queue_lock);
+	end_request(req, ((error == 0) ? 1 : error));
+	spin_unlock_irq(q->queue_lock);
+
+	return 0;
+}
+# endif /* HAVE_BLK_END_REQUEST_GPL_ONLY */
+#endif /* HAVE_BLK_END_REQUEST */
+
+#ifndef HAVE_BLK_RQ_POS
+static inline sector_t
+blk_rq_pos(struct request *req)
+{
+	return req->sector;
+}
+#endif /* HAVE_BLK_RQ_POS */
+
+#ifndef HAVE_BLK_RQ_SECTORS
+static inline unsigned int
+blk_rq_sectors(struct request *req)
+{
+	return req->nr_sectors;
+}
+#endif /* HAVE_BLK_RQ_SECTORS */
+
+#if !defined(HAVE_BLK_RQ_BYTES) || defined(HAVE_BLK_RQ_BYTES_GPL_ONLY)
+/*
+ * Define required to avoid conflicting 2.6.29 non-static prototype for a
+ * GPL-only version of the helper.  As of 2.6.31 the helper is available
+ * to non-GPL modules in the form of a static inline in the header.
+ */
+#define blk_rq_bytes __blk_rq_bytes
+static inline unsigned int
+__blk_rq_bytes(struct request *req)
+{
+	return blk_rq_sectors(req) << 9;
+}
+#endif /* !HAVE_BLK_RQ_BYTES || HAVE_BLK_RQ_BYTES_GPL_ONLY */
+
+#ifndef HAVE_GET_DISK_RO
+static inline int
+get_disk_ro(struct gendisk *disk)
+{
+	int policy = 0;
+
+	if (disk->part[0])
+		policy = disk->part[0]->policy;
+
+	return policy;
+}
+#endif /* HAVE_GET_DISK_RO */
+
+#ifndef HAVE_RQ_IS_SYNC
+static inline bool
+rq_is_sync(struct request *req)
+{
+	return (req->flags & REQ_RW_SYNC);
+}
+#endif /* HAVE_RQ_IS_SYNC */
+
+#ifndef HAVE_RQ_FOR_EACH_SEGMENT
+struct req_iterator {
+	int i;
+	struct bio *bio;
+};
+
+# define for_each_bio(_bio)              \
+	for (; _bio; _bio = _bio->bi_next)
+
+# define __rq_for_each_bio(_bio, rq)    \
+	if ((rq->bio))                  \
+		for (_bio = (rq)->bio; _bio; _bio = _bio->bi_next)
+
+# define rq_for_each_segment(bvl, _rq, _iter)                   \
+	__rq_for_each_bio(_iter.bio, _rq)                       \
+		bio_for_each_segment(bvl, _iter.bio, _iter.i)
+#endif /* HAVE_RQ_FOR_EACH_SEGMENT */
+
+#ifndef DISK_NAME_LEN
+#define DISK_NAME_LEN	32
+#endif /* DISK_NAME_LEN */
+
+#endif /* _KERNEL */
+
+#endif	/* _SYS_BLKDEV_H */
--- a/module/zfs/include/sys/dmu.h
+++ b/module/zfs/include/sys/dmu.h
@ -38,12 +38,14 @@
 #include <sys/types.h>
 #include <sys/param.h>
 #include <sys/cred.h>
+#ifdef _KERNEL
+#include <sys/blkdev.h>
+#endif

 #ifdef	__cplusplus
 extern "C" {
 #endif

-struct uio;
 struct page;
 struct vnode;
 struct spa;
@ -486,11 +488,14 @@ void dmu_write(objset_t *os, uint64_t object, uint64_t offset, uint64_t size,
 	const void *buf, dmu_tx_t *tx);
 void dmu_prealloc(objset_t *os, uint64_t object, uint64_t offset, uint64_t size,
 	dmu_tx_t *tx);
-int dmu_read_uio(objset_t *os, uint64_t object, struct uio *uio, uint64_t size);
-int dmu_write_uio(objset_t *os, uint64_t object, struct uio *uio, uint64_t size,
-    dmu_tx_t *tx);
+#ifdef _KERNEL
+int dmu_read_req(objset_t *os, uint64_t object, struct request *req);
+int dmu_write_req(objset_t *os, uint64_t object, struct request *req, dmu_tx_t *tx);
+#endif
+#ifdef HAVE_ZPL
 int dmu_write_pages(objset_t *os, uint64_t object, uint64_t offset,
    uint64_t size, struct page *pp, dmu_tx_t *tx);
+#endif
 struct arc_buf *dmu_request_arcbuf(dmu_buf_t *handle, int size);
 void dmu_return_arcbuf(struct arc_buf *buf);
 void dmu_assign_arcbuf(dmu_buf_t *handle, uint64_t offset, struct arc_buf *buf,
--- a/module/zfs/include/sys/spa.h
+++ b/module/zfs/include/sys/spa.h
@ -456,7 +456,6 @@ extern uint64_t spa_get_dspace(spa_t *spa);
 extern uint64_t spa_get_asize(spa_t *spa, uint64_t lsize);
 extern uint64_t spa_version(spa_t *spa);
 extern int spa_max_replication(spa_t *spa);
-extern int spa_busy(void);
 extern uint8_t spa_get_failmode(spa_t *spa);
 extern boolean_t spa_suspended(spa_t *spa);

--- a/module/zfs/include/sys/zfs_fuid.h
+++ b/module/zfs/include/sys/zfs_fuid.h
@ -98,6 +98,7 @@ typedef struct zfs_fuid_info {
 } zfs_fuid_info_t;

 #ifdef _KERNEL
+#ifdef HAVE_ZPL
 struct znode;
 extern uid_t zfs_fuid_map_id(zfsvfs_t *, uint64_t, cred_t *, zfs_fuid_type_t);
 extern void zfs_fuid_destroy(zfsvfs_t *);
@ -115,6 +116,7 @@ extern int zfs_fuid_find_by_domain(zfsvfs_t *, const char *domain,
    char **retdomain, boolean_t addok);
 extern const char *zfs_fuid_find_by_idx(zfsvfs_t *zfsvfs, uint32_t idx);
 extern void zfs_fuid_txhold(zfsvfs_t *zfsvfs, dmu_tx_t *tx);
+#endif /* HAVE_ZPL */
 #endif

 char *zfs_fuid_idx_domain(avl_tree_t *, uint32_t);
--- a/module/zfs/include/sys/zfs_ioctl.h
+++ b/module/zfs/include/sys/zfs_ioctl.h
@ -191,7 +191,6 @@ extern int zfs_secpolicy_snapshot_perms(const char *name, cred_t *cr);
 extern int zfs_secpolicy_rename_perms(const char *from,
    const char *to, cred_t *cr);
 extern int zfs_secpolicy_destroy_perms(const char *name, cred_t *cr);
-extern int zfs_busy(void);
 extern int zfs_unmount_snap(char *, void *);

 #endif	/* _KERNEL */
--- a/module/zfs/include/sys/zfs_znode.h
+++ b/module/zfs/include/sys/zfs_znode.h
@ -342,8 +342,10 @@ extern void zfs_xvattr_set(znode_t *zp, xvattr_t *xvap);
 extern void zfs_upgrade(zfsvfs_t *zfsvfs, dmu_tx_t *tx);
 extern int zfs_create_share_dir(zfsvfs_t *zfsvfs, dmu_tx_t *tx);

+#if defined(HAVE_UIO_RW)
 extern caddr_t zfs_map_page(page_t *, enum seg_rw);
 extern void zfs_unmap_page(page_t *, caddr_t);
+#endif /* HAVE_UIO_RW */

 extern zil_get_data_t zfs_get_data;
 extern zil_replay_func_t *zfs_replay_vector[TX_MAX_TYPE];
--- a/module/zfs/include/sys/zvol.h
+++ b/module/zfs/include/sys/zvol.h
@ -20,51 +20,33 @@
 */

 /*
- * Copyright 2008 Sun Microsystems, Inc.  All rights reserved.
+ * Copyright 2009 Sun Microsystems, Inc.  All rights reserved.
 * Use is subject to license terms.
 */

 #ifndef	_SYS_ZVOL_H
 #define	_SYS_ZVOL_H

-
-
 #include <sys/zfs_context.h>

-#ifdef	__cplusplus
-extern "C" {
-#endif
-
 #define	ZVOL_OBJ		1ULL
 #define	ZVOL_ZAP_OBJ		2ULL

 #ifdef _KERNEL
+
+#include <sys/blkdev.h>
+
 extern int zvol_check_volsize(uint64_t volsize, uint64_t blocksize);
 extern int zvol_check_volblocksize(uint64_t volblocksize);
 extern int zvol_get_stats(objset_t *os, nvlist_t *nv);
 extern void zvol_create_cb(objset_t *os, void *arg, cred_t *cr, dmu_tx_t *tx);
-extern int zvol_create_minor(const char *, major_t);
+extern int zvol_create_minor(const char *);
 extern int zvol_remove_minor(const char *);
-extern int zvol_set_volsize(const char *, major_t, uint64_t);
+extern int zvol_set_volsize(const char *, uint64_t);
 extern int zvol_set_volblocksize(const char *, uint64_t);

-extern int zvol_open(dev_t *devp, int flag, int otyp, cred_t *cr);
-extern int zvol_dump(dev_t dev, caddr_t addr, daddr_t offset, int nblocks);
-extern int zvol_close(dev_t dev, int flag, int otyp, cred_t *cr);
-extern int zvol_strategy(buf_t *bp);
-extern int zvol_read(dev_t dev, uio_t *uiop, cred_t *cr);
-extern int zvol_write(dev_t dev, uio_t *uiop, cred_t *cr);
-extern int zvol_aread(dev_t dev, struct aio_req *aio, cred_t *cr);
-extern int zvol_awrite(dev_t dev, struct aio_req *aio, cred_t *cr);
-extern int zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int flag, cred_t *cr,
-    int *rvalp);
-extern int zvol_busy(void);
-extern void zvol_init(void);
+extern int zvol_init(void);
 extern void zvol_fini(void);
-#endif
-
-#ifdef	__cplusplus
-}
-#endif
+#endif /* _KERNEL */

 #endif	/* _SYS_ZVOL_H */
--- a/module/zfs/rrwlock.c
+++ b/module/zfs/rrwlock.c
@ -23,6 +23,8 @@
 * Use is subject to license terms.
 */

+#ifdef HAVE_ZPL
+
 #include <sys/refcount.h>
 #include <sys/rrwlock.h>

@ -262,3 +264,4 @@ rrw_held(rrwlock_t *rrl, krw_t rw)

 	return (held);
 }
+#endif /* HAVE_ZPL */
--- a/module/zfs/spa_misc.c
+++ b/module/zfs/spa_misc.c
@ -1343,12 +1343,6 @@ spa_name_compare(const void *a1, const void *a2)
 	return (0);
 }

-int
-spa_busy(void)
-{
-	return (spa_active_count);
-}
-
 void
 spa_boot_init(void)
 {
--- a/module/zfs/zfs_acl.c
+++ b/module/zfs/zfs_acl.c
@ -23,6 +23,8 @@
 * Use is subject to license terms.
 */

+#ifdef HAVE_ZPL
+
 #include <sys/types.h>
 #include <sys/param.h>
 #include <sys/time.h>
@ -2848,3 +2850,5 @@ zfs_zaccess_rename(znode_t *sdzp, znode_t *szp, znode_t *tdzp,

 	return (error);
 }
+
+#endif /* HAVE_ZPL */
--- a/module/zfs/zfs_ctldir.c
+++ b/module/zfs/zfs_ctldir.c
@ -64,6 +64,8 @@
 * so that it cannot be freed until all snapshots have been unmounted.
 */

+#ifdef HAVE_ZPL
+
 #include <fs/fs_subr.h>
 #include <sys/zfs_ctldir.h>
 #include <sys/zfs_ioctl.h>
@ -1333,3 +1335,4 @@ zfsctl_umount_snapshots(vfs_t *vfsp, int fflags, cred_t *cr)

 	return (error);
 }
+#endif /* HAVE_ZPL */
--- a/module/zfs/zfs_dir.c
+++ b/module/zfs/zfs_dir.c
@ -23,6 +23,8 @@
 * Use is subject to license terms.
 */

+#ifdef HAVE_ZPL
+
 #include <sys/types.h>
 #include <sys/param.h>
 #include <sys/time.h>
@ -962,3 +964,4 @@ zfs_sticky_remove_access(znode_t *zdp, znode_t *zp, cred_t *cr)
 	else
 		return (secpolicy_vnode_remove(cr));
 }
+#endif /* HAVE_ZPL */
--- a/module/zfs/zfs_fuid.c
+++ b/module/zfs/zfs_fuid.c
@ -194,6 +194,7 @@ zfs_fuid_idx_domain(avl_tree_t *idx_tree, uint32_t idx)
 }

 #ifdef _KERNEL
+#ifdef HAVE_ZPL
 /*
 * Load the fuid table(s) into memory.
 */
@ -743,4 +744,5 @@ zfs_fuid_txhold(zfsvfs_t *zfsvfs, dmu_tx_t *tx)
 		    FUID_SIZE_ESTIMATE(zfsvfs));
 	}
 }
+#endif /* HAVE_ZPL */
 #endif
--- a/module/zfs/zfs_ioctl.c
+++ b/module/zfs/zfs_ioctl.c
@ -64,18 +64,16 @@
 #include <sharefs/share.h>
 #include <sys/dmu_objset.h>

+#include <linux/miscdevice.h>
+
 #include "zfs_namecheck.h"
 #include "zfs_prop.h"
 #include "zfs_deleg.h"
-
-extern struct modlfs zfs_modlfs;
+#include "zfs_config.h"

 extern void zfs_init(void);
 extern void zfs_fini(void);

-ldi_ident_t zfs_li = NULL;
-dev_info_t *zfs_dip;
-
 typedef int zfs_ioc_func_t(zfs_cmd_t *);
 typedef int zfs_secpolicy_func_t(zfs_cmd_t *, cred_t *);

@ -403,6 +401,7 @@ zfs_secpolicy_send(zfs_cmd_t *zc, cred_t *cr)
 	    ZFS_DELEG_PERM_SEND, cr));
 }

+#ifdef HAVE_ZPL
 static int
 zfs_secpolicy_deleg_share(zfs_cmd_t *zc, cred_t *cr)
 {
@ -426,10 +425,12 @@ zfs_secpolicy_deleg_share(zfs_cmd_t *zc, cred_t *cr)
 	return (dsl_deleg_access(zc->zc_name,
 	    ZFS_DELEG_PERM_SHARE, cr));
 }
+#endif /* HAVE_ZPL */

 int
 zfs_secpolicy_share(zfs_cmd_t *zc, cred_t *cr)
 {
+#ifdef HAVE_ZPL
 	if (!INGLOBALZONE(curproc))
 		return (EPERM);

@ -438,11 +439,15 @@ zfs_secpolicy_share(zfs_cmd_t *zc, cred_t *cr)
 	} else {
 		return (zfs_secpolicy_deleg_share(zc, cr));
 	}
+#else
+	return (ENOTSUP);
+#endif /* HAVE_ZPL */
 }

 int
 zfs_secpolicy_smb_acl(zfs_cmd_t *zc, cred_t *cr)
 {
+#ifdef HAVE_ZPL
 	if (!INGLOBALZONE(curproc))
 		return (EPERM);

@ -451,6 +456,9 @@ zfs_secpolicy_smb_acl(zfs_cmd_t *zc, cred_t *cr)
 	} else {
 		return (zfs_secpolicy_deleg_share(zc, cr));
 	}
+#else
+	return (ENOTSUP);
+#endif /* HAVE_ZPL */
 }

 static int
@ -645,6 +653,7 @@ zfs_secpolicy_create(zfs_cmd_t *zc, cred_t *cr)
 	return (error);
 }

+#ifdef HAVE_ZPL
 static int
 zfs_secpolicy_umount(zfs_cmd_t *zc, cred_t *cr)
 {
@ -656,6 +665,7 @@ zfs_secpolicy_umount(zfs_cmd_t *zc, cred_t *cr)
 	}
 	return (error);
 }
+#endif /* HAVE_ZPL */

 /*
 * Policy for pool operations - create/destroy pools, add vdevs, etc.  Requires
@ -836,6 +846,7 @@ put_nvlist(zfs_cmd_t *zc, nvlist_t *nvl)
 	return (error);
 }

+#ifdef HAVE_ZPL
 static int
 getzfsvfs(const char *dsname, zfsvfs_t **zvp)
 {
@ -898,6 +909,7 @@ zfsvfs_rele(zfsvfs_t *zfsvfs, void *tag)
 		zfsvfs_free(zfsvfs);
 	}
 }
+#endif /* HAVE_ZPL */

 static int
 zfs_ioc_pool_create(zfs_cmd_t *zc)
@ -1713,6 +1725,7 @@ zfs_set_prop_nvlist(const char *name, nvlist_t *nvl)

 		if (prop == ZPROP_INVAL) {
 			if (zfs_prop_userquota(propname)) {
+#ifdef HAVE_ZPL
 				uint64_t *valary;
 				unsigned int vallen;
 				const char *domain;
@ -1741,6 +1754,10 @@ zfs_set_prop_nvlist(const char *name, nvlist_t *nvl)
 					continue;
 				else
 					goto out;
+#else
+				error = ENOTSUP;
+				goto out;
+#endif
 			} else if (zfs_prop_user(propname)) {
 				VERIFY(nvpair_value_string(elem, &strval) == 0);
 				error = dsl_prop_set(name, propname, 1,
@ -1781,8 +1798,7 @@ zfs_set_prop_nvlist(const char *name, nvlist_t *nvl)

 		case ZFS_PROP_VOLSIZE:
 			if ((error = nvpair_value_uint64(elem, &intval)) != 0 ||
-			    (error = zvol_set_volsize(name,
-			    ddi_driver_major(zfs_dip), intval)) != 0)
+			    (error = zvol_set_volsize(name, intval)) != 0)
 				goto out;
 			break;

@ -1794,6 +1810,7 @@ zfs_set_prop_nvlist(const char *name, nvlist_t *nvl)

 		case ZFS_PROP_VERSION:
 		{
+#ifdef HAVE_ZPL
 			zfsvfs_t *zfsvfs;

 			if ((error = nvpair_value_uint64(elem, &intval)) != 0)
@ -1812,6 +1829,10 @@ zfs_set_prop_nvlist(const char *name, nvlist_t *nvl)
 			if (error)
 				goto out;
 			break;
+#else
+			error = ENOTSUP;
+			goto out;
+#endif /* HAVE_ZPL */
 		}

 		default:
@ -2023,6 +2044,7 @@ zfs_ioc_pool_get_props(zfs_cmd_t *zc)
 static int
 zfs_ioc_iscsi_perm_check(zfs_cmd_t *zc)
 {
+#ifdef HAVE_ZPL
 	nvlist_t *nvp;
 	int error;
 	uint32_t uid;
@ -2065,6 +2087,9 @@ zfs_ioc_iscsi_perm_check(zfs_cmd_t *zc)
 	    zfs_prop_to_name(ZFS_PROP_SHAREISCSI), usercred);
 	crfree(usercred);
 	return (error);
+#else
+	return (ENOTSUP);
+#endif /* HAVE_ZPL */
 }

 /*
@ -2147,7 +2172,7 @@ zfs_ioc_get_fsacl(zfs_cmd_t *zc)
 static int
 zfs_ioc_create_minor(zfs_cmd_t *zc)
 {
-	return (zvol_create_minor(zc->zc_name, ddi_driver_major(zfs_dip)));
+	return (zvol_create_minor(zc->zc_name));
 }

 /*
@ -2162,6 +2187,7 @@ zfs_ioc_remove_minor(zfs_cmd_t *zc)
 	return (zvol_remove_minor(zc->zc_name));
 }

+#ifdef HAVE_ZPL
 /*
 * Search the vfs list for a specified resource.  Returns a pointer to it
 * or NULL if no suitable entry is found. The caller of this routine
@ -2186,6 +2212,7 @@ zfs_get_vfs(const char *resource)
 	vfs_list_unlock();
 	return (vfs_found);
 }
+#endif /* HAVE_ZPL */

 /* ARGSUSED */
 static void
@ -2535,6 +2562,7 @@ out:
 int
 zfs_unmount_snap(char *name, void *arg)
 {
+#ifdef HAVE_ZPL
 	vfs_t *vfsp = NULL;

 	if (arg) {
@ -2566,6 +2594,7 @@ zfs_unmount_snap(char *name, void *arg)
 		if ((err = dounmount(vfsp, flag, kcred)) != 0)
 			return (err);
 	}
+#endif /* HAVE_ZPL */
 	return (0);
 }

@ -2621,6 +2650,7 @@ zfs_ioc_destroy(zfs_cmd_t *zc)
 static int
 zfs_ioc_rollback(zfs_cmd_t *zc)
 {
+#ifdef HAVE_ZPL
 	objset_t *os;
 	int error;
 	zfsvfs_t *zfsvfs = NULL;
@ -2654,6 +2684,9 @@ zfs_ioc_rollback(zfs_cmd_t *zc)
 	/* Note, the dmu_objset_rollback() releases the objset for us. */

 	return (error);
+#else
+	return (ENOTSUP);
+#endif /* HAVE_ZPL */
 }

 /*
@ -2727,7 +2760,9 @@ static int
 zfs_ioc_recv(zfs_cmd_t *zc)
 {
 	file_t *fp;
+#ifdef HAVE_ZPL
 	objset_t *os;
+#endif /* HAVE_ZPL */
 	dmu_recv_cookie_t drc;
 	boolean_t force = (boolean_t)zc->zc_guid;
 	int error, fd;
@ -2760,6 +2795,7 @@ zfs_ioc_recv(zfs_cmd_t *zc)
 		return (EBADF);
 	}

+#ifdef HAVE_ZPL
 	if (props && dmu_objset_open(tofs, DMU_OST_ANY,
 	    DS_MODE_USER | DS_MODE_READONLY, &os) == 0) {
 		/*
@ -2770,6 +2806,7 @@ zfs_ioc_recv(zfs_cmd_t *zc)

 		dmu_objset_close(os);
 	}
+#endif /* HAVE_ZPL */

 	if (zc->zc_string[0]) {
 		error = dmu_objset_open(zc->zc_string, DMU_OST_ANY,
@ -2801,6 +2838,7 @@ zfs_ioc_recv(zfs_cmd_t *zc)
 	error = dmu_recv_stream(&drc, fp->f_vnode, &off);

 	if (error == 0) {
+#ifdef HAVE_ZPL
 		zfsvfs_t *zfsvfs = NULL;

 		if (getzfsvfs(tofs, &zfsvfs) == 0) {
@ -2827,6 +2865,9 @@ zfs_ioc_recv(zfs_cmd_t *zc)
 		} else {
 			error = dmu_recv_end(&drc);
 		}
+#else
+		error = dmu_recv_end(&drc);
+#endif /* HAVE_ZPL */
 	}

 	zc->zc_cookie = off - fp->f_offset;
@ -3057,6 +3098,7 @@ zfs_ioc_promote(zfs_cmd_t *zc)
 static int
 zfs_ioc_userspace_one(zfs_cmd_t *zc)
 {
+#ifdef HAVE_ZPL
 	zfsvfs_t *zfsvfs;
 	int error;

@ -3072,6 +3114,9 @@ zfs_ioc_userspace_one(zfs_cmd_t *zc)
 	zfsvfs_rele(zfsvfs, FTAG);

 	return (error);
+#else
+	return (ENOTSUP);
+#endif /* HAVE_ZPL */
 }

 /*
@ -3088,6 +3133,7 @@ zfs_ioc_userspace_one(zfs_cmd_t *zc)
 static int
 zfs_ioc_userspace_many(zfs_cmd_t *zc)
 {
+#ifdef HAVE_ZPL
 	zfsvfs_t *zfsvfs;
 	int error;

@ -3110,6 +3156,9 @@ zfs_ioc_userspace_many(zfs_cmd_t *zc)
 	zfsvfs_rele(zfsvfs, FTAG);

 	return (error);
+#else
+	return (ENOTSUP);
+#endif /* HAVE_ZPL */
 }

 /*
@ -3122,6 +3171,7 @@ zfs_ioc_userspace_many(zfs_cmd_t *zc)
 static int
 zfs_ioc_userspace_upgrade(zfs_cmd_t *zc)
 {
+#ifdef HAVE_ZPL
 	objset_t *os;
 	int error;
 	zfsvfs_t *zfsvfs;
@ -3154,6 +3204,9 @@ zfs_ioc_userspace_upgrade(zfs_cmd_t *zc)
 	}

 	return (error);
+#else
+	return (ENOTSUP);
+#endif /* HAVE_ZPL */
 }

 /*
@ -3163,6 +3216,7 @@ zfs_ioc_userspace_upgrade(zfs_cmd_t *zc)
 * the first file system is shared.
 * Neither sharefs, nfs or smbsrv are unloadable modules.
 */
+#ifdef HAVE_ZPL
 int (*znfsexport_fs)(void *arg);
 int (*zshare_fs)(enum sharefs_sys_op, share_t *, uint32_t);
 int (*zsmbexport_fs)(void *arg, boolean_t add_share);
@ -3194,10 +3248,12 @@ zfs_init_sharefs()
 	}
 	return (0);
 }
+#endif /* HAVE_ZPL */

 static int
 zfs_ioc_share(zfs_cmd_t *zc)
 {
+#ifdef HAVE_ZPL
 	int error;
 	int opcode;

@ -3287,7 +3343,9 @@ zfs_ioc_share(zfs_cmd_t *zc)
 	    zc->zc_share.z_sharemax);

 	return (error);
-
+#else
+	return (ENOTSUP);
+#endif /* HAVE_ZPL */
 }

 ace_t full_access[] = {
@ -3297,6 +3355,7 @@ ace_t full_access[] = {
 /*
 * Remove all ACL files in shares dir
 */
+#ifdef HAVE_ZPL
 static int
 zfs_smb_acl_purge(znode_t *dzp)
 {
@ -3315,10 +3374,12 @@ zfs_smb_acl_purge(znode_t *dzp)
 	zap_cursor_fini(&zc);
 	return (error);
 }
+#endif /* HAVE_ZPL */

 static int
 zfs_ioc_smb_acl(zfs_cmd_t *zc)
 {
+#ifdef HAVE_ZPL
 	vnode_t *vp;
 	znode_t *dzp;
 	vnode_t *resourcevp = NULL;
@ -3440,6 +3501,9 @@ zfs_ioc_smb_acl(zfs_cmd_t *zc)
 	ZFS_EXIT(zfsvfs);

 	return (error);
+#else
+	return (ENOTSUP);
+#endif /* HAVE_ZPL */
 }

 /*
@ -3632,28 +3696,23 @@ pool_status_check(const char *name, zfs_ioc_namecheck_t type)
 	return (error);
 }

-static int
-zfsdev_ioctl(dev_t dev, int cmd, intptr_t arg, int flag, cred_t *cr, int *rvalp)
+static long
+zfs_ioctl(struct file *filp, unsigned cmd, unsigned long arg)
 {
 	zfs_cmd_t *zc;
 	uint_t vec;
-	int error, rc;
-
-	if (getminor(dev) != 0)
-		return (zvol_ioctl(dev, cmd, arg, flag, cr, rvalp));
+	int error, rc, flag = 0;

 	vec = cmd - ZFS_IOC;
-	ASSERT3U(getmajor(dev), ==, ddi_driver_major(zfs_dip));
-
 	if (vec >= sizeof (zfs_ioc_vec) / sizeof (zfs_ioc_vec[0]))
-		return (EINVAL);
+		return (-EINVAL);

 	zc = kmem_zalloc(sizeof (zfs_cmd_t), KM_SLEEP);

 	error = ddi_copyin((void *)arg, zc, sizeof (zfs_cmd_t), flag);

 	if (error == 0)
-		error = zfs_ioc_vec[vec].zvec_secpolicy(zc, cr);
+		error = zfs_ioc_vec[vec].zvec_secpolicy(zc, NULL);

 	/*
 	 * Ensure that all pool/dataset names are valid before we pass down to
@ -3695,121 +3754,59 @@ zfsdev_ioctl(dev_t dev, int cmd, intptr_t arg, int flag, cred_t *cr, int *rvalp)
 	}

 	kmem_free(zc, sizeof (zfs_cmd_t));
+	return (-error);
+}
+
+#ifdef CONFIG_COMPAT
+static long
+zfs_compat_ioctl(struct file *filp, unsigned cmd, unsigned long arg)
+{
+	return zfs_ioctl(filp, cmd, arg);
+}
+#else
+#define zfs_compat_ioctl   NULL
+#endif
+
+static const struct file_operations zfs_fops = {
+	.unlocked_ioctl  = zfs_ioctl,
+	.compat_ioctl    = zfs_compat_ioctl,
+	.owner           = THIS_MODULE,
+};
+
+static struct miscdevice zfs_misc = {
+	.minor          = MISC_DYNAMIC_MINOR,
+	.name           = ZFS_DRIVER,
+	.fops           = &zfs_fops,
+};
+
+static int
+zfs_attach(void)
+{
+	int error;
+
+	error = misc_register(&zfs_misc);
+	if (error) {
+		printk(KERN_INFO "ZFS: misc_register() failed %d\n", error);
 		return (error);
-}
-
-static int
-zfs_attach(dev_info_t *dip, ddi_attach_cmd_t cmd)
-{
-	if (cmd != DDI_ATTACH)
-		return (DDI_FAILURE);
-
-	if (ddi_create_minor_node(dip, "zfs", S_IFCHR, 0,
-	    DDI_PSEUDO, 0) == DDI_FAILURE)
-		return (DDI_FAILURE);
-
-	zfs_dip = dip;
-
-	ddi_report_dev(dip);
-
-	return (DDI_SUCCESS);
-}
-
-static int
-zfs_detach(dev_info_t *dip, ddi_detach_cmd_t cmd)
-{
-	if (spa_busy() || zfs_busy() || zvol_busy())
-		return (DDI_FAILURE);
-
-	if (cmd != DDI_DETACH)
-		return (DDI_FAILURE);
-
-	zfs_dip = NULL;
-
-	ddi_prop_remove_all(dip);
-	ddi_remove_minor_node(dip, NULL);
-
-	return (DDI_SUCCESS);
-}
-
-/*ARGSUSED*/
-static int
-zfs_info(dev_info_t *dip, ddi_info_cmd_t infocmd, void *arg, void **result)
-{
-	switch (infocmd) {
-	case DDI_INFO_DEVT2DEVINFO:
-		*result = zfs_dip;
-		return (DDI_SUCCESS);
-
-	case DDI_INFO_DEVT2INSTANCE:
-		*result = (void *)0;
-		return (DDI_SUCCESS);
 	}

-	return (DDI_FAILURE);
+	return (0);
 }

-/*
- * OK, so this is a little weird.
- *
- * /dev/zfs is the control node, i.e. minor 0.
- * /dev/zvol/[r]dsk/pool/dataset are the zvols, minor > 0.
- *
- * /dev/zfs has basically nothing to do except serve up ioctls,
- * so most of the standard driver entry points are in zvol.c.
- */
-static struct cb_ops zfs_cb_ops = {
-	zvol_open,	/* open */
-	zvol_close,	/* close */
-	zvol_strategy,	/* strategy */
-	nodev,		/* print */
-	zvol_dump,	/* dump */
-	zvol_read,	/* read */
-	zvol_write,	/* write */
-	zfsdev_ioctl,	/* ioctl */
-	nodev,		/* devmap */
-	nodev,		/* mmap */
-	nodev,		/* segmap */
-	nochpoll,	/* poll */
-	ddi_prop_op,	/* prop_op */
-	NULL,		/* streamtab */
-	D_NEW | D_MP | D_64BIT,		/* Driver compatibility flag */
-	CB_REV,		/* version */
-	nodev,		/* async read */
-	nodev,		/* async write */
-};
-
-static struct dev_ops zfs_dev_ops = {
-	DEVO_REV,	/* version */
-	0,		/* refcnt */
-	zfs_info,	/* info */
-	nulldev,	/* identify */
-	nulldev,	/* probe */
-	zfs_attach,	/* attach */
-	zfs_detach,	/* detach */
-	nodev,		/* reset */
-	&zfs_cb_ops,	/* driver operations */
-	NULL,		/* no bus operations */
-	NULL,		/* power */
-	ddi_quiesce_not_needed,	/* quiesce */
-};
-
-static struct modldrv zfs_modldrv = {
-	&mod_driverops,
-	"ZFS storage pool",
-	&zfs_dev_ops
-};
-
-static struct modlinkage modlinkage = {
-	MODREV_1,
-	(void *)&zfs_modlfs,
-	(void *)&zfs_modldrv,
-	NULL
-};
+static void
+zfs_detach(void)
+{
+	int error;

+	error = misc_deregister(&zfs_misc);
+	if (error)
+		printk(KERN_INFO "ZFS: misc_deregister() failed %d\n", error);
+}

+#ifdef HAVE_ZPL
 uint_t zfs_fsyncer_key;
 extern uint_t rrw_tsd_key;
+#endif

 int
 _init(void)
@@ -3818,21 +3815,28 @@ _init(void)

 	spa_init(FREAD | FWRITE);
 	zfs_init();
-	zvol_init();

-	if ((error = mod_install(&modlinkage)) != 0) {
-		zvol_fini();
+	if ((error = zvol_init()) != 0) {
 		zfs_fini();
 		spa_fini();
 		return (error);
 	}

+	if ((error = zfs_attach()) != 0) {
+		(void) zvol_fini();
+		zfs_fini();
+		spa_fini();
+		return (error);
+	}
+
+#ifdef HAVE_ZPL
 	tsd_create(&zfs_fsyncer_key, NULL);
 	tsd_create(&rrw_tsd_key, NULL);

-	error = ldi_ident_from_mod(&modlinkage, &zfs_li);
-	ASSERT(error == 0);
 	mutex_init(&zfs_share_lock, NULL, MUTEX_DEFAULT, NULL);
+#endif /* HAVE_ZPL */
+
+	printk(KERN_INFO "ZFS: Loaded ZFS Filesystem v%s\n", ZFS_META_VERSION);

 	return (0);
 }
@@ -3840,17 +3844,11 @@ _init(void)
 int
 _fini(void)
 {
-	int error;
-
-	if (spa_busy() || zfs_busy() || zvol_busy() || zio_injection_enabled)
-		return (EBUSY);
-
-	if ((error = mod_remove(&modlinkage)) != 0)
-		return (error);
-
+	zfs_detach();
 	zvol_fini();
 	zfs_fini();
 	spa_fini();
+#ifdef HAVE_ZPL
 	if (zfs_nfsshare_inited)
 		(void) ddi_modclose(nfs_mod);
 	if (zfs_smbshare_inited)
@@ -3858,16 +3856,18 @@ _fini(void)
 	if (zfs_nfsshare_inited || zfs_smbshare_inited)
 		(void) ddi_modclose(sharefs_mod);

-	tsd_destroy(&zfs_fsyncer_key);
-	ldi_ident_release(zfs_li);
-	zfs_li = NULL;
 	mutex_destroy(&zfs_share_lock);
+	tsd_destroy(&zfs_fsyncer_key);
+#endif /* HAVE_ZPL */

-	return (error);
+	return (0);
 }

-int
-_info(struct modinfo *modinfop)
-{
-	return (mod_info(&modlinkage, modinfop));
-}
+#ifdef HAVE_SPL
+spl_module_init(_init);
+spl_module_exit(_fini);
+
+MODULE_AUTHOR("Sun Microsystems, Inc");
+MODULE_DESCRIPTION("ZFS");
+MODULE_LICENSE("CDDL");
+#endif /* HAVE_SPL */
--- a/module/zfs/zfs_log.c
+++ b/module/zfs/zfs_log.c
@@ -23,6 +23,8 @@
 * Use is subject to license terms.
 */

+#ifdef HAVE_ZPL
+
 #include <sys/types.h>
 #include <sys/param.h>
 #include <sys/systm.h>
@@ -704,3 +706,5 @@ zfs_log_acl(zilog_t *zilog, dmu_tx_t *tx, znode_t *zp,
 	seq = zil_itx_assign(zilog, itx, tx);
 	zp->z_last_itx = seq;
 }
+
+#endif /* HAVE_ZPL */
--- a/module/zfs/zfs_replay.c
+++ b/module/zfs/zfs_replay.c
@@ -23,7 +23,7 @@
 * Use is subject to license terms.
 */

-
+#ifdef HAVE_ZPL

 #include <sys/types.h>
 #include <sys/param.h>
@@ -876,3 +876,4 @@ zil_replay_func_t *zfs_replay_vector[TX_MAX_TYPE] = {
 	zfs_replay_create,	/* TX_MKDIR_ATTR */
 	zfs_replay_create_acl,	/* TX_MKDIR_ACL_ATTR */
 };
+#endif /* HAVE_ZPL */
--- a/module/zfs/zfs_vfsops.c
+++ b/module/zfs/zfs_vfsops.c
@@ -61,6 +61,7 @@
 #include <sys/dmu_objset.h>
 #include <sys/spa_boot.h>

+#ifdef HAVE_ZPL
 int zfsfstype;
 vfsops_t *zfs_vfsops = NULL;
 static major_t zfs_major;
@@ -1957,10 +1958,12 @@ zfs_vfsinit(int fstype, char *name)

 	return (0);
 }
+#endif /* HAVE_ZPL */

 void
 zfs_init(void)
 {
+#ifdef HAVE_ZPL
 	/*
 	 * Initialize .zfs directory structures
 	 */
@@ -1972,21 +1975,19 @@ zfs_init(void)
 	zfs_znode_init();

 	dmu_objset_register_type(DMU_OST_ZFS, zfs_space_delta_cb);
+#endif /* HAVE_ZPL */
 }

 void
 zfs_fini(void)
 {
+#ifdef HAVE_ZPL
 	zfsctl_fini();
 	zfs_znode_fini();
+#endif /* HAVE_ZPL */
 }

-int
-zfs_busy(void)
-{
-	return (zfs_active_fs_count != 0);
-}
-
+#ifdef HAVE_ZPL
 int
 zfs_set_version(zfsvfs_t *zfsvfs, uint64_t newvers)
 {
@@ -2029,6 +2030,7 @@ zfs_set_version(zfsvfs_t *zfsvfs, uint64_t newvers)

 	return (0);
 }
+#endif /* HAVE_ZPL */

 /*
 * Read a property stored within the master node.
@@ -2072,6 +2074,7 @@ zfs_get_zplprop(objset_t *os, zfs_prop_t prop, uint64_t *value)
 	return (error);
 }

+#ifdef HAVE_ZPL
 static vfsdef_t vfw = {
 	VFSDEF_VERSION,
 	MNTTYPE_ZFS,
@@ -2084,3 +2087,4 @@ static vfsdef_t vfw = {
 struct modlfs zfs_modlfs = {
 	&mod_fsops, "ZFS filesystem version " SPA_VERSION_STRING, &vfw
 };
+#endif /* HAVE_ZPL */
--- a/module/zfs/zfs_vnops.c
+++ b/module/zfs/zfs_vnops.c
@@ -25,6 +25,8 @@

 /* Portions Copyright 2007 Jeremy Teo */

+#ifdef HAVE_ZPL
+
 #include <sys/types.h>
 #include <sys/param.h>
 #include <sys/time.h>
@@ -318,6 +320,7 @@ zfs_ioctl(vnode_t *vp, int com, intptr_t data, int flag, cred_t *cred,
 	return (ENOTTY);
 }

+#if defined(_KERNEL) && defined(HAVE_UIO_RW)
 /*
 * Utility functions to map and unmap a single physical page.  These
 * are used to manage the mappable copies of ZFS file data, and therefore
@@ -342,6 +345,7 @@ zfs_unmap_page(page_t *pp, caddr_t addr)
 		ppmapout(addr);
 	}
 }
+#endif /* _KERNEL && HAVE_UIO_RW */

 /*
 * When a file is memory mapped, we must keep the IO data synchronized
@@ -4695,3 +4699,4 @@ const fs_operation_def_t zfs_evnodeops_template[] = {
 	VOPNAME_PATHCONF,	{ .vop_pathconf = zfs_pathconf },
 	NULL,			NULL
 };
+#endif /* HAVE_ZPL */
--- a/module/zfs/zfs_znode.c
+++ b/module/zfs/zfs_znode.c
@@ -87,6 +87,7 @@
 * (such as VFS logic) that will not compile easily in userland.
 */
 #ifdef _KERNEL
+#ifdef HAVE_ZPL
 /*
 * Needed to close a small window in zfs_znode_move() that allows the zfsvfs to
 * be freed before it can be safely accessed.
@@ -1473,21 +1474,28 @@ log:
 	dmu_tx_commit(tx);
 	return (0);
 }
+#endif /* HAVE_ZPL */

 void
 zfs_create_fs(objset_t *os, cred_t *cr, nvlist_t *zplprops, dmu_tx_t *tx)
 {
-	zfsvfs_t	zfsvfs;
 	uint64_t	moid, obj, version;
 	uint64_t	sense = ZFS_CASE_SENSITIVE;
 	uint64_t	norm = 0;
 	nvpair_t	*elem;
 	int		error;
+#ifdef HAVE_ZPL
+	zfsvfs_t	zfsvfs;
 	znode_t		*rootzp = NULL;
 	vnode_t		*vp;
 	vattr_t		vattr;
 	znode_t		*zp;
 	zfs_acl_ids_t	acl_ids;
+#else
+	timestruc_t	now;
+	dmu_buf_t	*db;
+	znode_phys_t	*pzp;
+#endif /* HAVE_ZPL */

 	/*
 	 * First attempt to create master node.
@@ -1542,6 +1550,7 @@ zfs_create_fs(objset_t *os, cred_t *cr, nvlist_t *zplprops, dmu_tx_t *tx)
 	error = zap_add(os, moid, ZFS_UNLINKED_SET, 8, 1, &obj, tx);
 	ASSERT(error == 0);

+#ifdef HAVE_ZPL
 	/*
 	 * Create root znode.  Create minimal znode/vnode/zfsvfs
 	 * to allow zfs_mknode to work.
@@ -1596,14 +1605,46 @@ zfs_create_fs(objset_t *os, cred_t *cr, nvlist_t *zplprops, dmu_tx_t *tx)
 	dmu_buf_rele(rootzp->z_dbuf, NULL);
 	rootzp->z_dbuf = NULL;
 	kmem_cache_free(znode_cache, rootzp);
+	error = zfs_create_share_dir(&zfsvfs, tx);
+#else
+	/*
+	 * Create root znode with code free of VFS dependencies
+	 */
+	obj = zap_create_norm(os, norm, DMU_OT_DIRECTORY_CONTENTS,
+	                      DMU_OT_ZNODE, sizeof (znode_phys_t), tx);
+
+	VERIFY(0 == dmu_bonus_hold(os, obj, FTAG, &db));
+	dmu_buf_will_dirty(db, tx);

 	/*
-	 * Create shares directory
+	 * Initialize the znode physical data to zero.
 	 */
+	ASSERT(db->db_size >= sizeof (znode_phys_t));
+	bzero(db->db_data, db->db_size);
+	pzp = db->db_data;

-	error = zfs_create_share_dir(&zfsvfs, tx);
+	if (USE_FUIDS(version, os))
+		pzp->zp_flags = ZFS_ARCHIVE | ZFS_AV_MODIFIED;

+	pzp->zp_size = 2; /* "." and ".." */
+	pzp->zp_links = 2;
+	pzp->zp_parent = obj;
+	pzp->zp_gen = dmu_tx_get_txg(tx);
+	pzp->zp_mode = S_IFDIR | 0755;
+	pzp->zp_flags |= ZFS_ACL_TRIVIAL;
+
+	gethrestime(&now);
+
+	ZFS_TIME_ENCODE(&now, pzp->zp_crtime);
+	ZFS_TIME_ENCODE(&now, pzp->zp_ctime);
+	ZFS_TIME_ENCODE(&now, pzp->zp_atime);
+	ZFS_TIME_ENCODE(&now, pzp->zp_mtime);
+
+	error = zap_add(os, moid, ZFS_ROOT_OBJ, 8, 1, &obj, tx);
 	ASSERT(error == 0);
+
+	dmu_buf_rele(db, FTAG);
+#endif /* HAVE_ZPL */
 }

 #endif /* _KERNEL */
--- a/module/zfs/zvol.c
+++ b/module/zfs/zvol.c