Prior to revision 11314 if a user was recursively destroying
snapshots of a dataset the target dataset was not required to
exist. The zfs_secpolicy_destroy_snaps() function introduced
the security check on the target dataset, so since then if the
target dataset does not exist, the recursive destroy is not
performed. Before 11314, only a delete permission check on
the snapshot's master dataset was performed.
Steps to reproduce:
zfs create pool/a
zfs snapshot pool/a@s1
zfs destroy -r pool@s1
Therefore I suggest to fallback to the old security check, if
the target snapshot does not exist and continue with the destroy.
References to Illumos issue and patch:
- https://www.illumos.org/issues/1043
- https://www.illumos.org/attachments/217/recursive_dataset_destroy.patch
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #340
Moving the zil_free() cleanup to zil_close() prevents this
problem from occurring in the first place. There is a very
good description of the issue and fix in Illumus #883.
Reviewed by: Matt Ahrens <Matt.Ahrens@delphix.com>
Reviewed by: Adam Leventhal <Adam.Leventhal@delphix.com>
Reviewed by: Albert Lee <trisk@nexenta.com>
Reviewed by: Gordon Ross <gwr@nexenta.com>
Reviewed by: Garrett D'Amore <garrett@nexenta.com>
Reivewed by: Dan McDonald <danmcd@nexenta.com>
Approved by: Gordon Ross <gwr@nexenta.com>
References to Illumos issue and patch:
- https://www.illumos.org/issues/883
- https://github.com/illumos/illumos-gate/commit/c9ba2a43cb
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #340
Add a "REFRATIO" property, which is the compression ratio based on
data referenced. For snapshots, this is the same as COMPRESSRATIO,
but for filesystems/volumes, the COMPRESSRATIO is based on the
data "USED" (ie, includes blocks in children, but not blocks
shared with the origin).
This is needed to figure out how much space a filesystem would
use if it were not compressed (ignoring snapshots).
Reviewed by: George Wilson <George.Wilson@delphix.com>
Reviewed by: Adam Leventhal <Adam.Leventhal@delphix.com>
Reviewed by: Dan McDonald <danmcd@nexenta.com>
Reviewed by: Richard Elling <richard.elling@richardelling.com>
Reviewed by: Mark Musante <Mark.Musante@oracle.com>
Reviewed by: Garrett D'Amore <garrett@nexenta.com>
Approved by: Garrett D'Amore <garrett@nexenta.com>
References to Illumos issue and patch:
- https://www.illumos.org/issues/1092
- https://github.com/illumos/illumos-gate/commit/187d6ac08a
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #340
Today zfs tries to allocate blocks evenly across all devices.
This means when devices are imbalanced zfs will use lots of
CPU searching for space on devices which tend to be pretty
full. It should instead fail quickly on the full LUNs and
move onto devices which have more availability.
Reviewed by: Eric Schrock <Eric.Schrock@delphix.com>
Reviewed by: Matt Ahrens <Matt.Ahrens@delphix.com>
Reviewed by: Adam Leventhal <Adam.Leventhal@delphix.com>
Reviewed by: Albert Lee <trisk@nexenta.com>
Reviewed by: Gordon Ross <gwr@nexenta.com>
Approved by: Garrett D'Amore <garrett@nexenta.com>
References to Illumos issue and patch:
- https://www.illumos.org/issues/510
- https://github.com/illumos/illumos-gate/commit/5ead3ed965
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #340
The 'zfs get' command should be able to deal with mountpoint
as an argument. It already works with 'zfs list' command:
# zfs list /export/home/estibi
NAME USED AVAIL REFER MOUNTPOINT
rpool/export/home/estibi 1.14G 3.86G 1.14G /export/home/estibi
but it fails with 'zfs get':
# zfs get all /export/home/estibi
cannot open '/export/home/estibi': invalid dataset name
Reviewed by: Eric Schrock <eric.schrock@delphix.com>
Reviewed by: Deano <deano@rattie.demon.co.uk>
Reviewed by: Garrett D'Amore <garrett@nexenta.com>
Approved by: Garrett D'Amore <garrett@nexenta.com>
References to Illumos issue and patch:
- https://www.illumos.org/issues/510
- https://github.com/illumos/illumos-gate/commit/5ead3ed965
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #340
Note that with the current ZFS code, it turns out that the vdev
cache is not helpful, and in some cases actually harmful. It
is better if we disable this. Once some time has passed, we
should actually remove this to simplify the code. For now we
just disable it by setting the zfs_vdev_cache_size to zero.
Note that Solaris 11 has made these same changes.
References to Illumos issue and patch:
- https://www.illumos.org/issues/175
- https://github.com/illumos/illumos-gate/commit/b68a40a845
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Eric Schrock <eric.schrock@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #340
Hypothesis about what's going on here.
At some time in the past, something, i.e. dnode_reallocate()
calls one of:
dbuf_rm_spill(dn, tx);
These will do:
dbuf_rm_spill(dnode_t *dn, dmu_tx_t *tx)
dbuf_free_range(dn, DMU_SPILL_BLKID, DMU_SPILL_BLKID, tx)
dbuf_undirty(db, tx)
Currently dbuf_undirty can leave a spill block in dn_dirty_records[],
(it having been put there previously by dbuf_dirty) and free it.
Sometime later, dbuf_sync_list trips over this reference to free'd
(and typically reused) memory.
Also, dbuf_undirty can call dnode_clear_range with a bogus
block ID. It needs to test for DMU_SPILL_BLKID, similar to
how dnode_clear_range is called in dbuf_dirty().
References to Illumos issue and patch:
- https://www.illumos.org/issues/764
- https://github.com/illumos/illumos-gate/commit/3f2366c2bb
Reviewed by: George Wilson <gwilson@zfsmail.com>
Reviewed by: Mark.Maybe@oracle.com
Reviewed by: Albert Lee <trisk@nexenta.com
Approved by: Garrett D'Amore <garrett@nexenta.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #340
This change ensures the paths used by the provided init scripts
always reference the prefixes provided at configure time. The
@sbindir@ and @sysconfdir@ prefixes will be correctly replaced
at build time.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#336
This fixes a bug that can effect first reboot after install
using Dracut. The Dracut module didn't check the return
value from several calls to z* functions. This resulted in
"Using no pools available as root" on boot if the ZFS module
didn't auto-import the pools. It's most likely to happen on
initial restart after a fresh install & requires juggling in
the Dracut emergency holographic shell to fix.
This patch checks return codes & output from zpool list and
related functions and correctly falls into the explicit zpool
import code branch if the module didn't import the pool at load.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
For consistency with the upstream sources and the rest of the
project use hard instead of soft tabs.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Update two kmem_alloc()'s in dbuf_dirty() to use KM_PUSHPAGE.
Because these functions are called from txg_sync_thread we
must ensure they don't reenter the zfs filesystem code via
the .writepage callback. This would result in a deadlock.
This deadlock is rare and has only been observed once under
an abusive mmap() write workload.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
The bootfs example in the dracut documentation was sightly incorrect
because it lacked the trailing required pool argument. Add it.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
The latest kernels no longer define AUTOCONF_INCLUDED which was
being used to detect the new style autoconf.h kernel configure
options. This results in the CONFIG_* checks always failing
incorrectly for newer kernels.
The fix for this is a simplification of the testing method.
Rather than attempting to explicitly include to renamed config
header. It is simpler to unconditionally include <linux/module.h>
which must pick up the correctly named header.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#320
Long, long, long ago when the effort to port ZFS was begun
the zfs_create_fs() function was heavily modified to remove
all of its VFS dependencies. This allowed Lustre to use
the dataset without us having to spend the time porting all
the required VFS code.
Fast-forward several years and we now have all the VFS code
in place but are still relying on the modified zfs_create_fs().
This isn't required anymore and we can now use zfs_mknode()
to create the root znode for the filesystem.
This commit reverts the contents of zfs_create_fs() to largely
match the upstream OpenSolaris code. There have been minor
modifications to accomidate the Linux VFS but that is all.
This code fixes issue #116 by bootstraping enough of the VFS
data structures so we can rely on zfs_mknode() to create the
root directory. This ensures it is created properly with
support for system attributes. Previously it wasn't which
is why it behaved differently that all other directories
when modified.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#116
Newly created files were always being created with the fsuid/fsgid
in the current users credentials. This is correct except in the
case when the parent directory sets the 'setgit' bit. In this
case according to posix the newly created file/directory should
inherit the gid of the parent directory. Additionally, in the
case of a subdirectory it should also inherit the 'setgit' bit.
Finally, this commit performs a little cleanup of the vattr_t
initialization by moving it to a common helper function.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#262
When running 'make install' without DESTDIR set the module install
rules would mistakenly destroy the 'modules.*' files for ALL of
your installed kernels. This could lead to a non-functional system
for the alternate kernels because 'depmod -a' will only be run for
the kernel which was compiled against. This issue would not impact
anyone using the 'make <deb|rpm|pkg>' build targets to build and
install packages.
The fix for this issue is to only remove extraneous build products
when DESTDIR is set. This almost exclusively indicates we are
building packages and installed the build products in to a temporary
staging location. Additionally, limit the removal the unneeded
build products to the target kernel version.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#328
Disable the normal reclaim path for zpl_putpage(). This ensures that
all memory allocations under this call path will never enter direct
reclaim. If this were to happen the VM might try to write out
additional pages by calling zpl_putpage() again resulting in a
deadlock.
This sitution is typically handled in Linux by marking each offending
allocation GFP_NOFS. However, since much of the code used is common
it makes more sense to use PF_MEMALLOC to flag the entire call tree.
Alternately, the code could be updated to pass the needed allocation
flags but that's a more invasive change.
The following example of the above described deadlock was triggered
by test 074 in the xfstest suite.
Call Trace:
[<ffffffff814dcdb2>] down_write+0x32/0x40
[<ffffffffa05af6e4>] dnode_new_blkid+0x94/0x2d0 [zfs]
[<ffffffffa0597d66>] dbuf_dirty+0x556/0x750 [zfs]
[<ffffffffa05987d1>] dmu_buf_will_dirty+0x81/0xd0 [zfs]
[<ffffffffa059ee70>] dmu_write+0x90/0x170 [zfs]
[<ffffffffa0611afe>] zfs_putpage+0x2ce/0x360 [zfs]
[<ffffffffa062875e>] zpl_putpage+0x1e/0x60 [zfs]
[<ffffffffa06287b2>] zpl_writepage+0x12/0x20 [zfs]
[<ffffffff8115f907>] writeout+0xa7/0xd0
[<ffffffff8115fa6b>] move_to_new_page+0x13b/0x170
[<ffffffff8115fed4>] migrate_pages+0x434/0x4c0
[<ffffffff811559ab>] compact_zone+0x4fb/0x780
[<ffffffff81155ed1>] compact_zone_order+0xa1/0xe0
[<ffffffff8115602c>] try_to_compact_pages+0x11c/0x190
[<ffffffff811200bb>] __alloc_pages_nodemask+0x5eb/0x8b0
[<ffffffff8115464a>] alloc_pages_current+0xaa/0x110
[<ffffffff8111e36e>] __get_free_pages+0xe/0x50
[<ffffffffa03f0e2f>] kv_alloc+0x3f/0xb0 [spl]
[<ffffffffa03f11d9>] spl_kmem_cache_alloc+0x339/0x660 [spl]
[<ffffffffa05950b3>] dbuf_create+0x43/0x370 [zfs]
[<ffffffffa0596fb1>] __dbuf_hold_impl+0x241/0x480 [zfs]
[<ffffffffa0597276>] dbuf_hold_impl+0x86/0xc0 [zfs]
[<ffffffffa05977ff>] dbuf_hold_level+0x1f/0x30 [zfs]
[<ffffffffa05a9dde>] dmu_tx_check_ioerr+0x4e/0x110 [zfs]
[<ffffffffa05aa1f9>] dmu_tx_count_write+0x359/0x6f0 [zfs]
[<ffffffffa05aa5df>] dmu_tx_hold_write+0x4f/0x70 [zfs]
[<ffffffffa0611a6d>] zfs_putpage+0x23d/0x360 [zfs]
[<ffffffffa062875e>] zpl_putpage+0x1e/0x60 [zfs]
[<ffffffff811221f9>] write_cache_pages+0x1c9/0x4a0
[<ffffffffa0628738>] zpl_writepages+0x18/0x20 [zfs]
[<ffffffff81122521>] do_writepages+0x21/0x40
[<ffffffff8119bbbd>] writeback_single_inode+0xdd/0x2c0
[<ffffffff8119bfbe>] writeback_sb_inodes+0xce/0x180
[<ffffffff8119c11b>] writeback_inodes_wb+0xab/0x1b0
[<ffffffff8119c4bb>] wb_writeback+0x29b/0x3f0
[<ffffffff8119c6cb>] wb_do_writeback+0xbb/0x240
[<ffffffff811308ea>] bdi_forker_task+0x6a/0x310
[<ffffffff8108ddf6>] kthread+0x96/0xa0
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#327
When modifing overlapping regions of a file using mmap(2) and
write(2)/read(2) it is possible to deadlock due to a lock inversion.
The zfs_write() and zfs_read() hooks first take the zfs range lock
and then lock the individual pages. Conversely, when using mmap'ed
I/O the zpl_writepage() hook is called with the individual page
locks already taken and then zfs_putpage() takes the zfs range lock.
The most straight forward fix is to simply not take the zfs range
lock in the mmap(2) case. The individual pages will still be locked
thus serializing access. Updating the same region of a file with
write(2) and mmap(2) has always been a dodgy thing to do. This change
at a minimum ensures we don't deadlock and is consistent with the
existing Linux semantics enforced by the VFS.
This isn't an issue under Solaris because the only range locking
performed will be with the zfs range locks. It's up to each filesystem
to perform its own file locking. Under Linux the VFS provides many
of these services.
It may be possible/desirable at a latter date to entirely dump the
existing zfs range locking and rely on the Linux VFS page locks.
However, for now its safest to perform both layers of locking until
zfs is more tightly integrated with the page cache.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #302
The following supported options were missing from the zpool.8
man page. The OpenSolaris man pages originally used were simply
out of date with the code.
zpool import
-F Recovery mode
-m Allow missing log devices
-N Import but don't mount
-n Determine if recoverable but don't do it
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
This commit fixes a regression which was accidentally introduced by
the Linux 2.6.39 compatibility chanages. As part of these changes
instead of holding an active reference on the namepsace (which is
no longer posible) a reference is taken on the super block. This
reference ensures the super block remains valid while it is in use.
To handle the unlikely race condition of the filesystem being
unmounted concurrently with the start of a 'zfs send/recv' the
code was updated to only take the super block reference when there
was an existing reference. This indicates that the filesystem is
active and in use.
Unfortunately, in the 'zfs recv' case this is not the case. The
newly created dataset will not have a super block without an
active reference which results in the 'dataset is busy' error.
The most straight forward fix for this is to simply update the
code to always take the reference even when it's zero. This
may expose us to very very unlikely concurrent umount/send/recv
case but the consequences of that are minor.
Closes#319
Unlike most other Linux distributions archlinux installs its
init scripts in /etc/rc.d insead of /etc/init.d. This commit
provides an archlinux rc.d script for zfs and extends the
build infrastructure to ensure it get's installed in the
correct place.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#322
There is at most a factor of 3x performance improvement to be
had by using the Linux generic_fillattr() helper. However, to
use it safely we need to ensure the values in a cached inode
are kept rigerously up to date. Unfortunately, this isn't
the case for the blksize, blocks, and atime fields. At the
moment the authoritative values are still stored in the znode.
This patch introduces an optimized zfs_getattr_fast() call.
The idea is to use the up to date values from the inode and
the blksize, block, and atime fields from the znode. At some
latter date we should be able to strictly use the inode values
and further improve performance.
The remaining overhead in the zfs_getattr_fast() call can be
attributed to having to take the znode mutex. This overhead is
unavoidable until the inode is kept strictly up to date. The
the careful reader will notice the we do not use the customary
ZFS_ENTER()/ZFS_EXIT() macros. These macro's are designed to
ensure the filesystem is not torn down in the middle of an
operation. However, in this case the VFS is holding a
reference on the active inode so we know this is impossible.
=================== Performance Tests ========================
This test calls the fstat(2) system call 10,000,000 times on
an open file description in a tight loop. The test results
show the zfs stat(2) performance is now only 22% slower than
ext4. This is a 2.5x improvement and there is a clear long
term plan to get to parity with ext4.
filesystem | test-1 test-2 test-3 | average | times-ext4
--------------+-------------------------+---------+-----------
ext4 | 7.785s 7.899s 7.284s | 7.656s | 1.000x
zfs-0.6.0-rc4 | 24.052s 22.531s 23.857s | 23.480s | 3.066x
zfs-faststat | 9.224s 9.398s 9.485s | 9.369s | 1.223x
The second test is to run 'du' of a copy of the /usr tree
which contains 110514 files. The test is run multiple times
both using both a cold cache (/proc/sys/vm/drop_caches) and
a hot cache. As expected this change signigicantly improved
the zfs hot cache performance and doesn't quite bring zfs to
parity with ext4.
A little surprisingly the zfs cold cache performance is better
than ext4. This can probably be attributed to the zfs allocation
policy of co-locating all the meta data on disk which minimizes
seek times. By default the ext4 allocator will spread the data
over the entire disk only co-locating each directory.
filesystem | cold | hot
--------------+---------+--------
ext4 | 13.318s | 1.040s
zfs-0.6.0-rc4 | 4.982s | 1.762s
zfs-faststat | 4.933s | 1.345s
The performance of the L2ARC can be tweaked by a number of tunables, which
may be necessary for different workloads:
l2arc_write_max max write bytes per interval
l2arc_write_boost extra write bytes during device warmup
l2arc_noprefetch skip caching prefetched buffers
l2arc_headroom number of max device writes to precache
l2arc_feed_secs seconds between L2ARC writing
l2arc_feed_min_ms min feed interval in milliseconds
l2arc_feed_again turbo L2ARC warmup
l2arc_norw no reads during writes
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#316
This caused problems on upgrade using RPM:
* The new version will run chkconfig --add, which has no effect
since the service was already added.
* The old version will run chkconfig --del, which caused zfs
service removal.
Only run "chkconfig --del" on complete uninstall, by checking
the value of "$1" to %preun, which will be "0" on uninstall,
and "1" on upgrade.
http://www.rpm.org/max-rpm/s1-rpm-inside-scripts.html
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#314
Unfortunately, ztest is hard coded to export the zdb utility to
be installed in a certain location. When the packaging was updated
to install zdb in /sbin/ ztest was broken. To fix this I'm updating
ztest to check both common install paths.
The zfs libraries were never properly versioned. Since the API has
remained static for quite some time this we never an issue. However,
going forward they should be versioned. This commit versions all
of the libraries to 1.0.0. From here on out this version must be
updated to reflect changes to the library.
The relevant init scripts were updated so as to automatically share
ZFS datasets using "zfs share -a" at boot time.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
The remaining code that is guarded by HAVE_SHARE ifdefs is related to the
.zfs/shares functionality which is currently not available on Linux.
On Solaris the .zfs/shares directory can be used to set permissions for
SMB shares.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Drop usage of dlopen/dlsym for libshare. There is no need to do
this because the zfs packages provide libshare. Unlike on Solaris
we are guaranteed it will be available.
This avoids possible problems with hardcoding the libshare path in
the code (e.g. when users specify a different install path via
configure options). It additionally simplifies the code which is
good for maintainability.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
The sharenfs and sharesmb properties depend on the libshare library
to export datasets via NFS and SMB. This commit implements the base
libshare functionality as well as support for managing NFS shares.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Add documentation for Dracut and the initramfs process. This includes
detailing the basic boot process and options available.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Update Dracut module for Dracut-010 and fix race conditions that
caused boot to fail on MP systems. Add support for zfs_force flag
and parsing of spl_hostid from kernel command line.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
* Update paths to zpool/zfs tools,
* Log less for non-error conditions,
* Don't be fatal if umount fails at shutdown -- final init remount
will take care of it if /usr or / are in use
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
The backticks would cause the output of the zfs commands
to be evaluated as input for the if construct rather than
their exit status.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
The zpool sub-commands like iostat, list, and status should
display consistent message when a given pool is unavailable or
no pool is present. This change unifies the default behavior
as follows:
root@prasad:~# ./zpool list 1 2
no pools available
no pools available
root@prasad:~# ./zpool iostat 1 2
no pools available
no pools available
root@prasad:~# ./zpool status 1 2
no pools available
no pools available
root@prasad:~# ./zpool list tan 1 2
cannot open 'tan': no such pool
root@prasad:~# ./zpool iostat tan 1 2
cannot open 'tan': no such pool
root@prasad:~# ./zpool status tan 1 2
cannot open 'tan': no such pool
Reported-by: Rajshree Thorat <rthorat@stec-inc.com>
Signed-off-by: Prasad Joshi <pjoshi@stec-inc.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#306
If rc_parallel="YES" zfs starts before localmount, which leads
to "No such file or directory" error on systems with /usr on a
separate partition.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Under Linux you may only disable USER xattrs. The SECURITY,
SYSTEM, and TRUSTED xattr namespaces must always be available
if xattrs are supported by the filesystem. The enforcement
of USER xattrs is performed in the zpl_xattr_user_* handlers.
Under Solaris there is only a single xattr namespace which
is managed globally.
The lib/libspl/include/libgen.h header file was being mistakenly
left out of the 'make dist' tarball. It just happens this doesn't
cause a build failure when creating packages because the system
libgen/h is included instead. This simply results in the following
warning due to the missing forward declaration of mkdirp().
../../lib/libzfs/libzfs_mount.c:417:3: warning: implicit declaration
of function 'mkdirp' [-Wimplicit-function-declaration]
The Linux kernel already has support for mandatory locking. This
change just replaces the Solaris mandatory locking calls with the
Linux equivilants. In fact, it looks like this code could be
removed entirely because this checking is already done generically
in the Linux VFS. However, for now we'll leave it in place even
if it is redundant just in case we missed something.
The original patch to update the code to support mandatory locking
was done by Rohan Puri. This patch is an updated version which is
compatible with the previous mount option handling changes.
Original-Patch-by: Rohan Puri <rohan.puri15@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#222Closes#253
The .get_sb callback has been replaced by a .mount callback
in the file_system_type structure. When using the new
interface the caller must now use the mount_nodev() helper.
Unfortunately, the new interface no longer passes the vfsmount
down to the zfs layers. This poses a problem for the existing
implementation because we currently save this pointer in the
super block for latter use. It provides our only entry point
in to the namespace layer for manipulating certain mount options.
This needed to be done originally to allow commands like
'zfs set atime=off tank' to work properly. It also allowed me
to keep more of the original Solaris code unmodified. Under
Solaris there is a 1-to-1 mapping between a mount point and a
file system so this is a fairly natural thing to do. However,
under Linux they many be multiple entries in the namespace
which reference the same filesystem. Thus keeping a back
reference from the filesystem to the namespace is complicated.
Rather than introduce some ugly hack to get the vfsmount and
continue as before. I'm leveraging this API change to update
the ZFS code to do things in a more natural way for Linux.
This has the upside that is resolves the compatibility issue
for the long term and fixes several other minor bugs which
have been reported.
This commit updates the code to remove this vfsmount back
reference entirely. All modifications to filesystem mount
options are now passed in to the kernel via a '-o remount'.
This is the expected Linux mechanism and allows the namespace
to properly handle any options which apply to it before passing
them on to the file system itself.
Aside from fixing the compatibility issue, removing the
vfsmount has had the benefit of simplifying the code. This
change which fairly involved has turned out nicely.
Closes#246Closes#217Closes#187Closes#248Closes#231
The security_inode_init_security() function now takes an additional
qstr argument which must be passed in from the dentry if available.
Passing a NULL is safe when no qstr is available the relevant
security checks will just be skipped.
Closes#246Closes#217Closes#187