Initial Linux ZFS GIT Repo
commit
34dc7c2f25
|
@ -0,0 +1,4 @@
|
|||
Brian Behlendorf <behlendorf1@llnl.gov>,
|
||||
Herb Wartens <wartens2@llnl.gov>,
|
||||
Jim Garlick <garlick@llnl.gov>,
|
||||
Ricardo M. Correia <Ricardo.M.Correia@sun.com>
|
|
@ -0,0 +1,114 @@
|
|||
2008-11-19 Brian Behlendorf <behlendorf1@llnl.gov>
|
||||
|
||||
* : Tag zfs-0.4.0
|
||||
|
||||
* : ZFS project migrated from Subversion which leveraged a
|
||||
quilt based patch stack to Git and a TopGit managed patch
|
||||
stack. The new method treats all patches as Git branches
|
||||
which can be more easily shared for distributed development.
|
||||
Consult the top level GIT file for detailed information on
|
||||
how to properly develop for this package using Git+TopGit.
|
||||
|
||||
2008-11-12 Brian Behlendorf <behlendorf1@llnl.gov>
|
||||
|
||||
* : Tag zfs-0.3.4
|
||||
|
||||
* zfs-07-create-dev-zfs.patch:
|
||||
Ricardo M. Correia <Ricardo.M.Correia@sun.com>
|
||||
- Make libzfs create /dev/zfs if it doesn't exist.
|
||||
|
||||
* zfs-05-check-zvol-size.patch:
|
||||
Ricardo M. Correia <Ricardo.M.Correia@sun.com>
|
||||
- Properly check zvol size under Linux.
|
||||
|
||||
* zfs-04-no-openat-fdopendir.patch:
|
||||
Ricardo M. Correia <Ricardo.M.Correia@sun.com>
|
||||
- Do not use openat() and fdopendir() since they are not available
|
||||
on older systems.
|
||||
|
||||
* zfs-03-fix-bio-sync.patch:
|
||||
Ricardo M. Correia <Ricardo.M.Correia@sun.com>
|
||||
- Fix memory corruption in RHEL4 due to synchronous IO becoming
|
||||
asynchronous.
|
||||
|
||||
2008-11-06 Brian Behlendorf <behlendorf1@llnl.gov>
|
||||
|
||||
* zfs-02-zpios-fix-stuck-thread-memleak.patch:
|
||||
Ricardo M. Correia <Ricardo.M.Correia@sun.com>
|
||||
- Fix stuck threads and memory leaks when errors occur while writing.
|
||||
|
||||
* zfs-01-zpios-arg-corruption.patch:
|
||||
Ricardo M. Correia <Ricardo.M.Correia@sun.com>
|
||||
- Fix zpios cmd line argument corruption problem.
|
||||
|
||||
* zfs-00-minor-fixes.patch:
|
||||
Ricardo M. Correia <Ricardo.M.Correia@sun.com>
|
||||
- Minor build system improvements
|
||||
- Minor script improvements
|
||||
- Create a full copy and not a link tree with quilt
|
||||
- KPIOS_MAJOR changed from 231 to 232
|
||||
- BIO_RW_BARRIER flag removed from IO request
|
||||
|
||||
2008-06-30 Brian Behlendorf <behlendorf1@llnl.gov>
|
||||
|
||||
* : Tag zfs-0.3.3
|
||||
|
||||
* : Minor script updates and tweaks to be compatible with
|
||||
the latest version of the SPL.
|
||||
|
||||
2008-06-13 Brian Behlendorf <behlendorf1@llnl.gov>
|
||||
|
||||
* vdev_disk.diff: Replace vdev_disk implementation which was
|
||||
based on the kmalloc'ed logical address space with a version
|
||||
which works with vmalloc'ed memory in the virtual address space.
|
||||
This was done to support the new SPL slab implementation which
|
||||
is based on virtual addresses to avoid the need for contiguously
|
||||
allocated memory.
|
||||
|
||||
2008-06-05 Brian Behlendorf <behlendorf1@llnl.gov>
|
||||
|
||||
* arc-vm-integration.diff: Reduce maximum default arc memory
|
||||
usage to 1/4 of total system memory. Because all the bulk data
|
||||
is still allocated on the slab, memory fragmentation is a serious
|
||||
concern. To address this in the short term we simply need to
|
||||
leave lots of free memory.
|
||||
|
||||
* fix-stack.diff: First step towards reducing stack usage so
|
||||
we can run the full ZFS stack using a stock kernel.
|
||||
|
||||
2008-06-04 Brian Behlendorf <behlendorf1@llnl.gov>
|
||||
|
||||
* : Tag zfs-0.3.2
|
||||
|
||||
* : Extensive improvements to the build system to detect kernel
|
||||
API changes so we can flexibly build with a wider range of kernel
|
||||
versions. The code has now been tested with the 2.6.18-32chaos
|
||||
and 2.6.25.3-18.fc9 kernels; however, we should also be compatible
|
||||
with other kernels in the range of 2.6.18-2.6.25. The only
|
||||
remaining issue preventing us from running with a stock
|
||||
kernel is ZFS stack usage.
|
||||
|
||||
2008-05-21 Brian Behlendorf <behlendorf1@llnl.gov>
|
||||
|
||||
* : Tag zfs-0.3.1
|
||||
|
||||
* : License headers including UCRL added for release.
|
||||
|
||||
2008-05-21 Brian Behlendorf <behlendorf1@llnl.gov>
|
||||
|
||||
* : Tag zfs-0.3.0
|
||||
|
||||
* configure.ac: Improved autotools support and configurable debug.
|
||||
|
||||
2008-05-15 Brian Behlendorf <behlendorf1@llnl.gov>
|
||||
|
||||
* : Updating original ZFS sources to build 89 which
|
||||
includes the new write throttling changes plus support
|
||||
for using ZFS as your root device. Neither of which
|
||||
will work exactly right without some more work but this
|
||||
gets us much closer to the latest source.
|
||||
|
||||
|
||||
2008-02-28 Brian Behlendorf <behlendorf1@llnl.gov>
|
||||
|
||||
* : First attempt based on SPL module and zfs-lustre sources
|
|
@ -0,0 +1,22 @@
|
|||
This notice is required to be provided under our contract with the
|
||||
U.S. Department of Energy (DOE). This work was produced at the
|
||||
Lawrence Livermore National Laboratory under Contract with the DOE.
|
||||
|
||||
Neither the United States Government nor the Lawrence Livermore National
|
||||
Security, LLC. nor any of their employees, makes any warranty, express
|
||||
or implied, or assumes any liability or responsibility for the accuracy,
|
||||
completeness, or usefulness of any information, apparatus, product,
|
||||
or process disclosed, or represents that its use would not infringe
|
||||
privately-owned rights.
|
||||
|
||||
Also, reference herein to any specific commercial products, process,
|
||||
or services by trade name, trademark, manufacturer or otherwise does
|
||||
not necessarily constitute or imply its endorsement, recommendation,
|
||||
or favoring by the United States Government or the Lawrence Livermore
|
||||
National Security, LLC. The views and opinions of authors expressed
|
||||
herein do not necessarily state or reflect those of the United States
|
||||
Government or the Lawrence Livermore National Security, LLC., and
|
||||
shall not be used for advertising or product endorsement purposes.
|
||||
|
||||
The precise terms and conditions for copying, distribution and
|
||||
modification are specified in the file "COPYING".
|
|
@ -0,0 +1,162 @@
|
|||
=========================== WHY USE GIT+TOPGIT? ==========================
|
||||
|
||||
Three major concerns were on our mind when setting up this project.
|
||||
|
||||
o First we needed to structure the project in such a way that it would be
|
||||
easy to rebase all of our changes on the latest official ZFS release
|
||||
from Sun. We absolutely need to be able to benefit from the upstream
|
||||
improvements and not get locked in to an old version of the code base.
|
||||
|
||||
o Secondly, we wanted to be able to easily manage our changes in terms
|
||||
of a patch stack. This allows us to easily isolate specific changes
|
||||
and push them upstream for inclusion. It also allows us to easily
|
||||
update or drop specific changes based on what occurs upstream.
|
||||
|
||||
o Thirdly we needed our DVCS to be integrated with the management of this
|
||||
patch stack. We have tried other methods in the past such as SVN+Quilt
|
||||
but have found managing the patch stack becomes cumbersome. By using
|
||||
Git+TopGit to more tightly integrate our patch stack in to the repo
|
||||
we expect several benefits. One of the most important will be the
|
||||
ability to easily work on the patch stack with a distributed developer
|
||||
team. Additionally, the repo can track patch history, and we can utilize
|
||||
Git to merge patches and resolve conflicts.
|
||||
|
||||
TopGit is designed to specifically address these concerns by providing
|
||||
tools to simplify the handling of large numbers of interdependent topic
|
||||
branches. When using a TopGit-aware repo, every topic branch represents
|
||||
a 'patch' and that branch references the branches it depends on. The union
|
||||
of all these branches is your final source base.
|
||||
|
||||
========================= SETTING UP GIT+TOPGIT ==========================
|
||||
|
||||
First off you need to install a Git package on your system. For my
|
||||
purposes I have been working on a RHEL5 system with git version 1.5.4.5
|
||||
installed and it has been working well. You will also need to go get
|
||||
the latest version of TopGit which likely is not packaged nicely so you
|
||||
will need to build it from source. You can use Git to clone TopGit
|
||||
from the official site here and you're all set:
|
||||
|
||||
> git clone http://repo.or.cz/w/topgit.git
> cd topgit
|
||||
> make
|
||||
> make install # Default installs to $(HOME)
|
||||
|
||||
========================== TOPGIT AND ZFS ================================
|
||||
|
||||
Once you have Git and TopGit installed you will want to clone a copy of
|
||||
the Linux ZFS repo. While this project does not yet have a public home
|
||||
it hopefully will some day. In the meantime, if you have VPN access to
|
||||
LLNL you can clone the latest official repo here. Cloning a TopGit
|
||||
controlled repo is very similar to cloning a normal Git repo, but you
|
||||
need to remember to use 'tg remote' to populate all topic branches.
|
||||
|
||||
> git clone http://eris.llnl.gov/git/zfs.git zfs
|
||||
> cd zfs
|
||||
> tg remote --populate origin
|
||||
|
||||
Now that you have the Linux ZFS repo the first thing you will probably
|
||||
want to do is have a look at all the topic branches. TopGit provides
|
||||
a summary command which shows all the branches and a brief summary for
|
||||
each branch obtained from the .topmsg files.
|
||||
|
||||
> tg summary
|
||||
0 t/LAST [PATCH] LAST
|
||||
t/feature-commit-cb [PATCH] zfs commit callbacks
|
||||
t/fix-clock-wrap [PATCH] fix clock wrap
|
||||
t/fix-dnode-cons [PATCH] fix dnode constructor
|
||||
...
|
||||
|
||||
By convention all TopGit branches are prefixed with 't/', and the Linux
|
||||
ZFS repo also introduces the convention that the top most development
|
||||
branch be called 't/LAST'. This provides a consistent label to be used
|
||||
when you need to reference the branch which contains the union of all
|
||||
topic branches.
|
||||
|
||||
One thing you may also notice about the 'tg summary' command is it does
|
||||
not show the branches in dependency order. While this project only expresses
|
||||
a single dependency per branch, TopGit implements dependencies as a DAG just
|
||||
like Git. To see the dependencies you will need to use the --graphviz
|
||||
option and pipe the result to dot for display. The following command, while
long, works fairly well for me. Longer term it would be nice to update this
option to use preferred config options stored in the repo if they exist.
|
||||
|
||||
> tg summary --graphviz | dot -Txlib -Nfontsize=8 -Eminlen=0.01 \
|
||||
-Grankdir=LR -Nheight=0.3 -Nwidth=2 -Nfixedsize=true
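
When no X display is available, a text-only listing in dependency order can
be sketched with the standard tsort utility. This is only a sketch: the
order_branches helper and the branch pairs below are hypothetical, and it
assumes you have already collected one "dependency branch" pair per line
(for example from each branch's .topdeps file).

```shell
#!/bin/sh
# Print topic branches in dependency order.
# stdin: one "dependency branch" pair per line, meaning that "branch"
# depends on "dependency". tsort emits each name after everything it
# depends on.
order_branches() {
    tsort
}

# Hypothetical dependency chain, for illustration only.
order_branches <<'EOF'
t/fix-clock-wrap t/fix-dnode-cons
t/fix-dnode-cons t/feature-commit-cb
t/feature-commit-cb t/LAST
EOF
```

With a single chain like this one, t/LAST is printed last because every
other branch must precede it.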
|
||||
|
||||
========================= UPDATING A TOPIC BRANCH ========================
|
||||
|
||||
Updating a topic branch in TopGit is pretty straightforward, but there
|
||||
are a few rules you need to be aware of. The basic process involves
|
||||
checking out the relevant topic branch where the changes need to be made,
|
||||
making the changes, committing the changes to the branch and then merging
|
||||
those changes in to dependent branches. TopGit provides some tools to make
|
||||
this pretty easy, although it may be a little sluggish. Here is an example:
|
||||
|
||||
> git checkout t/feature-commit-cb # Checkout the proper branch
|
||||
> ...update branch... # Update the branch
|
||||
> git commit -a # Commit your changes
|
||||
> git checkout t/LAST # Checkout the LAST branch
|
||||
> tg update # Recursively merge in new branch
|
||||
|
||||
Assuming your change does not introduce any conflicts you're done. All branches
which were dependent on your change will have had the change merged in. If your
|
||||
change introduced a conflict you will need to resolve the conflict and then
|
||||
continue on with the update.
|
||||
|
||||
========================== ADDING A TOPIC BRANCH =========================
|
||||
|
||||
Adding a topic branch in TopGit is a little more complicated. When adding
|
||||
a new branch to the end of the patch graph things are pretty easy and TopGit
|
||||
does all the work. However, I expect our common case to be adding patches
|
||||
to the middle of the graph. TopGit will allow you to do this but you must
|
||||
be careful to manually update the dependency information in the .topdeps
|
||||
file.
|
||||
|
||||
> git checkout t/existing-topic-branch # Checkout the branch to add after
|
||||
> tg create t/new-topic-branch # Create a new topic branch
|
||||
> ...update .topmsg... # Update the branch message
|
||||
> ...create patch... # Update with your changes
|
||||
> git commit -a # Commit your changes
|
||||
> git checkout t/dependent-topic-branch # Checkout dependent branch
|
||||
> ...update .topdeps... # Manually update dependencies
|
||||
> git commit -a # Commit your changes
|
||||
> tg update # TopGit update
|
||||
> git checkout t/LAST # Checkout the LAST branch
|
||||
> tg update # Recursively merge in new branch
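
The manual ".topdeps" step above can be sketched as a small shell helper.
This assumes .topdeps holds one dependency branch name per line. The
retarget_topdep name and the branch names in the usage line are hypothetical
and not part of TopGit.

```shell
#!/bin/sh
# Replace one dependency with another in a .topdeps style file.
#   $1 = path to the file, $2 = old dependency, $3 = new dependency
# Only lines that match the old name exactly are rewritten.
retarget_topdep() {
    file=$1
    old=$2
    new=$3
    awk -v old="$old" -v new="$new" \
        '$0 == old { print new; next } { print }' "$file" >"$file.tmp" &&
    mv "$file.tmp" "$file"
}

# Usage (run on the dependent branch, then commit as shown above):
#   retarget_topdep .topdeps t/existing-topic-branch t/new-topic-branch
```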
|
||||
|
||||
========================= REMOVING A TOPIC BRANCH ========================
|
||||
|
||||
Removing a topic branch in TopGit is also currently not very easy. To remove
|
||||
a dependent branch the basic process is to commit a patch which reverts all
|
||||
changes on the branch. Then that reversion must be merged in to all dependent
|
||||
branches, the dependencies manually updated and finally the branch removed.
|
||||
If the branch is not empty you will not be able to remove it.
|
||||
|
||||
> git checkout t/del-topic-branch # Checkout the branch to delete
|
||||
> tg patch | patch -R -p1 # Revert all branch changes
|
||||
> git commit -a # Commit your changes
|
||||
> git checkout t/LAST # Checkout the LAST branch
|
||||
> tg update # Recursively merge revert
|
||||
> git checkout t/dependent-topic-branch # Checkout dependent branch
|
||||
> ...update .topdeps... # Manually update dependencies
|
||||
> git commit -a # Commit your changes
|
||||
> tg delete t/del-topic-branch # Delete empty topic branch
|
||||
|
||||
============================ TOPGIT TODO =================================
|
||||
|
||||
TopGit is still a young package which seems to be under active development
|
||||
by its author. It provides the minimum set of commands needed but there
|
||||
are clearly areas which simply have not yet been implemented. My short
|
||||
list of desired features includes:
|
||||
|
||||
o 'tg summary --deps', option to display a text version of the topic
|
||||
branch dependency DAG.
|
||||
|
||||
o 'tg depend list', list all topic branch dependencies.
|
||||
|
||||
o 'tg depend delete', cleanly remove a topic branch dependency.
|
||||
|
||||
o 'tg create', cleanly insert a topic branch in the middle
|
||||
of the graph and properly take care of updating all dependencies.
|
||||
|
||||
o 'tg delete', cleanly delete a topic branch in the middle
|
||||
of the graph and properly take care of updating all dependencies.
|
|
@ -0,0 +1,6 @@
|
|||
Meta: 1
|
||||
Name: zfs
|
||||
Branch: 1.0
|
||||
Version: 0.3.4
|
||||
Release: 1
|
||||
Release-Tags: relext
|
|
@ -0,0 +1,25 @@
|
|||
AUTOMAKE_OPTIONS = foreign dist-zip
|
||||
|
||||
SUBDIRS = doc scripts $(BUILDDIR)
|
||||
CONFIG_CLEAN_FILES = aclocal.m4 config.guess config.sub
|
||||
CONFIG_CLEAN_FILES += depcomp missing mkinstalldirs
|
||||
EXTRA_DIST = autogen.sh
|
||||
|
||||
.PHONY: quilt
|
||||
quilt: .quilt-$(BUILDDIR)
|
||||
autogen: .autogen-$(BUILDDIR)
|
||||
config: .config-$(BUILDDIR)
|
||||
.quilt-$(BUILDDIR):
|
||||
./scripts/quilt.sh -p $(NAME) -b $(BUILDDIR) -s $(SERIESFILE) -d $(PATCHDIR)
|
||||
echo $(BUILDDIR) >$@
|
||||
|
||||
unquilt:
|
||||
rm -rf $(BUILDDIR)
|
||||
rm -f .quilt-$(BUILDDIR)
|
||||
|
||||
clean-generic:
|
||||
|
||||
distclean: unquilt
|
||||
|
||||
rpms: dist Makefile
|
||||
rpmbuild -ta $(distdir).tar.gz
|
|
@ -0,0 +1,74 @@
|
|||
============================ ZFS KERNEL BUILD ============================
|
||||
|
||||
1) Build the SPL (Solaris Porting Layer) module which is designed to
|
||||
provide many Solaris APIs in the Linux kernel which are needed
|
||||
by ZFS. To build the SPL:
|
||||
|
||||
tar -xzf spl-x.y.z.tgz
|
||||
cd spl-x.y.z
|
||||
./configure --with-linux=<kernel src>
|
||||
make
|
||||
make check <as root>
|
||||
|
||||
2) Build ZFS. This port is based on build 89 of ZFS from OpenSolaris.
|
||||
You will need to have both the kernel and SPL source available.
|
||||
To build ZFS for use as a Linux kernel module (default):
|
||||
|
||||
tar -xzf zfs-x.y.z.tgz
|
||||
cd zfs-x.y.z
|
||||
./configure --with-linux=<kernel src> \
|
||||
--with-spl=<spl src>
|
||||
make
|
||||
make check <as root>
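
If --with-linux is omitted, the ZFS_AC_KERNEL macro in this commit falls
back to the source and build links under /lib/modules/`uname -r`. The sketch
below mirrors that lookup as a plain shell function. The function name and
its "modroot" argument are hypothetical, added so the logic can be tried
outside of configure.

```shell
#!/bin/sh
# Resolve the kernel source and build directories the way ZFS_AC_KERNEL
# does when --with-linux is not given.
#   $1 = module directory, normally /lib/modules/`uname -r`
find_kernel_dirs() {
    modroot=$1
    kernelsrc=
    kernelbuild=
    if test -e "$modroot/source"; then
        kernelsrc=`(cd "$modroot/source" && /bin/pwd)`
    fi
    if test -e "$modroot/build"; then
        kernelbuild=`(cd "$modroot/build" && /bin/pwd)`
    fi
    # Fall back to the build directory when there is no source link.
    if test -z "$kernelsrc"; then
        kernelsrc=$kernelbuild
    fi
    if test -z "$kernelsrc" || test -z "$kernelbuild"; then
        echo "kernel source not found, use --with-linux=PATH" >&2
        return 1
    fi
    echo "src=$kernelsrc build=$kernelbuild"
}

# Usage:
#   find_kernel_dirs "/lib/modules/`uname -r`"
```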
|
||||
|
||||
========================= ZFS USER LIBRARY BUILD =========================
|
||||
|
||||
1) Build ZFS. This port is based on build 89 of ZFS from OpenSolaris.
|
||||
To build ZFS as a userspace library:
|
||||
|
||||
tar -xzf zfs-x.y.z.tgz
|
||||
cd zfs-x.y.z
|
||||
./configure --with-zfs-config=user
|
||||
make
|
||||
make check <as root>
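
The config name given to configure is checked against the files in
./configs/ and then sourced by the ZFS_AC_CONFIG macro. A minimal sketch of
that check as a standalone shell function follows. The load_zfs_config name
and its directory argument are hypothetical; the macro itself always looks
in ./configs/.

```shell
#!/bin/sh
# Validate and source a ZFS build config, following ZFS_AC_CONFIG.
#   $1 = directory holding the config files (./configs in the macro)
#   $2 = config name: kernel, user or lustre
load_zfs_config() {
    cfgdir=$1
    cfg=$2
    if test -z "$cfg" || test ! -r "$cfgdir/$cfg"; then
        echo "*** Please specify one of the valid config files in $cfgdir" >&2
        return 1
    fi
    # Sourcing the file sets CONFIG, NAME, BUILDDIR and friends.
    . "$cfgdir/$cfg"
}

# Usage:
#   load_zfs_config ./configs user && echo "build dir: $BUILDDIR"
```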
|
||||
|
||||
============================ ZFS LUSTRE BUILD ============================
|
||||
|
||||
1) Build the SPL (Solaris Porting Layer) module which is designed to
|
||||
provide many Solaris APIs in the Linux kernel which are needed
|
||||
by ZFS. To build the SPL:
|
||||
|
||||
tar -xzf spl-x.y.z.tgz
|
||||
cd spl-x.y.z
|
||||
./configure --with-linux=<kernel src>
|
||||
make
|
||||
make check <as root>
|
||||
|
||||
2) Build ZFS. This port is based on build 89 of ZFS from OpenSolaris.
|
||||
To build ZFS as a userspace library for use by a Lustre filesystem:
|
||||
|
||||
tar -xzf zfs-x.y.z.tgz
|
||||
cd zfs-x.y.z
|
||||
./configure --with-zfs-config=lustre \
|
||||
--with-linux=<kernel src> \
|
||||
--with-spl=<spl src>
|
||||
make
|
||||
make check <as root>
|
||||
|
||||
3) Provided is an in-kernel test application called kpios which can be
|
||||
used to simulate a Lustre IO load. It may be used as a stress test
|
||||
or as a performance benchmark for your configuration. To simplify testing
|
||||
there are scripts provided in the scripts/ directory. A single test
|
||||
can be run as follows:
|
||||
|
||||
WARNING: You MUST update DEVICES in the create-zpool.sh script
|
||||
to reference the devices you wish to use.
|
||||
|
||||
cd scripts
|
||||
./load-zfs.sh # Load the ZFS/SPL module stack
|
||||
./create-zpool.sh # Modify DEVICES to list your zpool devices
|
||||
./zpios.sh # Modify for your particular kpios test
|
||||
./unload-zfs.sh # Unload the ZFS/SPL module stack
|
||||
|
||||
Enjoy,
|
||||
Brian Behlendorf <behlendorf1@llnl.gov>
|
|
@ -0,0 +1,30 @@
|
|||
* We may need a libefi replacement. It appears libefi is used
|
||||
to determine if the device passed to zpool is a 'whole device'
|
||||
or just a partition of a device. In the short term I think we
|
||||
can simply treat everything as a partition and be alright.
|
||||
|
||||
* We also do not have support for getting Solaris style device
|
||||
ids which is done when a zpool is setup. We may or may not
|
||||
be able to live without this, the jury is still out.
|
||||
|
||||
-----------------------------------------------------------------------
|
||||
|
||||
* Port zvol (ZFS volume interface).
|
||||
|
||||
* Port zpl (ZFS POSIX interface).
|
||||
|
||||
* Port Lustre fsfilt interface to use DMU.
|
||||
|
||||
* Andreas issue #1:
|
||||
"the maximum allocation DMU "blocksize" was 128kB and it would be better
|
||||
to be able to get 1MB contiguous allocations for best performance"
|
||||
|
||||
* Andreas issue #2:
|
||||
"there would need to be some work done to allow multiple operations to
|
||||
be atomic. This is needed by Lustre for object creation + LAST_ID
|
||||
updates, unlink + llog updates, etc. Conceptually this isn't very
|
||||
much work for a phase tree, but I've never looked at the ZFS code."
|
||||
|
||||
* Design and implement mechanism for viewing and modifying OST content
|
||||
from user space (by manipulating datasets/objects), possibly
|
||||
by implementing a scaled-down file system interface.
|
|
@ -0,0 +1 @@
|
|||
EXTRA_DIST = zfs-build.m4
|
|
@ -0,0 +1,378 @@
|
|||
AC_DEFUN([ZFS_AC_CONFIG], [
|
||||
|
||||
AC_ARG_WITH([zfs-config],
|
||||
AS_HELP_STRING([--with-zfs-config=CONFIG],
|
||||
[Config file 'kernel|user|lustre']),
|
||||
[zfsconfig="$withval"])
|
||||
|
||||
AC_MSG_CHECKING([zfs config file])
|
||||
if test -z "$zfsconfig" || test ! -r configs/$zfsconfig; then
|
||||
AC_MSG_RESULT([no])
|
||||
AC_MSG_ERROR([
|
||||
*** Please specify one of the valid config files located
|
||||
*** in ./configs/ with the '--with-zfs-config=CONFIG' option])
|
||||
fi
|
||||
|
||||
AC_MSG_RESULT([$zfsconfig]);
|
||||
. ./configs/$zfsconfig
|
||||
|
||||
TOPDIR=`/bin/pwd`
|
||||
ZFSDIR=${TOPDIR}/$BUILDDIR
|
||||
LIBDIR=$ZFSDIR/lib
|
||||
CMDDIR=$ZFSDIR/zcmd
|
||||
|
||||
AC_SUBST(TOPDIR)
|
||||
AC_SUBST(ZFSDIR)
|
||||
AC_SUBST(LIBDIR)
|
||||
AC_SUBST(CMDDIR)
|
||||
])
|
||||
|
||||
AC_DEFUN([ZFS_AC_KERNEL], [
|
||||
ver=`uname -r`
|
||||
|
||||
AC_ARG_WITH([linux],
|
||||
AS_HELP_STRING([--with-linux=PATH],
|
||||
[Path to kernel source]),
|
||||
[kernelsrc="$withval"; kernelbuild="$withval"])
|
||||
|
||||
AC_ARG_WITH([linux-obj],
|
||||
AS_HELP_STRING([--with-linux-obj=PATH],
|
||||
[Path to kernel build objects]),
|
||||
[kernelbuild="$withval"])
|
||||
|
||||
AC_MSG_CHECKING([kernel source directory])
|
||||
if test -z "$kernelsrc"; then
|
||||
kernelbuild=
|
||||
sourcelink=/lib/modules/${ver}/source
|
||||
buildlink=/lib/modules/${ver}/build
|
||||
|
||||
if test -e $sourcelink; then
|
||||
kernelsrc=`(cd $sourcelink; /bin/pwd)`
|
||||
fi
|
||||
if test -e $buildlink; then
|
||||
kernelbuild=`(cd $buildlink; /bin/pwd)`
|
||||
fi
|
||||
if test -z "$kernelsrc"; then
|
||||
kernelsrc=$kernelbuild
|
||||
fi
|
||||
if test -z "$kernelsrc" -o -z "$kernelbuild"; then
|
||||
AC_MSG_RESULT([Not found])
|
||||
AC_MSG_ERROR([
|
||||
*** Please specify the location of the kernel source
|
||||
*** with the '--with-linux=PATH' option])
|
||||
fi
|
||||
fi
|
||||
|
||||
AC_MSG_RESULT([$kernelsrc])
|
||||
AC_MSG_CHECKING([kernel build directory])
|
||||
AC_MSG_RESULT([$kernelbuild])
|
||||
|
||||
AC_MSG_CHECKING([kernel source version])
|
||||
if test -r $kernelbuild/include/linux/version.h &&
|
||||
fgrep -q UTS_RELEASE $kernelbuild/include/linux/version.h; then
|
||||
|
||||
kernsrcver=`(echo "#include <linux/version.h>";
|
||||
echo "kernsrcver=UTS_RELEASE") |
|
||||
cpp -I $kernelbuild/include |
|
||||
grep "^kernsrcver=" | cut -d \" -f 2`
|
||||
|
||||
elif test -r $kernelbuild/include/linux/utsrelease.h &&
|
||||
fgrep -q UTS_RELEASE $kernelbuild/include/linux/utsrelease.h; then
|
||||
|
||||
kernsrcver=`(echo "#include <linux/utsrelease.h>";
|
||||
echo "kernsrcver=UTS_RELEASE") |
|
||||
cpp -I $kernelbuild/include |
|
||||
grep "^kernsrcver=" | cut -d \" -f 2`
|
||||
fi
|
||||
|
||||
if test -z "$kernsrcver"; then
|
||||
AC_MSG_RESULT([Not found])
|
||||
AC_MSG_ERROR([
|
||||
*** Cannot determine the version of the linux kernel source.
|
||||
*** Please prepare the kernel before running this script])
|
||||
fi
|
||||
|
||||
AC_MSG_RESULT([$kernsrcver])
|
||||
|
||||
kmoduledir=${INSTALL_MOD_PATH}/lib/modules/$kernsrcver
|
||||
LINUX=${kernelsrc}
|
||||
LINUX_OBJ=${kernelbuild}
|
||||
|
||||
AC_SUBST(LINUX)
|
||||
AC_SUBST(LINUX_OBJ)
|
||||
AC_SUBST(kmoduledir)
|
||||
])
|
||||
|
||||
AC_DEFUN([ZFS_AC_SPL], [
|
||||
|
||||
AC_ARG_WITH([spl],
|
||||
AS_HELP_STRING([--with-spl=PATH],
|
||||
[Path to spl source]),
|
||||
[splsrc="$withval"; splbuild="$withval"])
|
||||
|
||||
AC_ARG_WITH([spl-obj],
|
||||
AS_HELP_STRING([--with-spl-obj=PATH],
|
||||
[Path to spl build objects]),
|
||||
[splbuild="$withval"])
|
||||
|
||||
|
||||
AC_MSG_CHECKING([spl source directory])
|
||||
if test -z "$splsrc"; then
|
||||
splbuild=
|
||||
sourcelink=/tmp/`whoami`/spl
|
||||
buildlink=/tmp/`whoami`/spl
|
||||
|
||||
if test -e $sourcelink; then
|
||||
splsrc=`(cd $sourcelink; /bin/pwd)`
|
||||
fi
|
||||
if test -e $buildlink; then
|
||||
splbuild=`(cd $buildlink; /bin/pwd)`
|
||||
fi
|
||||
if test -z "$splsrc"; then
|
||||
splsrc=$splbuild
|
||||
fi
|
||||
fi
|
||||
|
||||
if test -z "$splsrc" -o -z "$splbuild"; then
|
||||
sourcelink=/lib/modules/${ver}/source
|
||||
buildlink=/lib/modules/${ver}/build
|
||||
|
||||
if test -e $sourcelink; then
|
||||
splsrc=`(cd $sourcelink; /bin/pwd)`
|
||||
fi
|
||||
if test -e $buildlink; then
|
||||
splbuild=`(cd $buildlink; /bin/pwd)`
|
||||
fi
|
||||
if test -z "$splsrc"; then
|
||||
splsrc=$splbuild
|
||||
fi
|
||||
if test -z "$splsrc" -o -z "$splbuild"; then
|
||||
AC_MSG_RESULT([Not found])
|
||||
AC_MSG_ERROR([
|
||||
*** Please specify the location of the spl source
|
||||
*** with the '--with-spl=PATH' option])
|
||||
fi
|
||||
fi
|
||||
|
||||
AC_MSG_RESULT([$splsrc])
|
||||
AC_MSG_CHECKING([spl build directory])
|
||||
AC_MSG_RESULT([$splbuild])
|
||||
|
||||
AC_MSG_CHECKING([spl source version])
|
||||
if test -r $splbuild/spl_config.h &&
|
||||
fgrep -q VERSION $splbuild/spl_config.h; then
|
||||
|
||||
splsrcver=`(echo "#include <spl_config.h>";
|
||||
echo "splsrcver=VERSION") |
|
||||
cpp -I $splbuild |
|
||||
grep "^splsrcver=" | cut -d \" -f 2`
|
||||
fi
|
||||
|
||||
if test -z "$splsrcver"; then
|
||||
AC_MSG_RESULT([Not found])
|
||||
AC_MSG_ERROR([
|
||||
*** Cannot determine the version of the spl source.
|
||||
*** Please prepare the spl source before running this script])
|
||||
fi
|
||||
|
||||
AC_MSG_RESULT([$splsrcver])
|
||||
|
||||
AC_MSG_CHECKING([spl Module.symvers])
|
||||
if test -r $splbuild/modules/Module.symvers; then
|
||||
splsymvers=$splbuild/modules/Module.symvers
|
||||
elif test -r $kernelbuild/Module.symvers; then
|
||||
splsymvers=$kernelbuild/Module.symvers
|
||||
fi
|
||||
|
||||
if test -z "$splsymvers"; then
|
||||
AC_MSG_RESULT([Not found])
|
||||
AC_MSG_ERROR([
|
||||
*** Cannot find extra Module.symvers in the spl source.
|
||||
*** Please prepare the spl source before running this script])
|
||||
fi
|
||||
|
||||
AC_MSG_RESULT([$splsymvers])
|
||||
AC_SUBST(splsrc)
|
||||
AC_SUBST(splsymvers)
|
||||
])
|
||||
|
||||
AC_DEFUN([ZFS_AC_LICENSE], [
|
||||
AC_MSG_CHECKING([license])
|
||||
AC_MSG_RESULT([CDDL])
|
||||
dnl # AC_DEFINE([HAVE_GPL_ONLY_SYMBOLS], [1],
|
||||
dnl # [Define to 1 if module is licensed under the GPL])
|
||||
])
|
||||
|
||||
AC_DEFUN([ZFS_AC_DEBUG], [
|
||||
AC_MSG_CHECKING([whether debugging is enabled])
|
||||
AC_ARG_ENABLE( [debug],
|
||||
AS_HELP_STRING([--enable-debug],
|
||||
[Enable generic debug support (default off)]),
|
||||
[ case "$enableval" in
|
||||
yes) zfs_ac_debug=yes ;;
|
||||
no) zfs_ac_debug=no ;;
|
||||
*) AC_MSG_RESULT([Error!])
|
||||
AC_MSG_ERROR([Bad value "$enableval" for --enable-debug]) ;;
|
||||
esac ]
|
||||
)
|
||||
if test "$zfs_ac_debug" = yes; then
|
||||
AC_MSG_RESULT([yes])
|
||||
KERNELCPPFLAGS="${KERNELCPPFLAGS} -DDEBUG "
|
||||
HOSTCFLAGS="${HOSTCFLAGS} -DDEBUG "
|
||||
else
|
||||
AC_MSG_RESULT([no])
|
||||
AC_DEFINE([NDEBUG], [1],
|
||||
[Define to 1 to disable debug tracing])
|
||||
KERNELCPPFLAGS="${KERNELCPPFLAGS} -DNDEBUG "
|
||||
HOSTCFLAGS="${HOSTCFLAGS} -DNDEBUG "
|
||||
fi
|
||||
])
|
||||
|
||||
AC_DEFUN([ZFS_AC_SCRIPT_CONFIG], [
|
||||
SCRIPT_CONFIG=.script-config
|
||||
rm -f ${SCRIPT_CONFIG}
|
||||
echo "KERNELSRC=${LINUX}" >>${SCRIPT_CONFIG}
|
||||
echo "KERNELBUILD=${LINUX_OBJ}" >>${SCRIPT_CONFIG}
|
||||
echo "KERNELSRCVER=$kernsrcver" >>${SCRIPT_CONFIG}
|
||||
echo >>${SCRIPT_CONFIG}
|
||||
echo "SPLSRC=$splsrc" >>${SCRIPT_CONFIG}
|
||||
echo "SPLBUILD=$splbuild" >>${SCRIPT_CONFIG}
|
||||
echo "SPLSRCVER=$splsrcver" >>${SCRIPT_CONFIG}
|
||||
echo "SPLSYMVERS=$splsymvers" >>${SCRIPT_CONFIG}
|
||||
echo >>${SCRIPT_CONFIG}
|
||||
echo "ZFSSRC=${TOPDIR}/src" >>${SCRIPT_CONFIG}
|
||||
echo "ZFSBUILD=${ZFSDIR}" >>${SCRIPT_CONFIG}
|
||||
echo >>${SCRIPT_CONFIG}
|
||||
echo "TOPDIR=${TOPDIR}" >>${SCRIPT_CONFIG}
echo "LIBDIR=${LIBDIR}" >>${SCRIPT_CONFIG}
echo "CMDDIR=${CMDDIR}" >>${SCRIPT_CONFIG}
|
||||
])
|
||||
|
||||
dnl #
|
||||
dnl # ZFS_LINUX_CONFTEST
|
||||
dnl #
|
||||
AC_DEFUN([ZFS_LINUX_CONFTEST], [
|
||||
cat >conftest.c <<_ACEOF
|
||||
$1
|
||||
_ACEOF
|
||||
])
|
||||
|
||||
dnl #
|
||||
dnl # ZFS_LANG_PROGRAM(C)([PROLOGUE], [BODY])
|
||||
dnl #
|
||||
m4_define([ZFS_LANG_PROGRAM], [
|
||||
$1
|
||||
int
|
||||
main (void)
|
||||
{
|
||||
dnl Do *not* indent the following line: there may be CPP directives.
|
||||
dnl Don't move the `;' right after for the same reason.
|
||||
$2
|
||||
;
|
||||
return 0;
|
||||
}
|
||||
])
|
||||
|
||||
dnl #
|
||||
dnl # ZFS_LINUX_COMPILE_IFELSE / like AC_COMPILE_IFELSE
|
||||
dnl #
|
||||
AC_DEFUN([ZFS_LINUX_COMPILE_IFELSE], [
|
||||
m4_ifvaln([$1], [ZFS_LINUX_CONFTEST([$1])])dnl
|
||||
rm -f build/conftest.o build/conftest.mod.c build/conftest.ko build/Makefile
|
||||
echo "obj-m := conftest.o" >build/Makefile
|
||||
dnl AS_IF([AC_TRY_COMMAND(cp conftest.c build && make [$2] CC="$CC" -f $PWD/build/Makefile LINUXINCLUDE="-Iinclude -include include/linux/autoconf.h" -o tmp_include_depends -o scripts -o include/config/MARKER -C $LINUX_OBJ EXTRA_CFLAGS="-Werror-implicit-function-declaration $EXTRA_KCFLAGS" $ARCH_UM SUBDIRS=$PWD/build) >/dev/null && AC_TRY_COMMAND([$3])],
|
||||
AS_IF([AC_TRY_COMMAND(cp conftest.c build && make [$2] CC="$CC" LINUXINCLUDE="-Iinclude -include include/linux/autoconf.h" -o tmp_include_depends -o scripts -o include/config/MARKER -C $LINUX_OBJ EXTRA_CFLAGS="-Werror-implicit-function-declaration $EXTRA_KCFLAGS" $ARCH_UM M=$PWD/build) >/dev/null && AC_TRY_COMMAND([$3])],
|
||||
[$4],
|
||||
[_AC_MSG_LOG_CONFTEST
|
||||
m4_ifvaln([$5],[$5])dnl])dnl
|
||||
rm -f build/conftest.o build/conftest.mod.c build/conftest.mod.o build/conftest.ko m4_ifval([$1], [build/conftest.c conftest.c])[]dnl
|
||||
])
|
||||
|
||||
dnl #
|
||||
dnl # ZFS_LINUX_TRY_COMPILE like AC_TRY_COMPILE
|
||||
dnl #
|
||||
AC_DEFUN([ZFS_LINUX_TRY_COMPILE],
|
||||
[ZFS_LINUX_COMPILE_IFELSE(
|
||||
[AC_LANG_SOURCE([ZFS_LANG_PROGRAM([[$1]], [[$2]])])],
|
||||
[modules],
|
||||
[test -s build/conftest.o],
|
||||
[$3], [$4])
|
||||
])
|
||||
|
||||
dnl #
|
||||
dnl # ZFS_LINUX_CONFIG
|
||||
dnl #
|
||||
AC_DEFUN([ZFS_LINUX_CONFIG],
|
||||
[AC_MSG_CHECKING([whether Linux was built with CONFIG_$1])
|
||||
ZFS_LINUX_TRY_COMPILE([
|
||||
#ifndef AUTOCONF_INCLUDED
|
||||
#include <linux/config.h>
|
||||
#endif
|
||||
],[
|
||||
#ifndef CONFIG_$1
|
||||
#error CONFIG_$1 not #defined
|
||||
#endif
|
||||
],[
|
||||
AC_MSG_RESULT([yes])
|
||||
$2
|
||||
],[
|
||||
AC_MSG_RESULT([no])
|
||||
$3
|
||||
])
|
||||
])
|
||||
|
||||
dnl #
|
||||
dnl # ZFS_CHECK_SYMBOL_EXPORT
|
||||
dnl # check symbol exported or not
|
||||
dnl #
|
||||
AC_DEFUN([ZFS_CHECK_SYMBOL_EXPORT],
|
||||
[AC_MSG_CHECKING([whether symbol $1 is exported])
|
||||
grep -q -E '[[[:space:]]]$1[[[:space:]]]' $LINUX/Module.symvers 2>/dev/null
|
||||
rc=$?
|
||||
if test $rc -ne 0; then
|
||||
export=0
|
||||
for file in $2; do
|
||||
grep -q -E "EXPORT_SYMBOL.*($1)" "$LINUX/$file" 2>/dev/null
|
||||
rc=$?
|
||||
if test $rc -eq 0; then
|
||||
export=1
|
||||
break;
|
||||
fi
|
||||
done
|
||||
if test $export -eq 0; then
|
||||
AC_MSG_RESULT([no])
|
||||
$4
|
||||
else
|
||||
AC_MSG_RESULT([yes])
|
||||
$3
|
||||
fi
|
||||
else
|
||||
AC_MSG_RESULT([yes])
|
||||
$3
|
||||
fi
|
||||
])
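
The first step of ZFS_CHECK_SYMBOL_EXPORT, the Module.symvers lookup, can be
tried on its own with the sketch below. The symbol_exported name is
hypothetical. It assumes the symbol is a plain C identifier with no regex
metacharacters, and it leaves out the macro's fallback of grepping source
files for EXPORT_SYMBOL. Once m4 strips its quoting, the bracket expression
in the macro should reduce to the plain [[:space:]] used here.

```shell
#!/bin/sh
# Return success if a symbol appears as a whole field in a
# Module.symvers style file.
#   $1 = path to Module.symvers, $2 = symbol name
symbol_exported() {
    symvers=$1
    sym=$2
    # Require whitespace on both sides so "bio" does not match "bio_put".
    grep -q -E "[[:space:]]${sym}[[:space:]]" "$symvers" 2>/dev/null
}

# Usage:
#   if symbol_exported "$LINUX/Module.symvers" bio_put; then
#       echo "bio_put is exported"
#   fi
```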
|
||||
|
||||
dnl #
|
||||
dnl # 2.6.x API change
|
||||
dnl # bio_end_io_t uses 2 args (size was dropped from prototype)
|
||||
dnl #
|
||||
AC_DEFUN([ZFS_AC_2ARGS_BIO_END_IO_T],
|
||||
[AC_MSG_CHECKING([whether bio_end_io_t wants 2 args])
|
||||
tmp_flags="$EXTRA_KCFLAGS"
|
||||
EXTRA_KCFLAGS="-Werror"
|
||||
ZFS_LINUX_TRY_COMPILE([
|
||||
#include <linux/bio.h>
|
||||
],[
|
||||
void (*wanted_end_io)(struct bio *, int) = NULL;
|
||||
bio_end_io_t *local_end_io;
|
||||
|
||||
local_end_io = wanted_end_io;
|
||||
],[
|
||||
AC_MSG_RESULT(yes)
|
||||
AC_DEFINE(HAVE_2ARGS_BIO_END_IO_T, 1,
|
||||
[bio_end_io_t wants 2 args])
|
||||
],[
|
||||
AC_MSG_RESULT(no)
|
||||
])
|
||||
EXTRA_KCFLAGS="$tmp_flags"
|
||||
])
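The probe above works because, with -Werror, assigning between incompatible function-pointer types fails to compile. A minimal userspace sketch of the same assignment follows; `struct bio` and `bio_end_io_t` are local stand-ins declared only for illustration (they are not the kernel definitions), and `sample_end_io` and `probe_two_arg_end_io` are hypothetical names.

```c
#include <assert.h>

/* Local stand-ins for illustration; not the kernel's definitions. */
struct bio;
typedef void (bio_end_io_t)(struct bio *, int);	/* 2-argument form */

static int end_io_calls;	/* counts invocations of the callback */

/* A callback whose signature matches the 2-argument prototype. */
static void
sample_end_io(struct bio *bio, int error)
{
	(void) bio;
	(void) error;
	end_io_calls++;
}

/*
 * The same assignment the configure test compiles. It builds without
 * warnings only when both pointer types agree, which is what the
 * probe detects.
 */
int
probe_two_arg_end_io(void)
{
	void (*wanted_end_io)(struct bio *, int) = sample_end_io;
	bio_end_io_t *local_end_io;

	local_end_io = wanted_end_io;
	local_end_io((struct bio *)0, 0);
	assert(end_io_calls > 0);
	return (end_io_calls);
}
```

If the typedef were changed to a 3-argument form, the assignment would trigger an incompatible-pointer-type diagnostic, and the probe would report "no".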
|
|
@ -0,0 +1,10 @@
|
|||
#!/bin/sh
|
||||
|
||||
find . -type d -name .deps | xargs rm -rf
|
||||
rm -rf config.guess config.sub ltmain.sh
|
||||
libtoolize --automake
|
||||
aclocal -I autoconf 2>/dev/null &&
|
||||
autoheader &&
|
||||
automake --add-missing --include-deps # 2>/dev/null &&
|
||||
autoconf
|
||||
|
|
@ -0,0 +1 @@
|
|||
obj-m := conftest.o
|
|
@ -0,0 +1 @@
|
|||
EXTRA_DIST=kernel user lustre
|
|
@ -0,0 +1,11 @@
|
|||
# Default ZFS kernel mode configuration
|
||||
|
||||
UNAME=`uname -r | cut -d- -f1`
|
||||
|
||||
CONFIG=user
|
||||
|
||||
NAME=zfs
|
||||
BRANCH=`awk '/[Bb]ranch:/ {print $$2}' META`
|
||||
VERSION=`awk '/[Vv]ersion:/ {print $$2}' META`
|
||||
RELEASE=`awk '/[Rr]elease:/ {print $$2}' META`
|
||||
BUILDDIR=$NAME+$CONFIG
|
|
@ -0,0 +1,11 @@
|
|||
# Default ZFS lustre mode configuration
|
||||
|
||||
UNAME=`uname -r | cut -d- -f1`
|
||||
|
||||
CONFIG=lustre
|
||||
|
||||
NAME=zfs
|
||||
BRANCH=`awk '/[Bb]ranch:/ {print $$2}' META`
|
||||
VERSION=`awk '/[Vv]ersion:/ {print $$2}' META`
|
||||
RELEASE=`awk '/[Rr]elease:/ {print $$2}' META`
|
||||
BUILDDIR=$NAME+$CONFIG
|
|
@ -0,0 +1,9 @@
|
|||
# Default ZFS user mode configuration
|
||||
|
||||
CONFIG=user
|
||||
|
||||
NAME=zfs
|
||||
BRANCH=`awk '/[Bb]ranch:/ {print $$2}' META`
|
||||
VERSION=`awk '/[Vv]ersion:/ {print $$2}' META`
|
||||
RELEASE=`awk '/[Rr]elease:/ {print $$2}' META`
|
||||
BUILDDIR=$NAME+$CONFIG
|
|
@ -0,0 +1,223 @@
|
|||
#
|
||||
# This file is part of the ZFS Linux port.
|
||||
#
|
||||
# Copyright (c) 2008 Lawrence Livermore National Security, LLC.
|
||||
# Produced at Lawrence Livermore National Laboratory
|
||||
# Written by:
|
||||
# Brian Behlendorf <behlendorf1@llnl.gov>,
|
||||
# Herb Wartens <wartens2@llnl.gov>,
|
||||
# Jim Garlick <garlick@llnl.gov>
|
||||
# LLNL-CODE-403049
|
||||
#
|
||||
# CDDL HEADER START
|
||||
#
|
||||
# The contents of this file are subject to the terms of the
|
||||
# Common Development and Distribution License, Version 1.0 only
|
||||
# (the "License"). You may not use this file except in compliance
|
||||
# with the License.
|
||||
#
|
||||
# You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
# or http://www.opensolaris.org/os/licensing.
|
||||
# See the License for the specific language governing permissions
|
||||
# and limitations under the License.
|
||||
#
|
||||
# When distributing Covered Code, include this CDDL HEADER in each
|
||||
# file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
# If applicable, add the following below this CDDL HEADER, with the
|
||||
# fields enclosed by brackets "[]" replaced with your own identifying
|
||||
# information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
#
|
||||
# CDDL HEADER END
|
||||
#
|
||||
|
||||
AC_INIT
|
||||
AC_LANG(C)
|
||||
|
||||
AC_CANONICAL_SYSTEM
|
||||
AM_INIT_AUTOMAKE(zfs, 0.4.0)
|
||||
AC_CONFIG_HEADERS([zfs_config.h])
|
||||
|
||||
AC_PROG_INSTALL
|
||||
AC_PROG_CC
|
||||
AC_PROG_LIBTOOL
|
||||
|
||||
zfsconfig=kernel
|
||||
kernelsrc=
|
||||
kernelbuild=
|
||||
splsrc=
|
||||
splbuild=
|
||||
|
||||
ZFS_AC_CONFIG
|
||||
ZFS_AC_KERNEL
|
||||
ZFS_AC_SPL
|
||||
ZFS_AC_SCRIPT_CONFIG
|
||||
ZFS_AC_LICENSE
|
||||
ZFS_AC_DEBUG
|
||||
ZFS_AC_2ARGS_BIO_END_IO_T
|
||||
|
||||
AC_SUBST(UNAME)
|
||||
AC_SUBST(CONFIG)
|
||||
AC_SUBST(NAME)
|
||||
AC_SUBST(SVNURL)
|
||||
AC_SUBST(BRANCH)
|
||||
AC_SUBST(VERSION)
|
||||
AC_SUBST(RELEASE)
|
||||
AC_SUBST(BRANCHURL)
|
||||
AC_SUBST(TAGURL)
|
||||
AC_SUBST(BUILDURL)
|
||||
AC_SUBST(BUILDDIR)
|
||||
|
||||
# Check for needed userspace bits
|
||||
AC_CHECK_HEADERS(sys/types.h sys/byteorder.h sys/isa_defs.h \
|
||||
sys/systeminfo.h sys/u8_textprep.h libdiskmgt.h)
|
||||
|
||||
AC_CHECK_FUNCS(strlcat strlcpy strnlen issetugid setmntent getexecname)
|
||||
|
||||
AC_CHECK_LIB([diskmgt], [libdiskmgt_error],
|
||||
[AC_DEFINE([HAVE_LIBDISKMGT], 1,
|
||||
[Define to 1 if 'libdiskmgt' library available])])
|
||||
|
||||
AC_CHECK_LIB([efi], [efi_alloc_and_init],
|
||||
[AC_DEFINE([HAVE_LIBEFI], 1,
|
||||
[Define to 1 if 'libefi' library available])])
|
||||
|
||||
AC_CHECK_LIB([share], [sa_init],
|
||||
[AC_DEFINE([HAVE_LIBSHARE], 1,
|
||||
[Define to 1 if 'libshare' library available])])
|
||||
|
||||
AC_EGREP_HEADER(ioctl, unistd.h,
|
||||
[AC_DEFINE([HAVE_IOCTL_IN_UNISTD_H], 1,
|
||||
[Define to 1 if ioctl() is defined in <unistd.h> header file])])
|
||||
|
||||
AC_EGREP_HEADER(ioctl, sys/ioctl.h,
|
||||
[AC_DEFINE([HAVE_IOCTL_IN_SYS_IOCTL_H], 1,
|
||||
[Define to 1 if ioctl() is defined in <sys/ioctl.h> header file])])
|
||||
|
||||
AC_EGREP_HEADER(ioctl, stropts.h,
|
||||
[AC_DEFINE([HAVE_IOCTL_IN_STROPTS_H], 1,
|
||||
[Define to 1 if ioctl() is defined in <stropts.h> header file])])
|
||||
|
||||
AC_EGREP_HEADER(strcmp, strings.h,
|
||||
[AC_DEFINE([HAVE_STRCMP_IN_STRINGS_H], 1,
|
||||
[Define to 1 if strcmp() is defined in <strings.h> header file])])
|
||||
|
||||
AC_EGREP_HEADER(sysinfo, sys/systeminfo.h,
|
||||
[AC_DEFINE([HAVE_SYSINFO_IN_SYS_SYSTEMINFO_H], 1,
|
||||
[Define to 1 if sysinfo() is defined in <sys/systeminfo.h> header file])])
|
||||
|
||||
#AC_DEFINE([HAVE_ZVOL], 1, ["Define to 1 to include ZVOL support"])
|
||||
#AC_DEFINE([HAVE_ZPL], 1, ["Define to 1 to include ZPL support"])
|
||||
#AC_DEFINE([WANT_FAKE_IOCTL], 1, ["Define to 1 to use fake ioctl() support"])
|
||||
#AC_DEFINE([HAVE_DM_INUSE_SWAP], 1, ["None"])
|
||||
#AC_DEFINE([HAVE_UNICODE], 1, ["None"])
|
||||
#AC_DEFINE([HAVE_INTTYPES], 1, [Define to 1 if uint16 defined in <sys/types.h> header file])
|
||||
|
||||
# Add "V=1" to KERNELMAKE_PARAMS to enable verbose module build
|
||||
KERNELMAKE_PARAMS=
|
||||
KERNELCPPFLAGS="$KERNELCPPFLAGS -DHAVE_SPL -D_KERNEL -I$splsrc -I$splsrc/include -I$TOPDIR"
|
||||
|
||||
# Minimally required for pread() functionality and other GNU goodness
|
||||
HOSTCFLAGS="$HOSTCFLAGS -ggdb -O2 -std=c99 -D_GNU_SOURCE -D__EXTENSIONS__ "
|
||||
# Quiet warnings not covered by the gcc-* patches
|
||||
HOSTCFLAGS="$HOSTCFLAGS -Wno-switch -Wno-unused -Wno-missing-braces -Wno-parentheses "
|
||||
HOSTCFLAGS="$HOSTCFLAGS -Wno-uninitialized -fno-strict-aliasing "
|
||||
# Expected defines not covered by zfs_config.h
|
||||
HOSTCFLAGS="$HOSTCFLAGS -DHAVE_SPL -D_POSIX_PTHREAD_SEMANTICS "
|
||||
HOSTCFLAGS="$HOSTCFLAGS -D_FILE_OFFSET_BITS=64 -D_LARGEFILE64_SOURCE -D_REENTRANT "
|
||||
HOSTCFLAGS="$HOSTCFLAGS -DTEXT_DOMAIN=\\\"zfs-linux-kernel\\\" "
|
||||
# Expected default include paths; additional paths are added by Makefiles
|
||||
HOSTCFLAGS="$HOSTCFLAGS -I$TOPDIR "
|
||||
|
||||
if test "$kernelbuild" != "$kernelsrc"; then
|
||||
KERNELMAKE_PARAMS="$KERNELMAKE_PARAMS O=$kernelbuild"
|
||||
fi
|
||||
|
||||
AC_SUBST(KERNELMAKE_PARAMS)
|
||||
AC_SUBST(KERNELCPPFLAGS)
|
||||
AC_SUBST(HOSTCFLAGS)
|
||||
|
||||
AC_CONFIG_FILES([ Makefile
|
||||
autoconf/Makefile
|
||||
configs/Makefile
|
||||
doc/Makefile
|
||||
scripts/Makefile
|
||||
zfs/Makefile
|
||||
zfs/lib/libudmu/include/Makefile
|
||||
zfs/lib/libudmu/Makefile
|
||||
zfs/lib/Makefile
|
||||
zfs/lib/libnvpair/include/sys/Makefile
|
||||
zfs/lib/libnvpair/include/Makefile
|
||||
zfs/lib/libnvpair/Makefile
|
||||
zfs/lib/libsolcompat/sparc64/Makefile
|
||||
zfs/lib/libsolcompat/Makefile
|
||||
zfs/lib/libsolcompat/include/tsol/Makefile
|
||||
zfs/lib/libsolcompat/include/sparc64/sys/Makefile
|
||||
zfs/lib/libsolcompat/include/sparc64/Makefile
|
||||
zfs/lib/libsolcompat/include/rpc/Makefile
|
||||
zfs/lib/libsolcompat/include/i386/sys/Makefile
|
||||
zfs/lib/libsolcompat/include/i386/Makefile
|
||||
zfs/lib/libsolcompat/include/ia32/sys/Makefile
|
||||
zfs/lib/libsolcompat/include/ia32/Makefile
|
||||
zfs/lib/libsolcompat/include/amd64/sys/Makefile
|
||||
zfs/lib/libsolcompat/include/amd64/Makefile
|
||||
zfs/lib/libsolcompat/include/sys/sysevent/Makefile
|
||||
zfs/lib/libsolcompat/include/sys/fm/Makefile
|
||||
zfs/lib/libsolcompat/include/sys/Makefile
|
||||
zfs/lib/libsolcompat/include/Makefile
|
||||
zfs/lib/libsolcompat/i386/Makefile
|
||||
zfs/lib/libsolcompat/amd64/Makefile
|
||||
zfs/lib/libavl/include/sys/Makefile
|
||||
zfs/lib/libavl/include/Makefile
|
||||
zfs/lib/libavl/Makefile
|
||||
zfs/lib/libuutil/include/Makefile
|
||||
zfs/lib/libuutil/Makefile
|
||||
zfs/lib/libzfs/include/Makefile
|
||||
zfs/lib/libzfs/Makefile
|
||||
zfs/lib/libumem/include/Makefile
|
||||
zfs/lib/libumem/Makefile
|
||||
zfs/lib/libumem/sys/Makefile
|
||||
zfs/lib/libzcommon/include/Makefile
|
||||
zfs/lib/libzcommon/include/sys/fm/fs/Makefile
|
||||
zfs/lib/libzcommon/include/sys/fm/Makefile
|
||||
zfs/lib/libzcommon/include/sys/Makefile
|
||||
zfs/lib/libzcommon/include/sys/fs/Makefile
|
||||
zfs/lib/libzcommon/Makefile
|
||||
zfs/lib/libzpool/Makefile
|
||||
zfs/lib/libport/include/sys/Makefile
|
||||
zfs/lib/libport/include/Makefile
|
||||
zfs/lib/libport/Makefile
|
||||
zfs/lib/libdmu-ctl/include/sys/Makefile
|
||||
zfs/lib/libdmu-ctl/include/Makefile
|
||||
zfs/lib/libdmu-ctl/Makefile
|
||||
zfs/zcmd/ztest/Makefile
|
||||
zfs/zcmd/Makefile
|
||||
zfs/zcmd/zfs/Makefile
|
||||
zfs/zcmd/zdb/Makefile
|
||||
zfs/zcmd/zinject/Makefile
|
||||
zfs/zcmd/zdump/Makefile
|
||||
zfs/zcmd/zpool/Makefile
|
||||
])
|
||||
AC_OUTPUT
|
||||
|
||||
# HACK: I really, really hate this... but to ensure the kernel build
|
||||
# system compiles C files shared between a library and a kernel module,
|
||||
# we need to ensure each file has a unique make target. To do that
|
||||
# I'm creating symlinks for each shared file at configure time. It
|
||||
# may be possible something better can be done in the Makefile but it
|
||||
# will take some serious investigation and I don't have the time now.
|
||||
|
||||
echo
|
||||
echo "Creating symlinks for additional make targets"
|
||||
ln -s $LIBDIR/libport/u8_textprep.c $LIBDIR/libport/ku8_textprep.c
|
||||
ln -s $LIBDIR/libavl/avl.c $LIBDIR/libavl/kavl.c
|
||||
ln -s $LIBDIR/libavl/avl.c $LIBDIR/libavl/uavl.c
|
||||
ln -s $LIBDIR/libnvpair/nvpair.c $LIBDIR/libnvpair/knvpair.c
|
||||
ln -s $LIBDIR/libnvpair/nvpair.c $LIBDIR/libnvpair/unvpair.c
|
||||
ln -s $LIBDIR/libzcommon/zfs_deleg.c $LIBDIR/libzcommon/kzfs_deleg.c
|
||||
ln -s $LIBDIR/libzcommon/zfs_prop.c $LIBDIR/libzcommon/kzfs_prop.c
|
||||
ln -s $LIBDIR/libzcommon/zprop_common.c $LIBDIR/libzcommon/kzprop_common.c
|
||||
ln -s $LIBDIR/libzcommon/compress.c $LIBDIR/libzcommon/kcompress.c
|
||||
ln -s $LIBDIR/libzcommon/list.c $LIBDIR/libzcommon/klist.c
|
||||
ln -s $LIBDIR/libzcommon/zfs_namecheck.c $LIBDIR/libzcommon/kzfs_namecheck.c
|
||||
ln -s $LIBDIR/libzcommon/zfs_comutil.c $LIBDIR/libzcommon/kzfs_comutil.c
|
||||
ln -s $LIBDIR/libzcommon/zpool_prop.c $LIBDIR/libzcommon/kzpool_prop.c
|
|
@ -0,0 +1,113 @@
|
|||
From: Chris Dunlap <cdunlap@llnl.gov>
|
||||
To: tak1@llnl.gov (James Tak)
|
||||
Cc: rogers11@llnl.gov (Leah Rogers), garlick@llnl.gov (Jim Garlick),
|
||||
mgary@llnl.gov (Mark Gary), kimcupps@llnl.gov (Kim Cupps)
|
||||
Date: Mon, 26 Mar 2007 15:37:07 -0700
|
||||
Subject: CDDL/GPL licensing issues for ZFS Linux port
|
||||
|
||||
James,
|
||||
|
||||
We want to port Sun's Zettabyte File System (ZFS) to Linux and
|
||||
ultimately redistribute the source code of our work. We've been
|
||||
talking with Leah about this and have a meeting scheduled with you
|
||||
for this coming Thursday at 2pm. I just wanted to give you a summary
|
||||
before the meeting of what we're proposing.
|
||||
|
||||
ZFS is part of OpenSolaris which is licensed under the Common
|
||||
Development and Distribution License (CDDL):
|
||||
|
||||
http://www.opensolaris.org/os/licensing/cddllicense.txt
|
||||
|
||||
The Linux kernel is licensed under the GNU General Public License (GPL)
|
||||
(specifically, under version 2 of the license only):
|
||||
|
||||
http://www.fsf.org/licensing/licenses/gpl.html
|
||||
|
||||
While these are both Open-Source licenses, the Free Software Foundation
|
||||
(FSF) states they are incompatible with one another:
|
||||
|
||||
http://www.fsf.org/licensing/licenses/index_html
|
||||
|
||||
"[CDDL] is a free software license which is not a strong copyleft;
|
||||
it has some complex restrictions that make it incompatible with the
|
||||
GNU GPL. It requires that all attribution notices be maintained,
|
||||
while the GPL only requires certain types of notices. Also, it
|
||||
terminates in retaliation for certain aggressive uses of patents.
|
||||
So, a module covered by the GPL and a module covered by the CDDL
|
||||
cannot legally be linked together."
|
||||
|
||||
As an aside, Sun is reportedly considering releasing OpenSolaris under
|
||||
GPL3 (i.e., the upcoming version 3 of the GNU General Public License):
|
||||
|
||||
http://blogs.sun.com/jonathan/entry/hp_and_sun_partnering_around
|
||||
|
||||
http://arstechnica.com/news.ars/post/20060130-6074.html
|
||||
|
||||
http://news.com.com/Sun+considers+GPL+3+license+for+Solaris/2100-1016_3-6032893.html
|
||||
|
||||
Since the GPL3 has not been finalized, it is unclear whether
|
||||
incompatibilities will exist between GPL2 and GPL3.
|
||||
|
||||
Linus Torvalds (the original creator of Linux) describes his views
|
||||
on the licensing of Linux kernel modules in the following email thread:
|
||||
|
||||
http://linuxmafia.com/faq/Kernel/proprietary-kernel-modules.html
|
||||
|
||||
Most of this thread is in regards to proprietary closed-source
|
||||
binary-only modules for Linux. Linus generally considers modules
|
||||
written for Linux using the kernel infrastructures to be derived
|
||||
works of Linux, even if they don't copy any existing Linux code.
|
||||
However, he specifically singles out drivers and filesystems ported
|
||||
from other operating systems as not being derived works:
|
||||
|
||||
"It would be rather preposterous to call the Andrew FileSystem a
|
||||
'derived work' of Linux, for example, so I think it's perfectly
|
||||
OK to have a AFS module, for example."
|
||||
|
||||
"The original binary-only modules were for things that were
|
||||
pre-existing works of code, i.e., drivers and filesystems ported
|
||||
from other operating systems, which thus could clearly be argued
|
||||
to not be derived works..."
|
||||
|
||||
Based on this, it seems our port of Sun's ZFS filesystem to Linux
|
||||
would not be considered a derived work of Linux, and therefore not
|
||||
covered by the GPL. The issue of the CDDL/GPL license incompatibility
|
||||
becomes moot. As such, we should be able to redistribute our changes
|
||||
to ZFS in source-code form licensed under the CDDL since this will
|
||||
be a derived work of the original ZFS code. There seems to be some
|
||||
dissent as to whether a binary module could be redistributed as well,
|
||||
but that issue does not concern us. In this instance, we are only
|
||||
interested in redistribution of our work in source-code form.
|
||||
|
||||
-Chris
|
||||
|
||||
To: Chris Dunlap <cdunlap@llnl.gov>
|
||||
From: James Tak <tak1@llnl.gov>
|
||||
Subject: Re: CDDL/GPL licensing issues for ZFS Linux port
|
||||
Cc: rogers11@llnl.gov (Leah Rogers), garlick@llnl.gov (Jim Garlick),
|
||||
mgary@llnl.gov (Mark Gary), kimcupps@llnl.gov (Kim Cupps)
|
||||
Date: Thu, 29 Mar 2007 14:53:01 -0700
|
||||
|
||||
Hi Chris,
|
||||
As per our discussion today, the ZFS port you are proposing releasing under
|
||||
the CDDL license should be o.k. since it is a derivative work of the
|
||||
original ZFS module (under CDDL) and is therefore also subject to CDDL
|
||||
under the distribution terms of that license. While the issue of linking
|
||||
has been greatly debated in the OS community, I think it is fair to say in
|
||||
this instance the ZFS port is not a derivative work of Linux and thus not
|
||||
subject to the GPL. Furthermore, it shouldn't be a problem especially
|
||||
since even Linus Torvalds has expressed that modules such as yours are not
|
||||
derived works of Linux.
|
||||
|
||||
Let me know if you have any further questions at x27274. Thanks.
|
||||
|
||||
Regards,
|
||||
James
|
||||
|
||||
James S. Tak
|
||||
Assistant Laboratory Counsel for Intellectual Property
|
||||
Office of Laboratory Counsel
|
||||
Lawrence Livermore National Laboratory
|
||||
phone: (925) 422-7274
|
||||
fax: (925) 423-2231
|
||||
tak1@llnl.gov
|
|
@ -0,0 +1 @@
|
|||
EXTRA_DIST = LEGAL
|
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
|
@ -0,0 +1,4 @@
|
|||
int main(void)
|
||||
{
|
||||
return 0;
|
||||
}
|
|
@ -0,0 +1,4 @@
|
|||
int main(void)
|
||||
{
|
||||
return 0;
|
||||
}
|
|
@ -0,0 +1,4 @@
|
|||
int main(void)
|
||||
{
|
||||
return 0;
|
||||
}
|
|
@ -0,0 +1,4 @@
|
|||
int main(void)
|
||||
{
|
||||
return 0;
|
||||
}
|
|
@ -0,0 +1,17 @@
|
|||
#
|
||||
# Mostly patches for a userspace build. For now I'm leaving them all
|
||||
# out until we go through the code base and sort out the userspace
|
||||
# portion of the build system. We may find we do not want or need
|
||||
# many of these patches anymore. -Brian
|
||||
#
|
||||
zap-cursor-move-to-key.patch # Add a ZAP API to move a ZAP cursor to a
|
||||
given key.
|
||||
spa-force-readonly.patch # Add API to discard all writes
|
||||
no-debug-userspace.patch # Disable debug code in userspace
|
||||
no-events.patch # Define away spa_event_notify() in userspace
|
||||
pthreads.patch # Use POSIX threads in userspace.
|
||||
port-no-zmod.patch # Do not use zmod.h in userspace.
|
||||
port-pragma-init.patch # Use constructor attribute on non-Solaris
|
||||
platforms.
|
||||
lztest-lzdb.patch # Make lztest call lzdb from PATH.
|
||||
zpool-force.patch # Change -f to -F in zpool command
|
|
@ -0,0 +1,40 @@
|
|||
Make lztest call lzdb from PATH.
|
||||
|
||||
Index: zfs+chaos4/cmd/lztest/ztest.c
|
||||
===================================================================
|
||||
--- zfs+chaos4.orig/cmd/lztest/ztest.c
|
||||
+++ zfs+chaos4/cmd/lztest/ztest.c
|
||||
@@ -3043,30 +3043,17 @@ ztest_verify_blocks(char *pool)
|
||||
char zbuf[1024];
|
||||
char *bin;
|
||||
char *ztest;
|
||||
- char *isa;
|
||||
- int isalen;
|
||||
FILE *fp;
|
||||
|
||||
- (void) realpath(getexecname(), zdb);
|
||||
-
|
||||
- /* zdb lives in /usr/sbin, while ztest lives in /usr/bin */
|
||||
- bin = strstr(zdb, "/usr/bin/");
|
||||
- ztest = strstr(bin, "/ztest");
|
||||
- isa = bin + 8;
|
||||
- isalen = ztest - isa;
|
||||
- isa = strdup(isa);
|
||||
/* LINTED */
|
||||
- (void) sprintf(bin,
|
||||
- "/usr/sbin%.*s/zdb -bc%s%s -U /tmp/zpool.cache -O %s %s",
|
||||
- isalen,
|
||||
- isa,
|
||||
+ (void) sprintf(zdb,
|
||||
+ "lzdb -bc%s%s -U /tmp/zpool.cache -O %s %s",
|
||||
zopt_verbose >= 3 ? "s" : "",
|
||||
zopt_verbose >= 4 ? "v" : "",
|
||||
ztest_random(2) == 0 ? "pre" : "post", pool);
|
||||
- free(isa);
|
||||
|
||||
if (zopt_verbose >= 5)
|
||||
- (void) printf("Executing %s\n", strstr(zdb, "zdb "));
|
||||
+ (void) printf("Executing %s\n", strstr(zdb, "lzdb "));
|
||||
|
||||
fp = popen(zdb, "r");
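The patch above stops deriving an absolute path for the helper binary and lets the shell started by popen() find it through PATH. A rough sketch of that pattern follows, assuming a hypothetical helper named `run_from_path`; it uses snprintf() rather than the sprintf() in the patch so that an oversized command is rejected instead of overflowing the buffer.

```c
#define	_XOPEN_SOURCE 700	/* for popen()/pclose() under strict modes */
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "<tool> <arg>" and run it through the shell, relying on PATH
 * to locate <tool>. The first line of output is copied into `out`
 * (newline stripped). Returns the status from pclose(), or -1 if the
 * command could not be built or started.
 * (Hypothetical helper for illustration; not part of ztest.)
 */
int
run_from_path(const char *tool, const char *arg, char *out, size_t outlen)
{
	char cmd[1024];
	char scratch[256];
	FILE *fp;
	int n;

	assert(out != NULL && outlen > 0);

	n = snprintf(cmd, sizeof (cmd), "%s %s", tool, arg);
	if (n < 0 || (size_t)n >= sizeof (cmd))
		return (-1);

	fp = popen(cmd, "r");
	if (fp == NULL)
		return (-1);

	if (fgets(out, (int)outlen, fp) == NULL)
		out[0] = '\0';
	out[strcspn(out, "\n")] = '\0';

	/* Drain remaining output so the child is not cut off early. */
	while (fgets(scratch, sizeof (scratch), fp) != NULL)
		;

	return (pclose(fp));
}
```

A caller might use `run_from_path("lzdb", "-bc pool", buf, sizeof (buf))`; the arguments shown here are illustrative only. Because the command goes through a shell, the strings passed in should come from trusted input.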
|
||||
|
|
@ -0,0 +1,184 @@
|
|||
Disable debug code in userspace
|
||||
|
||||
Index: zfs+chaos4/lib/libzfscommon/include/sys/arc.h
|
||||
===================================================================
|
||||
--- zfs+chaos4.orig/lib/libzfscommon/include/sys/arc.h
|
||||
+++ zfs+chaos4/lib/libzfscommon/include/sys/arc.h
|
||||
@@ -82,7 +82,7 @@ int arc_released(arc_buf_t *buf);
|
||||
int arc_has_callback(arc_buf_t *buf);
|
||||
void arc_buf_freeze(arc_buf_t *buf);
|
||||
void arc_buf_thaw(arc_buf_t *buf);
|
||||
-#ifdef ZFS_DEBUG
|
||||
+#if defined(ZFS_DEBUG) || (!defined(_KERNEL) && !defined(NDEBUG))
|
||||
int arc_referenced(arc_buf_t *buf);
|
||||
#endif
|
||||
|
||||
Index: zfs+chaos4/lib/libzfscommon/include/sys/refcount.h
|
||||
===================================================================
|
||||
--- zfs+chaos4.orig/lib/libzfscommon/include/sys/refcount.h
|
||||
+++ zfs+chaos4/lib/libzfscommon/include/sys/refcount.h
|
||||
@@ -43,7 +43,7 @@ extern "C" {
|
||||
*/
|
||||
#define FTAG ((char *)__func__)
|
||||
|
||||
-#if defined(DEBUG) || !defined(_KERNEL)
|
||||
+#if defined(DEBUG)
|
||||
typedef struct reference {
|
||||
list_node_t ref_link;
|
||||
void *ref_holder;
|
||||
Index: zfs+chaos4/lib/libzfscommon/include/sys/zfs_context_user.h
|
||||
===================================================================
|
||||
--- zfs+chaos4.orig/lib/libzfscommon/include/sys/zfs_context_user.h
|
||||
+++ zfs+chaos4/lib/libzfscommon/include/sys/zfs_context_user.h
|
||||
@@ -96,6 +96,8 @@ extern "C" {
|
||||
|
||||
#ifdef ZFS_DEBUG
|
||||
extern void dprintf_setup(int *argc, char **argv);
|
||||
+#else
|
||||
+#define dprintf_setup(ac,av) ((void) 0)
|
||||
#endif /* ZFS_DEBUG */
|
||||
|
||||
extern void cmn_err(int, const char *, ...);
|
||||
@@ -105,21 +107,26 @@ extern void vpanic(const char *, __va_li
|
||||
|
||||
#define fm_panic panic
|
||||
|
||||
+#ifndef zp_verify
|
||||
/* This definition is copied from assert.h. */
|
||||
#if defined(__STDC__)
|
||||
#if __STDC_VERSION__ - 0 >= 199901L
|
||||
-#define verify(EX) (void)((EX) || \
|
||||
+#define zp_verify(EX) (void)((EX) || \
|
||||
(__assert_c99(#EX, __FILE__, __LINE__, __func__), 0))
|
||||
#else
|
||||
-#define verify(EX) (void)((EX) || (__assert(#EX, __FILE__, __LINE__), 0))
|
||||
+#define zp_verify(EX) (void)((EX) || (__assert(#EX, __FILE__, __LINE__), 0))
|
||||
#endif /* __STDC_VERSION__ - 0 >= 199901L */
|
||||
#else
|
||||
-#define verify(EX) (void)((EX) || (_assert("EX", __FILE__, __LINE__), 0))
|
||||
+#define zp_verify(EX) (void)((EX) || (_assert("EX", __FILE__, __LINE__), 0))
|
||||
#endif /* __STDC__ */
|
||||
+#endif
|
||||
|
||||
-
|
||||
-#define VERIFY verify
|
||||
+#ifndef VERIFY
|
||||
+#define VERIFY zp_verify
|
||||
+#endif
|
||||
+#ifndef ASSERT
|
||||
#define ASSERT assert
|
||||
+#endif
|
||||
|
||||
extern void __assert(const char *, const char *, int);
|
||||
|
||||
@@ -332,6 +339,7 @@ extern int taskq_member(taskq_t *, void
|
||||
typedef struct vnode {
|
||||
uint64_t v_size;
|
||||
int v_fd;
|
||||
+ mode_t v_mode;
|
||||
char *v_path;
|
||||
} vnode_t;
|
||||
|
||||
Index: zfs+chaos4/lib/libzfscommon/include/sys/zfs_debug.h
|
||||
===================================================================
|
||||
--- zfs+chaos4.orig/lib/libzfscommon/include/sys/zfs_debug.h
|
||||
+++ zfs+chaos4/lib/libzfscommon/include/sys/zfs_debug.h
|
||||
@@ -44,7 +44,7 @@ extern "C" {
|
||||
* ZFS debugging
|
||||
*/
|
||||
|
||||
-#if defined(DEBUG) || !defined(_KERNEL)
|
||||
+#if defined(DEBUG)
|
||||
#define ZFS_DEBUG
|
||||
#endif
|
||||
|
||||
Index: zfs+chaos4/lib/libzpool/arc.c
|
||||
===================================================================
|
||||
--- zfs+chaos4.orig/lib/libzpool/arc.c
|
||||
+++ zfs+chaos4/lib/libzpool/arc.c
|
||||
@@ -1802,7 +1802,7 @@ arc_reclaim_needed(void)
|
||||
return (1);
|
||||
#endif
|
||||
|
||||
-#else
|
||||
+#elif defined(ZFS_DEBUG)
|
||||
if (spa_get_random(100) == 0)
|
||||
return (1);
|
||||
#endif
|
||||
@@ -2881,7 +2881,7 @@ arc_has_callback(arc_buf_t *buf)
|
||||
return (buf->b_efunc != NULL);
|
||||
}
|
||||
|
||||
-#ifdef ZFS_DEBUG
|
||||
+#if defined(ZFS_DEBUG) || (!defined(_KERNEL) && !defined(NDEBUG))
|
||||
int
|
||||
arc_referenced(arc_buf_t *buf)
|
||||
{
|
||||
Index: zfs+chaos4/lib/libzpool/kernel.c
|
||||
===================================================================
|
||||
--- zfs+chaos4.orig/lib/libzpool/kernel.c
|
||||
+++ zfs+chaos4/lib/libzpool/kernel.c
|
||||
@@ -384,6 +384,7 @@ vn_open(char *path, int x1, int flags, i
|
||||
|
||||
vp->v_fd = fd;
|
||||
vp->v_size = st.st_size;
|
||||
+ vp->v_mode = st.st_mode;
|
||||
vp->v_path = spa_strdup(path);
|
||||
|
||||
return (0);
|
||||
@@ -422,10 +423,17 @@ vn_rdwr(int uio, vnode_t *vp, void *addr
|
||||
* To simulate partial disk writes, we split writes into two
|
||||
* system calls so that the process can be killed in between.
|
||||
*/
|
||||
- split = (len > 0 ? rand() % len : 0);
|
||||
- iolen = pwrite64(vp->v_fd, addr, split, offset);
|
||||
- iolen += pwrite64(vp->v_fd, (char *)addr + split,
|
||||
- len - split, offset + split);
|
||||
+#ifdef ZFS_DEBUG
|
||||
+ if (!S_ISBLK(vp->v_mode) && !S_ISCHR(vp->v_mode)) {
|
||||
+ split = (len > 0 ? rand() % len : 0);
|
||||
+ iolen = pwrite64(vp->v_fd, addr, split, offset);
|
||||
+ iolen += pwrite64(vp->v_fd, (char *)addr + split,
|
||||
+ len - split, offset + split);
|
||||
+ } else
|
||||
+ iolen = pwrite64(vp->v_fd, addr, len, offset);
|
||||
+#else
|
||||
+ iolen = pwrite64(vp->v_fd, addr, len, offset);
|
||||
+#endif
|
||||
}
|
||||
|
||||
if (iolen < 0)
|
||||
Index: zfs+chaos4/lib/libzpool/refcount.c
|
||||
===================================================================
|
||||
--- zfs+chaos4.orig/lib/libzpool/refcount.c
|
||||
+++ zfs+chaos4/lib/libzpool/refcount.c
|
||||
@@ -28,7 +28,7 @@
|
||||
#include <sys/zfs_context.h>
|
||||
#include <sys/refcount.h>
|
||||
|
||||
-#if defined(DEBUG) || !defined(_KERNEL)
|
||||
+#if defined(DEBUG)
|
||||
|
||||
#ifdef _KERNEL
|
||||
int reference_tracking_enable = FALSE; /* runs out of memory too easily */
|
||||
Index: zfs+chaos4/lib/libzpool/spa_misc.c
|
||||
===================================================================
|
||||
--- zfs+chaos4.orig/lib/libzpool/spa_misc.c
|
||||
+++ zfs+chaos4/lib/libzpool/spa_misc.c
|
||||
@@ -178,11 +178,15 @@ kmem_cache_t *spa_buffer_pool;
|
||||
int spa_mode;
|
||||
|
||||
#ifdef ZFS_DEBUG
|
||||
+#ifdef _KERNEL
|
||||
/* Everything except dprintf is on by default in debug builds */
|
||||
int zfs_flags = ~ZFS_DEBUG_DPRINTF;
|
||||
#else
|
||||
+int zfs_flags = ~0;
|
||||
+#endif /* _KERNEL */
|
||||
+#else
|
||||
int zfs_flags = 0;
|
||||
-#endif
|
||||
+#endif /* ZFS_DEBUG */
|
||||
|
||||
/*
|
||||
* zfs_recover can be set to nonzero to attempt to recover from
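The vn_rdwr() hunk in this patch restricts the simulated partial write to regular files and otherwise issues a single write. The idea can be sketched roughly as below; `split_write` is a hypothetical helper, it calls pwrite() instead of pwrite64(), and it checks each call for errors, which the patched code does not do.

```c
#define	_XOPEN_SOURCE 700	/* for pwrite() under strict modes */
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write `len` bytes at `offset`. For regular files the write is split
 * at a random point into two pwrite() calls, so a process killed in
 * between leaves a partial write behind. Block and character devices
 * get one write. Returns the number of bytes written, or -1 on error.
 * (Hypothetical helper for illustration; not the libzpool function.)
 */
ssize_t
split_write(int fd, const void *addr, size_t len, off_t offset)
{
	struct stat st;
	size_t split;
	ssize_t first, second;

	assert(addr != NULL || len == 0);

	if (fstat(fd, &st) != 0)
		return (-1);

	if (S_ISBLK(st.st_mode) || S_ISCHR(st.st_mode))
		return (pwrite(fd, addr, len, offset));

	split = (len > 0 ? (size_t)rand() % len : 0);

	first = pwrite(fd, addr, split, offset);
	if (first < 0)
		return (-1);

	second = pwrite(fd, (const char *)addr + split, len - split,
	    offset + (off_t)split);
	if (second < 0)
		return (-1);

	return (first + second);
}
```

Whatever split point is chosen, the file contents after both calls complete are the same as for a single write; the split only matters when the process dies between the two calls.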
|
|
@ -0,0 +1,44 @@
|
|||
Define away spa_event_notify() in userspace - not necessary and breaks compilation in older Solaris builds.
|
||||
|
||||
Index: zfs+chaos4/lib/libzfscommon/include/sys/spa.h
|
||||
===================================================================
|
||||
--- zfs+chaos4.orig/lib/libzfscommon/include/sys/spa.h
|
||||
+++ zfs+chaos4/lib/libzfscommon/include/sys/spa.h
|
||||
@@ -516,7 +516,11 @@ extern int spa_prop_get(spa_t *spa, nvli
|
||||
extern void spa_prop_clear_bootfs(spa_t *spa, uint64_t obj, dmu_tx_t *tx);
|
||||
|
||||
/* asynchronous event notification */
|
||||
+#ifdef _KERNEL
|
||||
extern void spa_event_notify(spa_t *spa, vdev_t *vdev, const char *name);
|
||||
+#else
|
||||
+#define spa_event_notify(s,v,n) ((void) 0)
|
||||
+#endif
|
||||
|
||||
#ifdef ZFS_DEBUG
|
||||
#define dprintf_bp(bp, fmt, ...) do { \
|
||||
Index: zfs+chaos4/lib/libzpool/spa.c
|
||||
===================================================================
|
||||
--- zfs+chaos4.orig/lib/libzpool/spa.c
|
||||
+++ zfs+chaos4/lib/libzpool/spa.c
|
||||
@@ -4449,10 +4449,10 @@ spa_has_spare(spa_t *spa, uint64_t guid)
|
||||
* in the userland libzpool, as we don't want consumers to misinterpret ztest
|
||||
* or zdb as real changes.
|
||||
*/
|
||||
+#ifdef _KERNEL
|
||||
void
|
||||
spa_event_notify(spa_t *spa, vdev_t *vd, const char *name)
|
||||
{
|
||||
-#ifdef _KERNEL
|
||||
sysevent_t *ev;
|
||||
sysevent_attr_list_t *attr = NULL;
|
||||
sysevent_value_t value;
|
||||
@@ -4497,8 +4497,8 @@ done:
|
||||
if (attr)
|
||||
sysevent_free_attr(attr);
|
||||
sysevent_free(ev);
|
||||
-#endif
|
||||
}
|
||||
+#endif
|
||||
|
||||
void
|
||||
spa_discard_io(spa_t *spa)
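The header change in this patch keeps call sites untouched by turning the function into a no-op macro outside the kernel. A small sketch of that technique follows; `FAKE_KERNEL`, `notify_event`, and `resilver_finished` are hypothetical names chosen for illustration, with `FAKE_KERNEL` standing in for `_KERNEL`.

```c
#include <assert.h>
#include <stddef.h>

static int events_sent;	/* incremented only by the real function */

/*
 * FAKE_KERNEL is left undefined here, so the "userspace" branch is
 * taken and notify_event() expands to an expression that does nothing.
 */
#ifdef FAKE_KERNEL
static void
notify_event(const char *pool, const char *name)
{
	(void) pool;
	(void) name;
	events_sent++;
}
#else
#define	notify_event(p, n)	((void) 0)
#endif

/* Callers are written once and compile in both configurations. */
int
resilver_finished(const char *pool)
{
	assert(pool != NULL);
	notify_event(pool, "resilver_finish");
	return (events_sent);
}
```

Because the macro discards its arguments, they are not evaluated in the userspace build, so call sites should not rely on side effects inside the argument list.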
|
|
@ -0,0 +1,112 @@
|
|||
Do not use zmod.h in userspace.
|
||||
|
||||
Index: zfs+chaos4/lib/libzpool/gzip.c
|
||||
===================================================================
|
||||
--- zfs+chaos4.orig/lib/libzpool/gzip.c
|
||||
+++ zfs+chaos4/lib/libzpool/gzip.c
|
||||
@@ -28,22 +28,35 @@
|
||||
|
||||
#include <sys/debug.h>
|
||||
#include <sys/types.h>
|
||||
-#include <sys/zmod.h>
|
||||
|
||||
#ifdef _KERNEL
|
||||
+
|
||||
#include <sys/systm.h>
|
||||
-#else
|
||||
+#include <sys/zmod.h>
|
||||
+
|
||||
+typedef size_t zlen_t;
|
||||
+#define compress_func z_compress_level
|
||||
+#define uncompress_func z_uncompress
|
||||
+
|
||||
+#else /* _KERNEL */
|
||||
+
|
||||
#include <strings.h>
|
||||
+#include <zlib.h>
|
||||
+
|
||||
+typedef uLongf zlen_t;
|
||||
+#define compress_func compress2
|
||||
+#define uncompress_func uncompress
|
||||
+
|
||||
#endif
|
||||
|
||||
size_t
|
||||
gzip_compress(void *s_start, void *d_start, size_t s_len, size_t d_len, int n)
|
||||
{
|
||||
- size_t dstlen = d_len;
|
||||
+ zlen_t dstlen = d_len;
|
||||
|
||||
ASSERT(d_len <= s_len);
|
||||
|
||||
- if (z_compress_level(d_start, &dstlen, s_start, s_len, n) != Z_OK) {
|
||||
+ if (compress_func(d_start, &dstlen, s_start, s_len, n) != Z_OK) {
|
||||
if (d_len != s_len)
|
||||
return (s_len);
|
||||
|
||||
@@ -51,18 +64,18 @@ gzip_compress(void *s_start, void *d_sta
|
||||
return (s_len);
|
||||
}
|
||||
|
||||
- return (dstlen);
|
||||
+ return ((size_t) dstlen);
|
||||
}
|
||||
|
||||
/*ARGSUSED*/
|
||||
int
|
||||
gzip_decompress(void *s_start, void *d_start, size_t s_len, size_t d_len, int n)
|
||||
{
|
||||
- size_t dstlen = d_len;
|
||||
+ zlen_t dstlen = d_len;
|
||||
|
||||
ASSERT(d_len >= s_len);
|
||||
|
||||
- if (z_uncompress(d_start, &dstlen, s_start, s_len) != Z_OK)
|
||||
+ if (uncompress_func(d_start, &dstlen, s_start, s_len) != Z_OK)
|
||||
return (-1);
|
||||
|
||||
return (0);
|
||||
Index: zfs+chaos4/lib/libzpool/kernel.c
|
||||
===================================================================
|
||||
--- zfs+chaos4.orig/lib/libzpool/kernel.c
|
||||
+++ zfs+chaos4/lib/libzpool/kernel.c
|
||||
@@ -36,7 +36,6 @@
|
||||
#include <sys/stat.h>
|
||||
#include <sys/processor.h>
|
||||
#include <sys/zfs_context.h>
|
||||
-#include <sys/zmod.h>
|
||||
#include <sys/utsname.h>
|
||||
#include <sys/time.h>
|
||||
|
||||
@@ -876,31 +875,6 @@ kernel_fini(void)
|
||||
urandom_fd = -1;
|
||||
}
|
||||
|
||||
-int
|
||||
-z_uncompress(void *dst, size_t *dstlen, const void *src, size_t srclen)
|
||||
-{
|
||||
- int ret;
|
||||
- uLongf len = *dstlen;
|
||||
-
|
||||
- if ((ret = uncompress(dst, &len, src, srclen)) == Z_OK)
|
||||
- *dstlen = (size_t)len;
|
||||
-
|
||||
- return (ret);
|
||||
-}
|
||||
-
|
||||
-int
|
||||
-z_compress_level(void *dst, size_t *dstlen, const void *src, size_t srclen,
|
||||
- int level)
|
||||
-{
|
||||
- int ret;
|
||||
- uLongf len = *dstlen;
|
||||
-
|
||||
- if ((ret = compress2(dst, &len, src, srclen, level)) == Z_OK)
|
||||
- *dstlen = (size_t)len;
|
||||
-
|
||||
- return (ret);
|
||||
-}
|
||||
-
|
||||
/*ARGSUSED*/
|
||||
size_t u8_textprep_str(char *i, size_t *il, char *o, size_t *ol, int nf,
|
||||
size_t vers, int *err)
|
|
@ -0,0 +1,52 @@
|
|||
Use constructor attribute on non-Solaris platforms.
|
||||
|
||||
Index: zfs+chaos4/lib/libuutil/uu_misc.c
|
||||
===================================================================
|
||||
--- zfs+chaos4.orig/lib/libuutil/uu_misc.c
|
||||
+++ zfs+chaos4/lib/libuutil/uu_misc.c
|
||||
@@ -251,7 +251,13 @@ uu_release_child(void)
|
||||
uu_release();
|
||||
}
|
||||
|
||||
+#ifdef __GNUC__
|
||||
+static void
|
||||
+uu_init(void) __attribute__((constructor));
|
||||
+#else
|
||||
#pragma init(uu_init)
|
||||
+#endif
|
||||
+
|
||||
static void
|
||||
uu_init(void)
|
||||
{
|
||||
Index: zfs+chaos4/lib/libzfs/libzfs_mount.c
|
||||
===================================================================
|
||||
--- zfs+chaos4.orig/lib/libzfs/libzfs_mount.c
|
||||
+++ zfs+chaos4/lib/libzfs/libzfs_mount.c
|
||||
@@ -128,7 +128,13 @@ zfs_share_proto_t share_all_proto[] = {
|
||||
PROTO_END
|
||||
};
|
||||
|
||||
+#ifdef __GNUC__
|
||||
+static void
|
||||
+zfs_iscsi_init(void) __attribute__((constructor));
|
||||
+#else
|
||||
#pragma init(zfs_iscsi_init)
|
||||
+#endif
|
||||
+
|
||||
static void
|
||||
zfs_iscsi_init(void)
|
||||
{
|
||||
@@ -548,8 +554,12 @@ static void (*_sa_update_sharetab_ts)(sa
|
||||
* values to be used later. This is triggered by the runtime loader.
|
||||
* Make sure the correct ISA version is loaded.
|
||||
*/
|
||||
-
|
||||
+#ifdef __GNUC__
|
||||
+static void
|
||||
+_zfs_init_libshare(void) __attribute__((constructor));
|
||||
+#else
|
||||
#pragma init(_zfs_init_libshare)
|
||||
+#endif
|
||||
static void
|
||||
_zfs_init_libshare(void)
|
||||
{
|
|
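The load-time initialization pattern used by the patch above can be sketched in isolation as follows. This is a minimal illustration, assuming a GCC-compatible compiler; `demo_init`, `demo_initialized` and `demo_is_initialized` are hypothetical names and do not appear in libuutil or libzfs. The forward declaration carries `__attribute__((constructor))`, so the function runs when the object is loaded, before `main()`; other compilers keep the Solaris `#pragma init` form.

```c
/* Hypothetical flag set by the constructor; not part of the ZFS sources. */
static int demo_initialized = 0;

#ifdef __GNUC__
/* GCC and Clang: ask the runtime loader to call demo_init() before main(). */
static void demo_init(void) __attribute__((constructor));
#else
/* Sun Studio style fallback, mirroring the original Solaris code. */
#pragma init(demo_init)
#endif

static void
demo_init(void)
{
	demo_initialized = 1;
}

/* Returns 1 once the constructor has run, 0 otherwise. */
int
demo_is_initialized(void)
{
	return (demo_initialized);
}
```

A caller never invokes `demo_init()` directly; by the time `main()` starts, `demo_is_initialized()` is expected to return 1 on toolchains that honor the attribute.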
@ -0,0 +1,924 @@
|
|||
Use POSIX threads in userspace.
|
||||
|
||||
Index: zfs+chaos4/cmd/lztest/ztest.c
|
||||
===================================================================
|
||||
--- zfs+chaos4.orig/cmd/lztest/ztest.c
|
||||
+++ zfs+chaos4/cmd/lztest/ztest.c
|
||||
@@ -141,7 +141,7 @@ typedef struct ztest_args {
|
||||
spa_t *za_spa;
|
||||
objset_t *za_os;
|
||||
zilog_t *za_zilog;
|
||||
- thread_t za_thread;
|
||||
+ pthread_t za_thread;
|
||||
uint64_t za_instance;
|
||||
uint64_t za_random;
|
||||
uint64_t za_diroff;
|
||||
@@ -224,17 +224,17 @@ ztest_info_t ztest_info[] = {
|
||||
* Stuff we need to share writably between parent and child.
|
||||
*/
|
||||
typedef struct ztest_shared {
|
||||
- mutex_t zs_vdev_lock;
|
||||
- rwlock_t zs_name_lock;
|
||||
- uint64_t zs_vdev_primaries;
|
||||
- uint64_t zs_enospc_count;
|
||||
- hrtime_t zs_start_time;
|
||||
- hrtime_t zs_stop_time;
|
||||
- uint64_t zs_alloc;
|
||||
- uint64_t zs_space;
|
||||
- ztest_info_t zs_info[ZTEST_FUNCS];
|
||||
- mutex_t zs_sync_lock[ZTEST_SYNC_LOCKS];
|
||||
- uint64_t zs_seq[ZTEST_SYNC_LOCKS];
|
||||
+ pthread_mutex_t zs_vdev_lock;
|
||||
+ pthread_rwlock_t zs_name_lock;
|
||||
+ uint64_t zs_vdev_primaries;
|
||||
+ uint64_t zs_enospc_count;
|
||||
+ hrtime_t zs_start_time;
|
||||
+ hrtime_t zs_stop_time;
|
||||
+ uint64_t zs_alloc;
|
||||
+ uint64_t zs_space;
|
||||
+ ztest_info_t zs_info[ZTEST_FUNCS];
|
||||
+ pthread_mutex_t zs_sync_lock[ZTEST_SYNC_LOCKS];
|
||||
+ uint64_t zs_seq[ZTEST_SYNC_LOCKS];
|
||||
} ztest_shared_t;
|
||||
|
||||
static char ztest_dev_template[] = "%s/%s.%llua";
|
||||
@@ -818,7 +818,7 @@ ztest_spa_create_destroy(ztest_args_t *z
|
||||
* Attempt to create an existing pool. It shouldn't matter
|
||||
* what's in the nvroot; we should fail with EEXIST.
|
||||
*/
|
||||
- (void) rw_rdlock(&ztest_shared->zs_name_lock);
|
||||
+ (void) pthread_rwlock_rdlock(&ztest_shared->zs_name_lock);
|
||||
nvroot = make_vdev_root(0, 0, 0, 0, 1);
|
||||
error = spa_create(za->za_pool, nvroot, NULL, NULL);
|
||||
nvlist_free(nvroot);
|
||||
@@ -834,7 +834,7 @@ ztest_spa_create_destroy(ztest_args_t *z
|
||||
fatal(0, "spa_destroy() = %d", error);
|
||||
|
||||
spa_close(spa, FTAG);
|
||||
- (void) rw_unlock(&ztest_shared->zs_name_lock);
|
||||
+ (void) pthread_rwlock_unlock(&ztest_shared->zs_name_lock);
|
||||
}
|
||||
|
||||
/*
|
||||
@@ -851,7 +851,7 @@ ztest_vdev_add_remove(ztest_args_t *za)
|
||||
if (zopt_verbose >= 6)
|
||||
(void) printf("adding vdev\n");
|
||||
|
||||
- (void) mutex_lock(&ztest_shared->zs_vdev_lock);
|
||||
+ (void) pthread_mutex_lock(&ztest_shared->zs_vdev_lock);
|
||||
|
||||
spa_config_enter(spa, RW_READER, FTAG);
|
||||
|
||||
@@ -869,7 +869,7 @@ ztest_vdev_add_remove(ztest_args_t *za)
|
||||
error = spa_vdev_add(spa, nvroot);
|
||||
nvlist_free(nvroot);
|
||||
|
||||
- (void) mutex_unlock(&ztest_shared->zs_vdev_lock);
|
||||
+ (void) pthread_mutex_unlock(&ztest_shared->zs_vdev_lock);
|
||||
|
||||
if (error == ENOSPC)
|
||||
ztest_record_enospc("spa_vdev_add");
|
||||
@@ -927,7 +927,7 @@ ztest_vdev_attach_detach(ztest_args_t *z
|
||||
int error, expected_error;
|
||||
int fd;
|
||||
|
||||
- (void) mutex_lock(&ztest_shared->zs_vdev_lock);
|
||||
+ (void) pthread_mutex_lock(&ztest_shared->zs_vdev_lock);
|
||||
|
||||
spa_config_enter(spa, RW_READER, FTAG);
|
||||
|
||||
@@ -1054,7 +1054,7 @@ ztest_vdev_attach_detach(ztest_args_t *z
|
||||
oldpath, newpath, replacing, error, expected_error);
|
||||
}
|
||||
|
||||
- (void) mutex_unlock(&ztest_shared->zs_vdev_lock);
|
||||
+ (void) pthread_mutex_unlock(&ztest_shared->zs_vdev_lock);
|
||||
}
|
||||
|
||||
/*
|
||||
@@ -1071,7 +1071,7 @@ ztest_vdev_LUN_growth(ztest_args_t *za)
|
||||
size_t fsize;
|
||||
int fd;
|
||||
|
||||
- (void) mutex_lock(&ztest_shared->zs_vdev_lock);
|
||||
+ (void) pthread_mutex_lock(&ztest_shared->zs_vdev_lock);
|
||||
|
||||
/*
|
||||
* Pick a random leaf vdev.
|
||||
@@ -1102,7 +1102,7 @@ ztest_vdev_LUN_growth(ztest_args_t *za)
|
||||
(void) close(fd);
|
||||
}
|
||||
|
||||
- (void) mutex_unlock(&ztest_shared->zs_vdev_lock);
|
||||
+ (void) pthread_mutex_unlock(&ztest_shared->zs_vdev_lock);
|
||||
}
|
||||
|
||||
/* ARGSUSED */
|
||||
@@ -1198,7 +1198,7 @@ ztest_dmu_objset_create_destroy(ztest_ar
|
||||
uint64_t objects;
|
||||
ztest_replay_t zr;
|
||||
|
||||
- (void) rw_rdlock(&ztest_shared->zs_name_lock);
|
||||
+ (void) pthread_rwlock_rdlock(&ztest_shared->zs_name_lock);
|
||||
(void) snprintf(name, 100, "%s/%s_temp_%llu", za->za_pool, za->za_pool,
|
||||
(u_longlong_t)za->za_instance);
|
||||
|
||||
@@ -1242,7 +1242,7 @@ ztest_dmu_objset_create_destroy(ztest_ar
|
||||
if (error) {
|
||||
if (error == ENOSPC) {
|
||||
ztest_record_enospc("dmu_objset_create");
|
||||
- (void) rw_unlock(&ztest_shared->zs_name_lock);
|
||||
+ (void) pthread_rwlock_unlock(&ztest_shared->zs_name_lock);
|
||||
return;
|
||||
}
|
||||
fatal(0, "dmu_objset_create(%s) = %d", name, error);
|
||||
@@ -1321,7 +1321,7 @@ ztest_dmu_objset_create_destroy(ztest_ar
|
||||
if (error)
|
||||
fatal(0, "dmu_objset_destroy(%s) = %d", name, error);
|
||||
|
||||
- (void) rw_unlock(&ztest_shared->zs_name_lock);
|
||||
+ (void) pthread_rwlock_unlock(&ztest_shared->zs_name_lock);
|
||||
}
|
||||
|
||||
/*
|
||||
@@ -1335,7 +1335,7 @@ ztest_dmu_snapshot_create_destroy(ztest_
|
||||
char snapname[100];
|
||||
char osname[MAXNAMELEN];
|
||||
|
||||
- (void) rw_rdlock(&ztest_shared->zs_name_lock);
|
||||
+ (void) pthread_rwlock_rdlock(&ztest_shared->zs_name_lock);
|
||||
dmu_objset_name(os, osname);
|
||||
(void) snprintf(snapname, 100, "%s@%llu", osname,
|
||||
(u_longlong_t)za->za_instance);
|
||||
@@ -1348,7 +1348,7 @@ ztest_dmu_snapshot_create_destroy(ztest_
|
||||
ztest_record_enospc("dmu_take_snapshot");
|
||||
else if (error != 0 && error != EEXIST)
|
||||
fatal(0, "dmu_take_snapshot() = %d", error);
|
||||
- (void) rw_unlock(&ztest_shared->zs_name_lock);
|
||||
+ (void) pthread_rwlock_unlock(&ztest_shared->zs_name_lock);
|
||||
}
|
||||
|
||||
#define ZTEST_TRAVERSE_BLOCKS 1000
|
||||
@@ -1992,7 +1992,7 @@ ztest_dmu_write_parallel(ztest_args_t *z
|
||||
int bs = ZTEST_DIROBJ_BLOCKSIZE;
|
||||
int do_free = 0;
|
||||
uint64_t off, txg_how;
|
||||
- mutex_t *lp;
|
||||
+ pthread_mutex_t *lp;
|
||||
char osname[MAXNAMELEN];
|
||||
char iobuf[SPA_MAXBLOCKSIZE];
|
||||
blkptr_t blk = { 0 };
|
||||
@@ -2041,7 +2041,7 @@ ztest_dmu_write_parallel(ztest_args_t *z
|
||||
}
|
||||
|
||||
lp = &ztest_shared->zs_sync_lock[b];
|
||||
- (void) mutex_lock(lp);
|
||||
+ (void) pthread_mutex_lock(lp);
|
||||
|
||||
wbt->bt_objset = dmu_objset_id(os);
|
||||
wbt->bt_object = ZTEST_DIROBJ;
|
||||
@@ -2087,7 +2087,7 @@ ztest_dmu_write_parallel(ztest_args_t *z
|
||||
dmu_write(os, ZTEST_DIROBJ, off, btsize, wbt, tx);
|
||||
}
|
||||
|
||||
- (void) mutex_unlock(lp);
|
||||
+ (void) pthread_mutex_unlock(lp);
|
||||
|
||||
if (ztest_random(1000) == 0)
|
||||
(void) poll(NULL, 0, 1); /* open dn_notxholds window */
|
||||
@@ -2106,7 +2106,7 @@ ztest_dmu_write_parallel(ztest_args_t *z
|
||||
/*
|
||||
* dmu_sync() the block we just wrote.
|
||||
*/
|
||||
- (void) mutex_lock(lp);
|
||||
+ (void) pthread_mutex_lock(lp);
|
||||
|
||||
blkoff = P2ALIGN_TYPED(off, bs, uint64_t);
|
||||
error = dmu_buf_hold(os, ZTEST_DIROBJ, blkoff, FTAG, &db);
|
||||
@@ -2114,7 +2114,7 @@ ztest_dmu_write_parallel(ztest_args_t *z
|
||||
if (error) {
|
||||
dprintf("dmu_buf_hold(%s, %d, %llx) = %d\n",
|
||||
osname, ZTEST_DIROBJ, blkoff, error);
|
||||
- (void) mutex_unlock(lp);
|
||||
+ (void) pthread_mutex_unlock(lp);
|
||||
return;
|
||||
}
|
||||
blkoff = off - blkoff;
|
||||
@@ -2122,7 +2122,7 @@ ztest_dmu_write_parallel(ztest_args_t *z
|
||||
dmu_buf_rele(db, FTAG);
|
||||
za->za_dbuf = NULL;
|
||||
|
||||
- (void) mutex_unlock(lp);
|
||||
+ (void) pthread_mutex_unlock(lp);
|
||||
|
||||
if (error) {
|
||||
dprintf("dmu_sync(%s, %d, %llx) = %d\n",
|
||||
@@ -2502,7 +2502,7 @@ ztest_dsl_prop_get_set(ztest_args_t *za)
|
||||
char osname[MAXNAMELEN];
|
||||
int error;
|
||||
|
||||
- (void) rw_rdlock(&ztest_shared->zs_name_lock);
|
||||
+ (void) pthread_rwlock_rdlock(&ztest_shared->zs_name_lock);
|
||||
|
||||
dmu_objset_name(os, osname);
|
||||
|
||||
@@ -2541,7 +2541,7 @@ ztest_dsl_prop_get_set(ztest_args_t *za)
|
||||
}
|
||||
}
|
||||
|
||||
- (void) rw_unlock(&ztest_shared->zs_name_lock);
|
||||
+ (void) pthread_rwlock_unlock(&ztest_shared->zs_name_lock);
|
||||
}
|
||||
|
||||
static void
|
||||
@@ -2693,7 +2693,7 @@ ztest_spa_rename(ztest_args_t *za)
|
||||
int error;
|
||||
spa_t *spa;
|
||||
|
||||
- (void) rw_wrlock(&ztest_shared->zs_name_lock);
|
||||
+ (void) pthread_rwlock_wrlock(&ztest_shared->zs_name_lock);
|
||||
|
||||
oldname = za->za_pool;
|
||||
newname = umem_alloc(strlen(oldname) + 5, UMEM_NOFAIL);
|
||||
@@ -2745,7 +2745,7 @@ ztest_spa_rename(ztest_args_t *za)
|
||||
|
||||
umem_free(newname, strlen(newname) + 1);
|
||||
|
||||
- (void) rw_unlock(&ztest_shared->zs_name_lock);
|
||||
+ (void) pthread_rwlock_unlock(&ztest_shared->zs_name_lock);
|
||||
}
|
||||
|
||||
|
||||
@@ -3090,13 +3090,13 @@ ztest_run(char *pool)
|
||||
ztest_args_t *za;
|
||||
spa_t *spa;
|
||||
char name[100];
|
||||
- thread_t tid;
|
||||
+ pthread_t tid;
|
||||
|
||||
- (void) _mutex_init(&zs->zs_vdev_lock, USYNC_THREAD, NULL);
|
||||
- (void) rwlock_init(&zs->zs_name_lock, USYNC_THREAD, NULL);
|
||||
+ (void) pthread_mutex_init(&zs->zs_vdev_lock, NULL);
|
||||
+ (void) pthread_rwlock_init(&zs->zs_name_lock, NULL);
|
||||
|
||||
for (t = 0; t < ZTEST_SYNC_LOCKS; t++)
|
||||
- (void) _mutex_init(&zs->zs_sync_lock[t], USYNC_THREAD, NULL);
|
||||
+ (void) pthread_mutex_init(&zs->zs_sync_lock[t], NULL);
|
||||
|
||||
/*
|
||||
* Destroy one disk before we even start.
|
||||
@@ -3153,7 +3153,7 @@ ztest_run(char *pool)
|
||||
* start the thread before setting the zio_io_fail_shift, which
|
||||
* will indicate our failure rate.
|
||||
*/
|
||||
- error = thr_create(0, 0, ztest_suspend_monitor, NULL, THR_BOUND, &tid);
|
||||
+ error = pthread_create(&tid, NULL, ztest_suspend_monitor, NULL);
|
||||
if (error) {
|
||||
fatal(0, "can't create suspend monitor thread: error %d",
|
||||
t, error);
|
||||
@@ -3217,7 +3217,7 @@ ztest_run(char *pool)
|
||||
if (t < zopt_datasets) {
|
||||
ztest_replay_t zr;
|
||||
int test_future = FALSE;
|
||||
- (void) rw_rdlock(&ztest_shared->zs_name_lock);
|
||||
+ (void) pthread_rwlock_rdlock(&ztest_shared->zs_name_lock);
|
||||
(void) snprintf(name, 100, "%s/%s_%d", pool, pool, d);
|
||||
error = dmu_objset_create(name, DMU_OST_OTHER, NULL, 0,
|
||||
ztest_create_cb, NULL);
|
||||
@@ -3225,7 +3225,7 @@ ztest_run(char *pool)
|
||||
test_future = TRUE;
|
||||
} else if (error == ENOSPC) {
|
||||
zs->zs_enospc_count++;
|
||||
- (void) rw_unlock(&ztest_shared->zs_name_lock);
|
||||
+ (void) pthread_rwlock_unlock(&ztest_shared->zs_name_lock);
|
||||
break;
|
||||
} else if (error != 0) {
|
||||
fatal(0, "dmu_objset_create(%s) = %d",
|
||||
@@ -3236,7 +3236,7 @@ ztest_run(char *pool)
|
||||
if (error)
|
||||
fatal(0, "dmu_objset_open('%s') = %d",
|
||||
name, error);
|
||||
- (void) rw_unlock(&ztest_shared->zs_name_lock);
|
||||
+ (void) pthread_rwlock_unlock(&ztest_shared->zs_name_lock);
|
||||
if (test_future)
|
||||
ztest_dmu_check_future_leak(&za[t]);
|
||||
zr.zr_os = za[d].za_os;
|
||||
@@ -3245,15 +3245,15 @@ ztest_run(char *pool)
|
||||
za[d].za_zilog = zil_open(za[d].za_os, NULL);
|
||||
}
|
||||
|
||||
- error = thr_create(0, 0, ztest_thread, &za[t], THR_BOUND,
|
||||
- &za[t].za_thread);
|
||||
+ error = pthread_create(&za[t].za_thread, NULL, ztest_thread,
|
||||
+ &za[t]);
|
||||
if (error)
|
||||
fatal(0, "can't create thread %d: error %d",
|
||||
t, error);
|
||||
}
|
||||
|
||||
while (--t >= 0) {
|
||||
- error = thr_join(za[t].za_thread, NULL, NULL);
|
||||
+ error = pthread_join(za[t].za_thread, NULL);
|
||||
if (error)
|
||||
fatal(0, "thr_join(%d) = %d", t, error);
|
||||
if (za[t].za_th)
|
||||
@@ -3276,14 +3276,14 @@ ztest_run(char *pool)
|
||||
* If we had out-of-space errors, destroy a random objset.
|
||||
*/
|
||||
if (zs->zs_enospc_count != 0) {
|
||||
- (void) rw_rdlock(&ztest_shared->zs_name_lock);
|
||||
+ (void) pthread_rwlock_rdlock(&ztest_shared->zs_name_lock);
|
||||
d = (int)ztest_random(zopt_datasets);
|
||||
(void) snprintf(name, 100, "%s/%s_%d", pool, pool, d);
|
||||
if (zopt_verbose >= 3)
|
||||
(void) printf("Destroying %s to free up space\n", name);
|
||||
(void) dmu_objset_find(name, ztest_destroy_cb, &za[d],
|
||||
DS_FIND_SNAPSHOTS | DS_FIND_CHILDREN);
|
||||
- (void) rw_unlock(&ztest_shared->zs_name_lock);
|
||||
+ (void) pthread_rwlock_unlock(&ztest_shared->zs_name_lock);
|
||||
}
|
||||
|
||||
txg_wait_synced(spa_get_dsl(spa), 0);
|
||||
@@ -3301,7 +3301,7 @@ ztest_run(char *pool)
|
||||
mutex_enter(&spa->spa_zio_lock);
|
||||
cv_broadcast(&spa->spa_zio_cv);
|
||||
mutex_exit(&spa->spa_zio_lock);
|
||||
- error = thr_join(tid, NULL, NULL);
|
||||
+ error = pthread_join(tid, NULL);
|
||||
if (error)
|
||||
fatal(0, "thr_join(%d) = %d", tid, error);
|
||||
|
||||
Index: zfs+chaos4/lib/libuutil/uu_misc.c
|
||||
===================================================================
|
||||
--- zfs+chaos4.orig/lib/libuutil/uu_misc.c
|
||||
+++ zfs+chaos4/lib/libuutil/uu_misc.c
|
||||
@@ -37,7 +37,6 @@
|
||||
#include <stdlib.h>
|
||||
#include <string.h>
|
||||
#include <sys/debug.h>
|
||||
-#include <thread.h>
|
||||
#include <unistd.h>
|
||||
|
||||
#if !defined(TEXT_DOMAIN)
|
||||
@@ -70,11 +69,12 @@ static va_list uu_panic_args;
|
||||
static pthread_t uu_panic_thread;
|
||||
|
||||
static uint32_t _uu_main_error;
|
||||
+static __thread int _uu_main_thread = 0;
|
||||
|
||||
void
|
||||
uu_set_error(uint_t code)
|
||||
{
|
||||
- if (thr_main() != 0) {
|
||||
+ if (_uu_main_thread) {
|
||||
_uu_main_error = code;
|
||||
return;
|
||||
}
|
||||
@@ -103,7 +103,7 @@ uu_set_error(uint_t code)
|
||||
uint32_t
|
||||
uu_error(void)
|
||||
{
|
||||
- if (thr_main() != 0)
|
||||
+ if (_uu_main_thread)
|
||||
return (_uu_main_error);
|
||||
|
||||
if (uu_error_key_setup < 0) /* can't happen? */
|
||||
@@ -255,5 +255,6 @@ uu_release_child(void)
|
||||
static void
|
||||
uu_init(void)
|
||||
{
|
||||
+ _uu_main_thread = 1;
|
||||
(void) pthread_atfork(uu_lockup, uu_release, uu_release_child);
|
||||
}
|
||||
Index: zfs+chaos4/lib/libzfscommon/include/sys/zfs_context_user.h
|
||||
===================================================================
|
||||
--- zfs+chaos4.orig/lib/libzfscommon/include/sys/zfs_context_user.h
|
||||
+++ zfs+chaos4/lib/libzfscommon/include/sys/zfs_context_user.h
|
||||
@@ -52,8 +52,7 @@ extern "C" {
|
||||
#include <errno.h>
|
||||
#include <string.h>
|
||||
#include <strings.h>
|
||||
-#include <synch.h>
|
||||
-#include <thread.h>
|
||||
+#include <pthread.h>
|
||||
#include <assert.h>
|
||||
#include <alloca.h>
|
||||
#include <umem.h>
|
||||
@@ -191,13 +190,15 @@ _NOTE(CONSTCOND) } while (0)
|
||||
/*
|
||||
* Threads
|
||||
*/
|
||||
-#define curthread ((void *)(uintptr_t)thr_self())
|
||||
+
|
||||
+/* XXX: not portable */
|
||||
+#define curthread ((void *)(uintptr_t)pthread_self())
|
||||
|
||||
typedef struct kthread kthread_t;
|
||||
|
||||
#define thread_create(stk, stksize, func, arg, len, pp, state, pri) \
|
||||
zk_thread_create(func, arg)
|
||||
-#define thread_exit() thr_exit(NULL)
|
||||
+#define thread_exit() pthread_exit(NULL)
|
||||
|
||||
extern kthread_t *zk_thread_create(void (*func)(), void *arg);
|
||||
|
||||
@@ -207,28 +208,18 @@ extern kthread_t *zk_thread_create(void
|
||||
/*
|
||||
* Mutexes
|
||||
*/
|
||||
+#define MTX_MAGIC 0x9522f51362a6e326ull
|
||||
typedef struct kmutex {
|
||||
void *m_owner;
|
||||
- boolean_t initialized;
|
||||
- mutex_t m_lock;
|
||||
+ uint64_t m_magic;
|
||||
+ pthread_mutex_t m_lock;
|
||||
} kmutex_t;
|
||||
|
||||
-#define MUTEX_DEFAULT USYNC_THREAD
|
||||
-#undef MUTEX_HELD
|
||||
-#define MUTEX_HELD(m) _mutex_held(&(m)->m_lock)
|
||||
-
|
||||
-/*
|
||||
- * Argh -- we have to get cheesy here because the kernel and userland
|
||||
- * have different signatures for the same routine.
|
||||
- */
|
||||
-extern int _mutex_init(mutex_t *mp, int type, void *arg);
|
||||
-extern int _mutex_destroy(mutex_t *mp);
|
||||
-
|
||||
-#define mutex_init(mp, b, c, d) zmutex_init((kmutex_t *)(mp))
|
||||
-#define mutex_destroy(mp) zmutex_destroy((kmutex_t *)(mp))
|
||||
+#define MUTEX_DEFAULT 0
|
||||
+#define MUTEX_HELD(m) ((m)->m_owner == curthread)
|
||||
|
||||
-extern void zmutex_init(kmutex_t *mp);
|
||||
-extern void zmutex_destroy(kmutex_t *mp);
|
||||
+extern void mutex_init(kmutex_t *mp, char *name, int type, void *cookie);
|
||||
+extern void mutex_destroy(kmutex_t *mp);
|
||||
extern void mutex_enter(kmutex_t *mp);
|
||||
extern void mutex_exit(kmutex_t *mp);
|
||||
extern int mutex_tryenter(kmutex_t *mp);
|
||||
@@ -237,23 +228,24 @@ extern void *mutex_owner(kmutex_t *mp);
|
||||
/*
|
||||
* RW locks
|
||||
*/
|
||||
+#define RW_MAGIC 0x4d31fb123648e78aull
|
||||
typedef struct krwlock {
|
||||
- void *rw_owner;
|
||||
- boolean_t initialized;
|
||||
- rwlock_t rw_lock;
|
||||
+ void *rw_owner;
|
||||
+ void *rw_wr_owner;
|
||||
+ uint64_t rw_magic;
|
||||
+ pthread_rwlock_t rw_lock;
|
||||
+ uint_t rw_readers;
|
||||
} krwlock_t;
|
||||
|
||||
typedef int krw_t;
|
||||
|
||||
#define RW_READER 0
|
||||
#define RW_WRITER 1
|
||||
-#define RW_DEFAULT USYNC_THREAD
|
||||
-
|
||||
-#undef RW_READ_HELD
|
||||
-#define RW_READ_HELD(x) _rw_read_held(&(x)->rw_lock)
|
||||
+#define RW_DEFAULT 0
|
||||
|
||||
-#undef RW_WRITE_HELD
|
||||
-#define RW_WRITE_HELD(x) _rw_write_held(&(x)->rw_lock)
|
||||
+#define RW_READ_HELD(x) ((x)->rw_readers > 0)
|
||||
+#define RW_WRITE_HELD(x) ((x)->rw_wr_owner == curthread)
|
||||
+#define RW_LOCK_HELD(x) (RW_READ_HELD(x) || RW_WRITE_HELD(x))
|
||||
|
||||
extern void rw_init(krwlock_t *rwlp, char *name, int type, void *arg);
|
||||
extern void rw_destroy(krwlock_t *rwlp);
|
||||
@@ -271,9 +263,13 @@ extern gid_t *crgetgroups(cred_t *cr);
|
||||
/*
|
||||
* Condition variables
|
||||
*/
|
||||
-typedef cond_t kcondvar_t;
|
||||
+#define CV_MAGIC 0xd31ea9a83b1b30c4ull
|
||||
+typedef struct kcondvar {
|
||||
+ uint64_t cv_magic;
|
||||
+ pthread_cond_t cv;
|
||||
+} kcondvar_t;
|
||||
|
||||
-#define CV_DEFAULT USYNC_THREAD
|
||||
+#define CV_DEFAULT 0
|
||||
|
||||
extern void cv_init(kcondvar_t *cv, char *name, int type, void *arg);
|
||||
extern void cv_destroy(kcondvar_t *cv);
|
||||
@@ -444,7 +440,8 @@ extern void delay(clock_t ticks);
|
||||
#define minclsyspri 60
|
||||
#define maxclsyspri 99
|
||||
|
||||
-#define CPU_SEQID (thr_self() & (max_ncpus - 1))
|
||||
+/* XXX: not portable */
|
||||
+#define CPU_SEQID (pthread_self() & (max_ncpus - 1))
|
||||
|
||||
#define kcred NULL
|
||||
#define CRED() NULL
|
||||
Index: zfs+chaos4/lib/libzpool/kernel.c
|
||||
===================================================================
|
||||
--- zfs+chaos4.orig/lib/libzpool/kernel.c
|
||||
+++ zfs+chaos4/lib/libzpool/kernel.c
|
||||
@@ -38,6 +38,7 @@
|
||||
#include <sys/zfs_context.h>
|
||||
#include <sys/zmod.h>
|
||||
#include <sys/utsname.h>
|
||||
+#include <sys/time.h>
|
||||
|
||||
/*
|
||||
* Emulation of kernel services in userland.
|
||||
@@ -60,11 +61,15 @@ struct utsname utsname = {
|
||||
kthread_t *
|
||||
zk_thread_create(void (*func)(), void *arg)
|
||||
{
|
||||
- thread_t tid;
|
||||
+ pthread_t tid;
|
||||
|
||||
- VERIFY(thr_create(0, 0, (void *(*)(void *))func, arg, THR_DETACHED,
|
||||
- &tid) == 0);
|
||||
+ pthread_attr_t attr;
|
||||
+ VERIFY(pthread_attr_init(&attr) == 0);
|
||||
+ VERIFY(pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED) == 0);
|
||||
|
||||
+ VERIFY(pthread_create(&tid, &attr, (void *(*)(void *))func, arg) == 0);
|
||||
+
|
||||
+ /* XXX: not portable */
|
||||
return ((void *)(uintptr_t)tid);
|
||||
}
|
||||
|
||||
@@ -97,30 +102,37 @@ kstat_delete(kstat_t *ksp)
|
||||
* =========================================================================
|
||||
*/
|
||||
void
|
||||
-zmutex_init(kmutex_t *mp)
|
||||
+mutex_init(kmutex_t *mp, char *name, int type, void *cookie)
|
||||
{
|
||||
+ ASSERT(type == MUTEX_DEFAULT);
|
||||
+ ASSERT(cookie == NULL);
|
||||
+
|
||||
+#ifdef IM_FEELING_LUCKY
|
||||
+ ASSERT(mp->m_magic != MTX_MAGIC);
|
||||
+#endif
|
||||
+
|
||||
mp->m_owner = NULL;
|
||||
- mp->initialized = B_TRUE;
|
||||
- (void) _mutex_init(&mp->m_lock, USYNC_THREAD, NULL);
|
||||
+ mp->m_magic = MTX_MAGIC;
|
||||
+ VERIFY3S(pthread_mutex_init(&mp->m_lock, NULL), ==, 0);
|
||||
}
|
||||
|
||||
void
|
||||
-zmutex_destroy(kmutex_t *mp)
|
||||
+mutex_destroy(kmutex_t *mp)
|
||||
{
|
||||
- ASSERT(mp->initialized == B_TRUE);
|
||||
+ ASSERT(mp->m_magic == MTX_MAGIC);
|
||||
ASSERT(mp->m_owner == NULL);
|
||||
- (void) _mutex_destroy(&(mp)->m_lock);
|
||||
+ VERIFY3S(pthread_mutex_destroy(&(mp)->m_lock), ==, 0);
|
||||
mp->m_owner = (void *)-1UL;
|
||||
- mp->initialized = B_FALSE;
|
||||
+ mp->m_magic = 0;
|
||||
}
|
||||
|
||||
void
|
||||
mutex_enter(kmutex_t *mp)
|
||||
{
|
||||
- ASSERT(mp->initialized == B_TRUE);
|
||||
+ ASSERT(mp->m_magic == MTX_MAGIC);
|
||||
ASSERT(mp->m_owner != (void *)-1UL);
|
||||
ASSERT(mp->m_owner != curthread);
|
||||
- VERIFY(mutex_lock(&mp->m_lock) == 0);
|
||||
+ VERIFY3S(pthread_mutex_lock(&mp->m_lock), ==, 0);
|
||||
ASSERT(mp->m_owner == NULL);
|
||||
mp->m_owner = curthread;
|
||||
}
|
||||
@@ -128,9 +140,9 @@ mutex_enter(kmutex_t *mp)
|
||||
int
|
||||
mutex_tryenter(kmutex_t *mp)
|
||||
{
|
||||
- ASSERT(mp->initialized == B_TRUE);
|
||||
+ ASSERT(mp->m_magic == MTX_MAGIC);
|
||||
ASSERT(mp->m_owner != (void *)-1UL);
|
||||
- if (0 == mutex_trylock(&mp->m_lock)) {
|
||||
+ if (0 == pthread_mutex_trylock(&mp->m_lock)) {
|
||||
ASSERT(mp->m_owner == NULL);
|
||||
mp->m_owner = curthread;
|
||||
return (1);
|
||||
@@ -142,16 +154,16 @@ mutex_tryenter(kmutex_t *mp)
|
||||
void
|
||||
mutex_exit(kmutex_t *mp)
|
||||
{
|
||||
- ASSERT(mp->initialized == B_TRUE);
|
||||
+ ASSERT(mp->m_magic == MTX_MAGIC);
|
||||
ASSERT(mutex_owner(mp) == curthread);
|
||||
mp->m_owner = NULL;
|
||||
- VERIFY(mutex_unlock(&mp->m_lock) == 0);
|
||||
+ VERIFY3S(pthread_mutex_unlock(&mp->m_lock), ==, 0);
|
||||
}
|
||||
|
||||
void *
|
||||
mutex_owner(kmutex_t *mp)
|
||||
{
|
||||
- ASSERT(mp->initialized == B_TRUE);
|
||||
+ ASSERT(mp->m_magic == MTX_MAGIC);
|
||||
return (mp->m_owner);
|
||||
}
|
||||
|
||||
@@ -164,31 +176,48 @@ mutex_owner(kmutex_t *mp)
|
||||
void
|
||||
rw_init(krwlock_t *rwlp, char *name, int type, void *arg)
|
||||
{
|
||||
- rwlock_init(&rwlp->rw_lock, USYNC_THREAD, NULL);
|
||||
+ ASSERT(type == RW_DEFAULT);
|
||||
+ ASSERT(arg == NULL);
|
||||
+
|
||||
+#ifdef IM_FEELING_LUCKY
|
||||
+ ASSERT(rwlp->rw_magic != RW_MAGIC);
|
||||
+#endif
|
||||
+
|
||||
+ VERIFY3S(pthread_rwlock_init(&rwlp->rw_lock, NULL), ==, 0);
|
||||
rwlp->rw_owner = NULL;
|
||||
- rwlp->initialized = B_TRUE;
|
||||
+ rwlp->rw_wr_owner = NULL;
|
||||
+ rwlp->rw_readers = 0;
|
||||
+ rwlp->rw_magic = RW_MAGIC;
|
||||
}
|
||||
|
||||
void
|
||||
rw_destroy(krwlock_t *rwlp)
|
||||
{
|
||||
- rwlock_destroy(&rwlp->rw_lock);
|
||||
- rwlp->rw_owner = (void *)-1UL;
|
||||
- rwlp->initialized = B_FALSE;
|
||||
+ ASSERT(rwlp->rw_magic == RW_MAGIC);
|
||||
+
|
||||
+ VERIFY3S(pthread_rwlock_destroy(&rwlp->rw_lock), ==, 0);
|
||||
+ rwlp->rw_magic = 0;
|
||||
}
|
||||
|
||||
void
|
||||
rw_enter(krwlock_t *rwlp, krw_t rw)
|
||||
{
|
||||
- ASSERT(!RW_LOCK_HELD(rwlp));
|
||||
- ASSERT(rwlp->initialized == B_TRUE);
|
||||
- ASSERT(rwlp->rw_owner != (void *)-1UL);
|
||||
+ ASSERT(rwlp->rw_magic == RW_MAGIC);
|
||||
ASSERT(rwlp->rw_owner != curthread);
|
||||
+ ASSERT(rwlp->rw_wr_owner != curthread);
|
||||
|
||||
- if (rw == RW_READER)
|
||||
- (void) rw_rdlock(&rwlp->rw_lock);
|
||||
- else
|
||||
- (void) rw_wrlock(&rwlp->rw_lock);
|
||||
+ if (rw == RW_READER) {
|
||||
+ VERIFY3S(pthread_rwlock_rdlock(&rwlp->rw_lock), ==, 0);
|
||||
+ ASSERT(rwlp->rw_wr_owner == NULL);
|
||||
+
|
||||
+ atomic_inc_uint(&rwlp->rw_readers);
|
||||
+ } else {
|
||||
+ VERIFY3S(pthread_rwlock_wrlock(&rwlp->rw_lock), ==, 0);
|
||||
+ ASSERT(rwlp->rw_wr_owner == NULL);
|
||||
+ ASSERT3U(rwlp->rw_readers, ==, 0);
|
||||
+
|
||||
+ rwlp->rw_wr_owner = curthread;
|
||||
+ }
|
||||
|
||||
rwlp->rw_owner = curthread;
|
||||
}
|
||||
@@ -196,11 +225,16 @@ rw_enter(krwlock_t *rwlp, krw_t rw)
|
||||
void
|
||||
rw_exit(krwlock_t *rwlp)
|
||||
{
|
||||
- ASSERT(rwlp->initialized == B_TRUE);
|
||||
- ASSERT(rwlp->rw_owner != (void *)-1UL);
|
||||
+ ASSERT(rwlp->rw_magic == RW_MAGIC);
|
||||
+ ASSERT(RW_LOCK_HELD(rwlp));
|
||||
+
|
||||
+ if (RW_READ_HELD(rwlp))
|
||||
+ atomic_dec_uint(&rwlp->rw_readers);
|
||||
+ else
|
||||
+ rwlp->rw_wr_owner = NULL;
|
||||
|
||||
rwlp->rw_owner = NULL;
|
||||
- (void) rw_unlock(&rwlp->rw_lock);
|
||||
+ VERIFY3S(pthread_rwlock_unlock(&rwlp->rw_lock), ==, 0);
|
||||
}
|
||||
|
||||
int
|
||||
@@ -208,19 +242,29 @@ rw_tryenter(krwlock_t *rwlp, krw_t rw)
|
||||
{
|
||||
int rv;
|
||||
|
||||
- ASSERT(rwlp->initialized == B_TRUE);
|
||||
- ASSERT(rwlp->rw_owner != (void *)-1UL);
|
||||
+ ASSERT(rwlp->rw_magic == RW_MAGIC);
|
||||
|
||||
if (rw == RW_READER)
|
||||
- rv = rw_tryrdlock(&rwlp->rw_lock);
|
||||
+ rv = pthread_rwlock_tryrdlock(&rwlp->rw_lock);
|
||||
else
|
||||
- rv = rw_trywrlock(&rwlp->rw_lock);
|
||||
+ rv = pthread_rwlock_trywrlock(&rwlp->rw_lock);
|
||||
|
||||
if (rv == 0) {
|
||||
+ ASSERT(rwlp->rw_wr_owner == NULL);
|
||||
+
|
||||
+ if (rw == RW_READER)
|
||||
+ atomic_inc_uint(&rwlp->rw_readers);
|
||||
+ else {
|
||||
+ ASSERT3U(rwlp->rw_readers, ==, 0);
|
||||
+ rwlp->rw_wr_owner = curthread;
|
||||
+ }
|
||||
+
|
||||
rwlp->rw_owner = curthread;
|
||||
return (1);
|
||||
}
|
||||
|
||||
+ VERIFY3S(rv, ==, EBUSY);
|
||||
+
|
||||
return (0);
|
||||
}
|
||||
|
||||
@@ -228,8 +272,7 @@ rw_tryenter(krwlock_t *rwlp, krw_t rw)
|
||||
int
|
||||
rw_tryupgrade(krwlock_t *rwlp)
|
||||
{
|
||||
- ASSERT(rwlp->initialized == B_TRUE);
|
||||
- ASSERT(rwlp->rw_owner != (void *)-1UL);
|
||||
+ ASSERT(rwlp->rw_magic == RW_MAGIC);
|
||||
|
||||
return (0);
|
||||
}
|
||||
@@ -243,22 +286,34 @@ rw_tryupgrade(krwlock_t *rwlp)
|
||||
void
|
||||
cv_init(kcondvar_t *cv, char *name, int type, void *arg)
|
||||
{
|
||||
- VERIFY(cond_init(cv, type, NULL) == 0);
|
||||
+ ASSERT(type == CV_DEFAULT);
|
||||
+
|
||||
+#ifdef IM_FEELING_LUCKY
|
||||
+ ASSERT(cv->cv_magic != CV_MAGIC);
|
||||
+#endif
|
||||
+
|
||||
+ cv->cv_magic = CV_MAGIC;
|
||||
+
|
||||
+ VERIFY3S(pthread_cond_init(&cv->cv, NULL), ==, 0);
|
||||
}
|
||||
|
||||
void
|
||||
cv_destroy(kcondvar_t *cv)
|
||||
{
|
||||
- VERIFY(cond_destroy(cv) == 0);
|
||||
+ ASSERT(cv->cv_magic == CV_MAGIC);
|
||||
+ VERIFY3S(pthread_cond_destroy(&cv->cv), ==, 0);
|
||||
+ cv->cv_magic = 0;
|
||||
}
|
||||
|
||||
void
|
||||
cv_wait(kcondvar_t *cv, kmutex_t *mp)
|
||||
{
|
||||
+ ASSERT(cv->cv_magic == CV_MAGIC);
|
||||
ASSERT(mutex_owner(mp) == curthread);
|
||||
mp->m_owner = NULL;
|
||||
- int ret = cond_wait(cv, &mp->m_lock);
|
||||
- VERIFY(ret == 0 || ret == EINTR);
|
||||
+ int ret = pthread_cond_wait(&cv->cv, &mp->m_lock);
|
||||
+ if (ret != 0)
|
||||
+ VERIFY3S(ret, ==, EINTR);
|
||||
mp->m_owner = curthread;
|
||||
}
|
||||
|
||||
@@ -266,29 +321,38 @@ clock_t
|
||||
cv_timedwait(kcondvar_t *cv, kmutex_t *mp, clock_t abstime)
|
||||
{
|
||||
int error;
|
||||
+ struct timeval tv;
|
||||
timestruc_t ts;
|
||||
clock_t delta;
|
||||
|
||||
+ ASSERT(cv->cv_magic == CV_MAGIC);
|
||||
+
|
||||
top:
|
||||
delta = abstime - lbolt;
|
||||
if (delta <= 0)
|
||||
return (-1);
|
||||
|
||||
- ts.tv_sec = delta / hz;
|
||||
- ts.tv_nsec = (delta % hz) * (NANOSEC / hz);
|
||||
+ VERIFY(gettimeofday(&tv, NULL) == 0);
|
||||
+
|
||||
+ ts.tv_sec = tv.tv_sec + delta / hz;
|
||||
+ ts.tv_nsec = tv.tv_usec * 1000 + (delta % hz) * (NANOSEC / hz);
|
||||
+ if (ts.tv_nsec >= NANOSEC) {
|
||||
+ ts.tv_sec++;
|
||||
+ ts.tv_nsec -= NANOSEC;
|
||||
+ }
|
||||
|
||||
ASSERT(mutex_owner(mp) == curthread);
|
||||
mp->m_owner = NULL;
|
||||
- error = cond_reltimedwait(cv, &mp->m_lock, &ts);
|
||||
+ error = pthread_cond_timedwait(&cv->cv, &mp->m_lock, &ts);
|
||||
mp->m_owner = curthread;
|
||||
|
||||
- if (error == ETIME)
|
||||
+ if (error == ETIMEDOUT)
|
||||
return (-1);
|
||||
|
||||
if (error == EINTR)
|
||||
goto top;
|
||||
|
||||
- ASSERT(error == 0);
|
||||
+ VERIFY3S(error, ==, 0);
|
||||
|
||||
return (1);
|
||||
}
|
||||
@@ -296,13 +360,15 @@ top:
|
||||
void
|
||||
cv_signal(kcondvar_t *cv)
|
||||
{
|
||||
- VERIFY(cond_signal(cv) == 0);
|
||||
+ ASSERT(cv->cv_magic == CV_MAGIC);
|
||||
+ VERIFY3S(pthread_cond_signal(&cv->cv), ==, 0);
|
||||
}
|
||||
|
||||
void
|
||||
cv_broadcast(kcondvar_t *cv)
|
||||
{
|
||||
- VERIFY(cond_broadcast(cv) == 0);
|
||||
+ ASSERT(cv->cv_magic == CV_MAGIC);
|
||||
+ VERIFY3S(pthread_cond_broadcast(&cv->cv), ==, 0);
|
||||
}
|
||||
|
||||
/*
|
||||
@@ -549,11 +615,11 @@ __dprintf(const char *file, const char *
|
||||
dprintf_find_string(func)) {
|
||||
/* Print out just the function name if requested */
|
||||
flockfile(stdout);
|
||||
- /* XXX: the following printf may not be portable */
|
||||
+ /* XXX: the following 2 printfs may not be portable */
|
||||
if (dprintf_find_string("pid"))
|
||||
(void) printf("%llu ", (u_longlong_t) getpid());
|
||||
if (dprintf_find_string("tid"))
|
||||
- (void) printf("%u ", (uint_t) thr_self());
|
||||
+ (void) printf("%u ", (uint_t) pthread_self());
|
||||
if (dprintf_find_string("cpu"))
|
||||
(void) printf("%u ", getcpuid());
|
||||
if (dprintf_find_string("time"))
|
||||
Index: zfs+chaos4/lib/libzpool/taskq.c
|
||||
===================================================================
|
||||
--- zfs+chaos4.orig/lib/libzpool/taskq.c
|
||||
+++ zfs+chaos4/lib/libzpool/taskq.c
|
||||
@@ -43,7 +43,7 @@ struct taskq {
|
||||
krwlock_t tq_threadlock;
|
||||
kcondvar_t tq_dispatch_cv;
|
||||
kcondvar_t tq_wait_cv;
|
||||
- thread_t *tq_threadlist;
|
||||
+ pthread_t *tq_threadlist;
|
||||
int tq_flags;
|
||||
int tq_active;
|
||||
int tq_nthreads;
|
||||
@@ -186,7 +186,7 @@ taskq_create(const char *name, int nthre
|
||||
tq->tq_maxalloc = maxalloc;
|
||||
tq->tq_task.task_next = &tq->tq_task;
|
||||
tq->tq_task.task_prev = &tq->tq_task;
|
||||
- tq->tq_threadlist = kmem_alloc(nthreads * sizeof (thread_t), KM_SLEEP);
|
||||
+ tq->tq_threadlist = kmem_alloc(nthreads * sizeof (pthread_t), KM_SLEEP);
|
||||
|
||||
if (flags & TASKQ_PREPOPULATE) {
|
||||
mutex_enter(&tq->tq_lock);
|
||||
@@ -196,8 +196,8 @@ taskq_create(const char *name, int nthre
|
||||
}
|
||||
|
||||
for (t = 0; t < nthreads; t++)
|
||||
- VERIFY(thr_create(0, 0, taskq_thread,
|
||||
- tq, THR_BOUND, &tq->tq_threadlist[t]) == 0);
|
||||
+ VERIFY(pthread_create(&tq->tq_threadlist[t],
|
||||
+ NULL, taskq_thread, tq) == 0);
|
||||
|
||||
return (tq);
|
||||
}
|
||||
@@ -227,9 +227,9 @@ taskq_destroy(taskq_t *tq)
|
||||
mutex_exit(&tq->tq_lock);
|
||||
|
||||
for (t = 0; t < nthreads; t++)
|
||||
- VERIFY(thr_join(tq->tq_threadlist[t], NULL, NULL) == 0);
|
||||
+ VERIFY(pthread_join(tq->tq_threadlist[t], NULL) == 0);
|
||||
|
||||
- kmem_free(tq->tq_threadlist, nthreads * sizeof (thread_t));
|
||||
+ kmem_free(tq->tq_threadlist, nthreads * sizeof (pthread_t));
|
||||
|
||||
rw_destroy(&tq->tq_threadlock);
|
||||
mutex_destroy(&tq->tq_lock);
|
||||
@@ -248,7 +248,7 @@ taskq_member(taskq_t *tq, void *t)
|
||||
return (1);
|
||||
|
||||
for (i = 0; i < tq->tq_nthreads; i++)
|
||||
- if (tq->tq_threadlist[i] == (thread_t)(uintptr_t)t)
|
||||
+ if (tq->tq_threadlist[i] == (pthread_t)(uintptr_t)t)
|
||||
return (1);
|
||||
|
||||
return (0);
|
|
@ -0,0 +1,115 @@
|
|||
Add a ZAP API to move a ZAP cursor to a given key.
|
||||
|
||||
Index: zfs+chaos4/lib/libzfscommon/include/sys/zap.h
|
||||
===================================================================
|
||||
--- zfs+chaos4.orig/lib/libzfscommon/include/sys/zap.h
|
||||
+++ zfs+chaos4/lib/libzfscommon/include/sys/zap.h
|
||||
@@ -302,6 +302,11 @@ void zap_cursor_advance(zap_cursor_t *zc
|
||||
uint64_t zap_cursor_serialize(zap_cursor_t *zc);
|
||||
|
||||
/*
|
||||
+ * Move the cursor to the attribute with the given key.
|
||||
+ */
|
||||
+int zap_cursor_move_to_key(zap_cursor_t *zc, const char *name, matchtype_t mt);
|
||||
+
|
||||
+/*
|
||||
* Initialize a zap cursor pointing to the position recorded by
|
||||
* zap_cursor_serialize (in the "serialized" argument). You can also
|
||||
* use a "serialized" argument of 0 to start at the beginning of the
|
||||
Index: zfs+chaos4/lib/libzfscommon/include/sys/zap_impl.h
|
||||
===================================================================
|
||||
--- zfs+chaos4.orig/lib/libzfscommon/include/sys/zap_impl.h
|
||||
+++ zfs+chaos4/lib/libzfscommon/include/sys/zap_impl.h
|
||||
@@ -210,6 +210,7 @@ int fzap_add_cd(zap_name_t *zn,
|
||||
uint64_t integer_size, uint64_t num_integers,
|
||||
const void *val, uint32_t cd, dmu_tx_t *tx);
|
||||
void fzap_upgrade(zap_t *zap, dmu_tx_t *tx);
|
||||
+int fzap_cursor_move_to_key(zap_cursor_t *zc, zap_name_t *zn);
|
||||
|
||||
#ifdef __cplusplus
|
||||
}
|
||||
Index: zfs+chaos4/lib/libzpool/zap.c
|
||||
===================================================================
|
||||
--- zfs+chaos4.orig/lib/libzpool/zap.c
|
||||
+++ zfs+chaos4/lib/libzpool/zap.c
|
||||
@@ -1029,6 +1029,30 @@ zap_stats_ptrtbl(zap_t *zap, uint64_t *t
|
||||
}
|
||||
}
|
||||
|
||||
+int fzap_cursor_move_to_key(zap_cursor_t *zc, zap_name_t *zn)
|
||||
+{
|
||||
+ int err;
|
||||
+ zap_leaf_t *l;
|
||||
+ zap_entry_handle_t zeh;
|
||||
+ uint64_t hash;
|
||||
+
|
||||
+ if (zn->zn_name_orij && strlen(zn->zn_name_orij) > ZAP_MAXNAMELEN)
|
||||
+ return (E2BIG);
|
||||
+
|
||||
+ err = zap_deref_leaf(zc->zc_zap, zn->zn_hash, NULL, RW_READER, &l);
|
||||
+ if (err != 0)
|
||||
+ return (err);
|
||||
+
|
||||
+ err = zap_leaf_lookup(l, zn, &zeh);
|
||||
+ if (err != 0)
|
||||
+ return (err);
|
||||
+
|
||||
+ zc->zc_leaf = l;
|
||||
+ zc->zc_hash = zeh.zeh_hash;
|
||||
+ zc->zc_cd = zeh.zeh_cd;
|
||||
+ return (0);
|
||||
+}
|
||||
+
|
||||
void
|
||||
fzap_get_stats(zap_t *zap, zap_stats_t *zs)
|
||||
{
|
||||
Index: zfs+chaos4/lib/libzpool/zap_micro.c
|
||||
===================================================================
|
||||
--- zfs+chaos4.orig/lib/libzpool/zap_micro.c
|
||||
+++ zfs+chaos4/lib/libzpool/zap_micro.c
|
||||
@@ -1045,6 +1045,45 @@ zap_cursor_advance(zap_cursor_t *zc)
|
||||
}
|
||||
}
|
||||
|
||||
+int zap_cursor_move_to_key(zap_cursor_t *zc, const char *name, matchtype_t mt)
|
||||
+{
|
||||
+ int err = 0;
|
||||
+ mzap_ent_t *mze;
|
||||
+ zap_name_t *zn;
|
||||
+
|
||||
+ if (zc->zc_zap == NULL) {
|
||||
+ err = zap_lockdir(zc->zc_objset, zc->zc_zapobj, NULL,
|
||||
+ RW_READER, TRUE, FALSE, &zc->zc_zap);
|
||||
+ if (err)
|
||||
+ return (err);
|
||||
+ } else {
|
||||
+ rw_enter(&zc->zc_zap->zap_rwlock, RW_READER);
|
||||
+ }
|
||||
+
|
||||
+ zn = zap_name_alloc(zc->zc_zap, name, mt);
|
||||
+ if (zn == NULL) {
|
||||
+ rw_exit(&zc->zc_zap->zap_rwlock);
|
||||
+ return (ENOTSUP);
|
||||
+ }
|
||||
+
|
||||
+ if (!zc->zc_zap->zap_ismicro) {
|
||||
+ err = fzap_cursor_move_to_key(zc, zn);
|
||||
+ } else {
|
||||
+ mze = mze_find(zn);
|
||||
+ if (mze == NULL) {
|
||||
+ err = (ENOENT);
|
||||
+ goto out;
|
||||
+ }
|
||||
+ zc->zc_hash = mze->mze_hash;
|
||||
+ zc->zc_cd = mze->mze_phys.mze_cd;
|
||||
+ }
|
||||
+
|
||||
+out:
|
||||
+ zap_name_free(zn);
|
||||
+ rw_exit(&zc->zc_zap->zap_rwlock);
|
||||
+ return (err);
|
||||
+}
|
||||
+
|
||||
int
|
||||
zap_get_stats(objset_t *os, uint64_t zapobj, zap_stats_t *zs)
|
||||
{
|
|
@ -0,0 +1,8 @@
|
|||
EXTRA_DIST = check.sh create-zpool.sh load-zfs.sh unload-zfs.sh
|
||||
EXTRA_DIST += profile-kpios-disk.sh profile-kpios-pids.sh
|
||||
EXTRA_DIST += profile-kpios-post.sh profile-kpios-pre.sh profile-kpios.sh
|
||||
EXTRA_DIST += survey.sh update-zfs.sh zpios-jbod.sh zpios.sh
|
||||
|
||||
check:
|
||||
./check.sh
|
||||
|
|
@ -0,0 +1,17 @@
|
|||
#!/bin/bash
|
||||
|
||||
prog=check.sh
|
||||
|
||||
die() {
|
||||
echo "${prog}: $1" >&2
|
||||
exit 1
|
||||
}
|
||||
|
||||
if [ $(id -u) != 0 ]; then
|
||||
die "Must run as root"
|
||||
fi
|
||||
|
||||
./load-zfs.sh || die ""
|
||||
./unload-zfs.sh || die ""
|
||||
|
||||
exit 0
|
|
@ -0,0 +1,42 @@
|
|||
#!/bin/bash
|
||||
|
||||
prog=create-zpool.sh
|
||||
. ../.script-config
|
||||
|
||||
# Single disk ilc dev nodes
|
||||
DEVICES="/dev/sda"
|
||||
|
||||
# All disks in a Thumper config
|
||||
#DEVICES="/dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf \
|
||||
# /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl \
|
||||
# /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr \
|
||||
# /dev/sds /dev/sdt /dev/sdu /dev/sdv /dev/sdw /dev/sdx \
|
||||
# /dev/sdy /dev/sdz /dev/sdaa /dev/sdab /dev/sdac /dev/sdad \
|
||||
# /dev/sdae /dev/sdaf /dev/sdag /dev/sdah /dev/sdai /dev/sdaj \
|
||||
# /dev/sdak /dev/sdal /dev/sdam /dev/sdan /dev/sdao /dev/sdap \
|
||||
# /dev/sdaq /dev/sdar /dev/sdas /dev/sdat /dev/sdau /dev/sdav"
|
||||
|
||||
# Sun style disk in Thumper config
|
||||
#DEVICES="/dev/sda /dev/sdb /dev/sdc \
|
||||
# /dev/sdi /dev/sdj /dev/sdk \
|
||||
# /dev/sdr /dev/sds /dev/sdt \
|
||||
# /dev/sdz /dev/sdaa /dev/sdab"
|
||||
|
||||
# Promise JBOD config (ilc23)
|
||||
#DEVICES="/dev/sdb /dev/sdc /dev/sdd \
|
||||
# /dev/sde /dev/sdf /dev/sdg \
|
||||
# /dev/sdh /dev/sdi /dev/sdj \
|
||||
# /dev/sdk /dev/sdl /dev/sdm"
|
||||
|
||||
echo
|
||||
echo "zpool create lustre <devices>"
|
||||
${CMDDIR}/zpool/zpool create -F lustre ${DEVICES}
|
||||
|
||||
echo
|
||||
echo "zpool list"
|
||||
${CMDDIR}/zpool/zpool list
|
||||
|
||||
echo
|
||||
echo "zpool status lustre"
|
||||
${CMDDIR}/zpool/zpool status lustre
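The commented-out Thumper list above names 48 devices, /dev/sda through /dev/sdav. As a minimal sketch (not part of the original script), the same list could be generated instead of typed out. The helper name `thumper_devices` is hypothetical.

```sh
# Hypothetical helper: print /dev/sda .. /dev/sdz followed by
# /dev/sdaa .. /dev/sdav, the 48 names listed in the Thumper config.
thumper_devices() {
    letters="a b c d e f g h i j k l m n o p q r s t u v w x y z"
    list=""
    # Single-letter suffixes: sda .. sdz
    for c in $letters; do
        list="$list /dev/sd$c"
    done
    # Two-letter suffixes: sdaa .. sdav (stop after 'v')
    for c in $letters; do
        list="$list /dev/sda$c"
        [ "$c" = "v" ] && break
    done
    # Strip the leading space before printing
    echo "${list# }"
}

DEVICES="$(thumper_devices)"
echo "${DEVICES}"
```

The result is a single space-separated string, so it can be passed to `zpool create` unquoted in the same way as the hand-written DEVICES value.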
|
||||
|
|
@ -0,0 +1,58 @@
|
|||
#!/bin/bash
|
||||
|
||||
prog=load-zfs.sh
|
||||
. ../.script-config
|
||||
|
||||
spl_options=$1
|
||||
zpool_options=$2
|
||||
|
||||
spl_module=${SPLBUILD}/modules/spl/spl.ko
|
||||
zlib_module=/lib/modules/${KERNELSRCVER}/kernel/lib/zlib_deflate/zlib_deflate.ko
|
||||
zavl_module=${ZFSBUILD}/lib/libavl/zavl.ko
|
||||
znvpair_module=${ZFSBUILD}/lib/libnvpair/znvpair.ko
|
||||
zport_module=${ZFSBUILD}/lib/libport/zport.ko
|
||||
zcommon_module=${ZFSBUILD}/lib/libzcommon/zcommon.ko
|
||||
zpool_module=${ZFSBUILD}/lib/libzpool/zpool.ko
|
||||
zctl_module=${ZFSBUILD}/lib/libdmu-ctl/zctl.ko
|
||||
zpios_module=${ZFSBUILD}/lib/libzpios/zpios.ko
|
||||
|
||||
die() {
|
||||
echo "${prog}: $1" >&2
|
||||
exit 1
|
||||
}
|
||||
|
||||
load_module() {
|
||||
echo "Loading $1"
|
||||
/sbin/insmod $* || die "Failed to load $1"
|
||||
}
|
||||
|
||||
if [ $(id -u) != 0 ]; then
|
||||
die "Must run as root"
|
||||
fi
|
||||
|
||||
if /sbin/lsmod | egrep -q "^spl|^zavl|^znvpair|^zport|^zcommon|^zlib_deflate|^zpool"; then
|
||||
die "Must start with modules unloaded"
|
||||
fi
|
||||
|
||||
if [ ! -f ${zavl_module} ] ||
|
||||
[ ! -f ${znvpair_module} ] ||
|
||||
[ ! -f ${zport_module} ] ||
|
||||
[ ! -f ${zcommon_module} ] ||
|
||||
[ ! -f ${zpool_module} ]; then
|
||||
die "Source tree must be built, run 'make'"
|
||||
fi
|
||||
|
||||
load_module ${spl_module} ${spl_options}
|
||||
load_module ${zlib_module}
|
||||
load_module ${zavl_module}
|
||||
load_module ${znvpair_module}
|
||||
load_module ${zport_module}
|
||||
load_module ${zcommon_module}
|
||||
load_module ${zpool_module} ${zpool_options}
|
||||
load_module ${zctl_module}
|
||||
load_module ${zpios_module}
|
||||
|
||||
sleep 1
|
||||
echo "Successfully loaded ZFS module stack"
|
||||
|
||||
exit 0
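The lsmod check in this script matches by prefix, so an unrelated module whose name merely begins with `spl` would also trip it. Below is a hedged sketch of an exact-name check. `modules_loaded` is a hypothetical helper that reads lsmod-style text on stdin, so it can be tried without root, and the module name `splice_helper` in the sample is made up for illustration.

```sh
# Hypothetical helper: read lsmod-style output on stdin and succeed (exit 0)
# if any module named in "$@" appears in the first column.
modules_loaded() {
    awk -v names="$*" '
        BEGIN { n = split(names, want, " "); for (i = 1; i <= n; i++) set[want[i]] = 1 }
        NR > 1 && ($1 in set) { found = 1 }   # NR > 1 skips the lsmod header line
        END { exit (found ? 0 : 1) }
    '
}

# Example with canned input instead of a live /sbin/lsmod call.
sample="Module                  Size  Used by
splice_helper          12345  0
zavl                    6789  1 zpool"

if printf '%s\n' "$sample" | modules_loaded spl zavl znvpair zport zcommon zpool; then
    echo "Must start with modules unloaded"
fi
```

In the sample, `zavl` triggers the message while `splice_helper` does not, which is the case the prefix pattern would get wrong.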
|
|
@ -0,0 +1,128 @@
|
|||
#!/bin/bash
|
||||
# profile-kpios-disk.sh
|
||||
#
|
||||
# /proc/diskstats <after skipping major/minor>
|
||||
# Field 1 -- device name
|
||||
# Field 2 -- # of reads issued
|
||||
# Field 3 -- # of reads merged
|
||||
# Field 4 -- # of sectors read
|
||||
# Field 5 -- # of milliseconds spent reading
|
||||
# Field 6 -- # of writes completed
|
||||
# Field 7 -- # of writes merged
|
||||
# Field 8 -- # of sectors written
|
||||
# Field 9 -- # of milliseconds spent writing
|
||||
# Field 10 -- # of I/Os currently in progress
|
||||
# Field 11 -- # of milliseconds spent doing I/Os
|
||||
# Field 12 -- weighted # of milliseconds spent doing I/Os
|
||||
|
||||
RUN_PIDS=${0}
|
||||
RUN_LOG_DIR=${1}
|
||||
RUN_ID=${2}
|
||||
|
||||
create_table() {
|
||||
local FIELD=$1
|
||||
local ROW_M=()
|
||||
local ROW_N=()
|
||||
local HEADER=1
|
||||
local STEP=1
|
||||
|
||||
for DISK_FILE in `ls -r --sort=time --time=ctime ${RUN_LOG_DIR}/${RUN_ID}/disk-[0-9]*`; do
|
||||
ROW_M=( ${ROW_N[@]} )
|
||||
ROW_N=( `cat ${DISK_FILE} | grep sd | cut -c11- | cut -f${FIELD} -d' ' | tr "\n" "\t"` )
|
||||
|
||||
if [ $HEADER -eq 1 ]; then
|
||||
echo -n "step, "
|
||||
cat ${DISK_FILE} | grep sd | cut -c11- | cut -f1 -d' ' | tr "\n" ", "
|
||||
echo "total"
|
||||
HEADER=0
|
||||
fi
|
||||
|
||||
if [ ${#ROW_M[@]} -eq 0 ]; then
|
||||
continue
|
||||
fi
|
||||
|
||||
if [ ${#ROW_M[@]} -ne ${#ROW_N[@]} ]; then
|
||||
echo "Badly formatted profile data in ${DISK_FILE}"
|
||||
break
|
||||
fi
|
||||
|
||||
TOTAL=0
|
||||
echo -n "${STEP}, "
|
||||
for (( i=0; i<${#ROW_N[@]}; i++ )); do
|
||||
DELTA=`echo "${ROW_N[${i}]}-${ROW_M[${i}]}" | bc`
|
||||
let TOTAL=${TOTAL}+${DELTA}
|
||||
echo -n "${DELTA}, "
|
||||
done
|
||||
echo "${TOTAL}, "
|
||||
|
||||
let STEP=${STEP}+1
|
||||
done
|
||||
}
|
||||
|
||||
create_table_mbs() {
|
||||
local FIELD=$1
|
||||
local TIME=$2
|
||||
local ROW_M=()
|
||||
local ROW_N=()
|
||||
local HEADER=1
|
||||
local STEP=1
|
||||
|
||||
for DISK_FILE in `ls -r --sort=time --time=ctime ${RUN_LOG_DIR}/${RUN_ID}/disk-[0-9]*`; do
|
||||
ROW_M=( ${ROW_N[@]} )
|
||||
ROW_N=( `cat ${DISK_FILE} | grep sd | cut -c11- | cut -f${FIELD} -d' ' | tr "\n" "\t"` )
|
||||
|
||||
if [ $HEADER -eq 1 ]; then
|
||||
echo -n "step, "
|
||||
cat ${DISK_FILE} | grep sd | cut -c11- | cut -f1 -d' ' | tr "\n" ", "
|
||||
echo "total"
|
||||
HEADER=0
|
||||
fi
|
||||
|
||||
if [ ${#ROW_M[@]} -eq 0 ]; then
|
||||
continue
|
||||
fi
|
||||
|
||||
if [ ${#ROW_M[@]} -ne ${#ROW_N[@]} ]; then
|
||||
echo "Badly formatted profile data in ${DISK_FILE}"
|
||||
break
|
||||
fi
|
||||
|
||||
TOTAL=0
|
||||
echo -n "${STEP}, "
|
||||
for (( i=0; i<${#ROW_N[@]}; i++ )); do
|
||||
DELTA=`echo "${ROW_N[${i}]}-${ROW_M[${i}]}" | bc`
|
||||
MBS=`echo "scale=2; ((${DELTA}*512)/${TIME})/(1024*1024)" | bc`
|
||||
TOTAL=`echo "scale=2; ${TOTAL}+${MBS}" | bc`
|
||||
echo -n "${MBS}, "
|
||||
done
|
||||
echo "${TOTAL}, "
|
||||
|
||||
let STEP=${STEP}+1
|
||||
done
|
||||
}
|
||||
|
||||
echo
|
||||
echo "Reads issued per device"
|
||||
create_table 2
|
||||
echo
|
||||
echo "Reads merged per device"
|
||||
create_table 3
|
||||
echo
|
||||
echo "Sectors read per device"
|
||||
create_table 4
|
||||
echo "MB/s per device"
|
||||
create_table_mbs 4 3
|
||||
|
||||
echo
|
||||
echo "Writes issued per device"
|
||||
create_table 6
|
||||
echo
|
||||
echo "Writes merged per device"
|
||||
create_table 7
|
||||
echo
|
||||
echo "Sectors written per device"
|
||||
create_table 8
|
||||
echo "MB/s per device"
|
||||
create_table_mbs 8 3
|
||||
|
||||
exit 0
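Each table row in this script is the difference between two consecutive /proc/diskstats snapshots, and the MB/s variant converts a sector delta as delta * 512 / TIME / (1024 * 1024). The sketch below shows that arithmetic with awk on two canned snapshots. `sector_rate` is a hypothetical helper; the 512-byte sector size and the field numbering (counted after major/minor) follow the script above.

```sh
# Hypothetical helper: given two diskstats-style snapshot files, a field
# number (counted as in the header comment, i.e. after major/minor), and
# the interval in seconds, print "device MB/s" for every sd* device.
sector_rate() {
    before=$1; after=$2; field=$3; secs=$4
    awk -v f="$field" -v t="$secs" '
        $3 !~ /^sd/ { next }                      # same filter as "grep sd"
        NR == FNR { prev[$3] = $(f + 2); next }   # first file: remember counters
        ($3 in prev) {
            delta = $(f + 2) - prev[$3]           # sectors moved during the interval
            printf "%s %.2f\n", $3, (delta * 512 / t) / (1024 * 1024)
        }
    ' "$before" "$after"
}

tmp=$(mktemp -d)
printf '   8       0 sda 100 0 2048 50 10 0 4096 20 0 70 70\n' >"$tmp/disk-1"
printf '   8       0 sda 150 0 14336 80 10 0 4096 20 0 100 100\n' >"$tmp/disk-2"
sector_rate "$tmp/disk-1" "$tmp/disk-2" 4 3    # prints: sda 2.00
rm -rf "$tmp"
```

Here 12288 sectors were read in 3 seconds, which is 6 MiB in total and therefore 2.00 MB/s.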
|
|
@ -0,0 +1,130 @@
|
|||
#!/bin/bash
|
||||
# profile-kpios-pids.sh
|
||||
|
||||
RUN_PIDS=${0}
|
||||
RUN_LOG_DIR=${1}
|
||||
RUN_ID=${2}
|
||||
|
||||
ROW_M=()
|
||||
ROW_N=()
|
||||
ROW_N_SCHED=()
|
||||
ROW_N_WAIT=()
|
||||
|
||||
HEADER=1
|
||||
STEP=1
|
||||
|
||||
for PID_FILE in `ls -r --sort=time --time=ctime ${RUN_LOG_DIR}/${RUN_ID}/pids-[0-9]*`; do
|
||||
ROW_M=( ${ROW_N[@]} )
|
||||
ROW_N=( 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 )
|
||||
ROW_N_SCHED=( `cat ${PID_FILE} | cut -f15 -d' ' | tr "\n" "\t"` )
|
||||
ROW_N_WAIT=( `cat ${PID_FILE} | cut -f17 -d' ' | tr "\n" "\t"` )
|
||||
ROW_N_NAMES=( `cat ${PID_FILE} | cut -f2 -d' ' | cut -f2 -d'(' |
|
||||
cut -f1 -d')' | cut -f1 -d'/' | tr "\n" "\t"` )
|
||||
|
||||
for (( i=0; i<${#ROW_N_SCHED[@]}; i++ )); do
|
||||
SUM=`echo "${ROW_N_WAIT[${i}]}+${ROW_N_SCHED[${i}]}" | bc`
|
||||
|
||||
case ${ROW_N_NAMES[${i}]} in
|
||||
zio_taskq) IDX=0;;
|
||||
zio_req_nul) IDX=1;;
|
||||
zio_irq_nul) IDX=2;;
|
||||
zio_req_rd) IDX=3;;
|
||||
zio_irq_rd) IDX=4;;
|
||||
zio_req_wr) IDX=5;;
|
||||
zio_irq_wr) IDX=6;;
|
||||
zio_req_fr) IDX=7;;
|
||||
zio_irq_fr) IDX=8;;
|
||||
zio_req_cm) IDX=9;;
|
||||
zio_irq_cm) IDX=10;;
|
||||
zio_req_ctl) IDX=11;;
|
||||
zio_irq_ctl) IDX=12;;
|
||||
txg_quiesce) IDX=13;;
|
||||
txg_sync) IDX=14;;
|
||||
txg_timelimit) IDX=15;;
|
||||
arc_reclaim) IDX=16;;
|
||||
l2arc_feed) IDX=17;;
|
||||
kpios_io) IDX=18;;
|
||||
*) continue;;
|
||||
esac
|
||||
|
||||
let ROW_N[${IDX}]=${ROW_N[${IDX}]}+${SUM}
|
||||
done
|
||||
|
||||
if [ $HEADER -eq 1 ]; then
|
||||
echo "step, zio_taskq, zio_req_nul, zio_irq_nul, " \
|
||||
"zio_req_rd, zio_irq_rd, zio_req_wr, zio_irq_wr, " \
|
||||
"zio_req_fr, zio_irq_fr, zio_req_cm, zio_irq_cm, " \
|
||||
"zio_req_ctl, zio_irq_ctl, txg_quiesce, txg_sync, " \
|
||||
"txg_timelimit, arc_reclaim, l2arc_feed, kpios_io, " \
|
||||
"idle"
|
||||
HEADER=0
|
||||
fi
|
||||
|
||||
if [ ${#ROW_M[@]} -eq 0 ]; then
|
||||
continue
|
||||
fi
|
||||
|
||||
if [ ${#ROW_M[@]} -ne ${#ROW_N[@]} ]; then
|
||||
echo "Badly formatted profile data in ${PID_FILE}"
|
||||
break
|
||||
fi
|
||||
|
||||
# Original values are in jiffies and we expect HZ to be 1000
|
||||
# on most 2.6 systems thus we divide by 10 to get a percentage.
|
||||
IDLE=1000
|
||||
echo -n "${STEP}, "
|
||||
for (( i=0; i<${#ROW_N[@]}; i++ )); do
|
||||
DELTA=`echo "${ROW_N[${i}]}-${ROW_M[${i}]}" | bc`
|
||||
DELTA_PERCENT=`echo "scale=1; ${DELTA}/10" | bc`
|
||||
let IDLE=${IDLE}-${DELTA}
|
||||
echo -n "${DELTA_PERCENT}, "
|
||||
done
|
||||
IDLE_PERCENT=`echo "scale=1; ${IDLE}/10" | bc`
echo "${IDLE_PERCENT}"
|
||||
|
||||
let STEP=${STEP}+1
|
||||
done
|
||||
|
||||
exit
|
||||
|
||||
echo
|
||||
echo "Percent of total system time per pid"
|
||||
for PID_FILE in `ls -r --sort=time --time=ctime ${RUN_LOG_DIR}/${RUN_ID}/pids-[0-9]*`; do
|
||||
ROW_M=( ${ROW_N[@]} )
|
||||
ROW_N_SCHED=( `cat ${PID_FILE} | cut -f15 -d' ' | tr "\n" "\t"` )
|
||||
ROW_N_WAIT=( `cat ${PID_FILE} | cut -f17 -d' ' | tr "\n" "\t"` )
|
||||
|
||||
for (( i=0; i<${#ROW_N_SCHED[@]}; i++ )); do
|
||||
ROW_N[${i}]=`echo "${ROW_N_WAIT[${i}]}+${ROW_N_SCHED[${i}]}" | bc`
|
||||
done
|
||||
|
||||
if [ $HEADER -eq 1 ]; then
|
||||
echo -n "step, "
|
||||
cat ${PID_FILE} | cut -f2 -d' ' | tr "\n" ", "
|
||||
echo
|
||||
HEADER=0
|
||||
fi
|
||||
|
||||
if [ ${#ROW_M[@]} -eq 0 ]; then
|
||||
continue
|
||||
fi
|
||||
|
||||
if [ ${#ROW_M[@]} -ne ${#ROW_N[@]} ]; then
|
||||
echo "Badly formatted profile data in ${PID_FILE}"
|
||||
break
|
||||
fi
|
||||
|
||||
# Original values are in jiffies and we expect HZ to be 1000
|
||||
# on most 2.6 systems thus we divide by 10 to get a percentage.
|
||||
echo -n "${STEP}, "
|
||||
for (( i=0; i<${#ROW_N[@]}; i++ )); do
|
||||
DELTA=`echo "scale=1; (${ROW_N[${i}]}-${ROW_M[${i}]})/10" | bc`
|
||||
echo -n "${DELTA}, "
|
||||
done
|
||||
|
||||
echo
|
||||
let STEP=${STEP}+1
|
||||
done
|
||||
|
||||
|
||||
exit 0
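The `cut` pipeline in this script reduces field 2 of each /proc/<pid>/stat line, for example `(zio_taskq/0)`, to the bare thread name used in the case statement. A minimal sketch of the same reduction with POSIX parameter expansion follows. `thread_name` is a hypothetical helper, and it assumes thread names contain no spaces, as the script's `cut -d' '` also does.

```sh
thread_name() {
    # $1 is one line in /proc/<pid>/stat format: "pid (comm) state ..."
    set -- $1                # split the line on whitespace; $2 is now "(comm)"
    name=${2#\(}             # drop the leading "("
    name=${name%\)}          # drop the trailing ")"
    echo "${name%%/*}"       # drop a "/N" suffix if present
}

thread_name "1234 (zio_taskq/0) S 2 0 0 0 -1 0"   # prints: zio_taskq
thread_name "1240 (kpios_io/3) S 2 0 0 0 -1 0"    # prints: kpios_io
```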
|
|
@ -0,0 +1,67 @@
|
|||
#!/bin/bash
|
||||
|
||||
prog=profile-kpios-post.sh
|
||||
. ../.script-config
|
||||
|
||||
RUN_POST=${0}
|
||||
RUN_PHASE=${1}
|
||||
RUN_LOG_DIR=${2}
|
||||
RUN_ID=${3}
|
||||
RUN_POOL=${4}
|
||||
RUN_CHUNK_SIZE=${5}
|
||||
RUN_REGION_SIZE=${6}
|
||||
RUN_THREAD_COUNT=${7}
|
||||
RUN_REGION_COUNT=${8}
|
||||
RUN_OFFSET=${9}
|
||||
RUN_REGION_NOISE=${10}
|
||||
RUN_CHUNK_NOISE=${11}
|
||||
RUN_THREAD_DELAY=${12}
|
||||
RUN_FLAGS=${13}
|
||||
RUN_RESULT=${14}
|
||||
|
||||
PROFILE_KPIOS_PIDS_BIN=/home/behlendo/src/zfs/scripts/profile-kpios-pids.sh
|
||||
PROFILE_KPIOS_PIDS_LOG=${RUN_LOG_DIR}/${RUN_ID}/pids-summary.csv
|
||||
|
||||
PROFILE_KPIOS_DISK_BIN=/home/behlendo/src/zfs/scripts/profile-kpios-disk.sh
|
||||
PROFILE_KPIOS_DISK_LOG=${RUN_LOG_DIR}/${RUN_ID}/disk-summary.csv
|
||||
|
||||
PROFILE_KPIOS_ARC_LOG=${RUN_LOG_DIR}/${RUN_ID}/arcstats
|
||||
PROFILE_KPIOS_VDEV_LOG=${RUN_LOG_DIR}/${RUN_ID}/vdev_cache_stats
|
||||
|
||||
KERNEL_BIN="/lib/modules/`uname -r`/kernel/"
|
||||
SPL_BIN="${SPLBUILD}/modules/spl/"
|
||||
ZFS_BIN="${ZFSBUILD}/lib/"
|
||||
|
||||
OPROFILE_SHORT_ARGS="-a -g -l -p ${KERNEL_BIN},${SPL_BIN},${ZFS_BIN}"
|
||||
OPROFILE_LONG_ARGS="-d -a -g -l -p ${KERNEL_BIN},${SPL_BIN},${ZFS_BIN}"
|
||||
|
||||
OPROFILE_LOG=${RUN_LOG_DIR}/${RUN_ID}/oprofile.txt
|
||||
OPROFILE_SHORT_LOG=${RUN_LOG_DIR}/${RUN_ID}/oprofile-short.txt
|
||||
OPROFILE_LONG_LOG=${RUN_LOG_DIR}/${RUN_ID}/oprofile-long.txt
|
||||
PROFILE_PID=${RUN_LOG_DIR}/${RUN_ID}/pid
|
||||
|
||||
if [ "${RUN_PHASE}" != "post" ]; then
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# opcontrol --stop >>${OPROFILE_LOG} 2>&1
|
||||
# opcontrol --dump >>${OPROFILE_LOG} 2>&1
|
||||
|
||||
kill -s SIGHUP `cat ${PROFILE_PID}`
|
||||
rm -f ${PROFILE_PID}
|
||||
|
||||
# opreport ${OPROFILE_SHORT_ARGS} >${OPROFILE_SHORT_LOG} 2>&1
|
||||
# opreport ${OPROFILE_LONG_ARGS} >${OPROFILE_LONG_LOG} 2>&1
|
||||
|
||||
# opcontrol --deinit >>${OPROFILE_LOG} 2>&1
|
||||
|
||||
cat /proc/spl/kstat/zfs/arcstats >${PROFILE_KPIOS_ARC_LOG}
|
||||
cat /proc/spl/kstat/zfs/vdev_cache_stats >${PROFILE_KPIOS_VDEV_LOG}
|
||||
|
||||
# Summarize system time per pid
|
||||
${PROFILE_KPIOS_PIDS_BIN} ${RUN_LOG_DIR} ${RUN_ID} >${PROFILE_KPIOS_PIDS_LOG}
|
||||
|
||||
# Summarize per device performance
|
||||
${PROFILE_KPIOS_DISK_BIN} ${RUN_LOG_DIR} ${RUN_ID} >${PROFILE_KPIOS_DISK_LOG}
|
||||
|
||||
exit 0
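This hook receives fourteen positional arguments, so every parameter from the tenth onward needs braces: `$10` expands as `$1` followed by a literal `0`. A small sketch of the difference, using a hypothetical `show_tenth` function and made-up argument values:

```sh
show_tenth() {
    echo "braced:   ${10}"   # tenth positional parameter
    echo "unbraced: $10"     # first parameter followed by the character 0
}

show_tenth post /tmp/log run-1 lustre 4 5 6 7 8 noise
# prints:
#   braced:   noise
#   unbraced: post0
```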
|
|
@ -0,0 +1,69 @@
|
|||
#!/bin/bash
|
||||
# profile-kpios-pre.sh
|
||||
|
||||
trap "PROFILE_KPIOS_READY=1" SIGHUP
|
||||
|
||||
RUN_PRE=${0}
|
||||
RUN_PHASE=${1}
|
||||
RUN_LOG_DIR=${2}
|
||||
RUN_ID=${3}
|
||||
RUN_POOL=${4}
|
||||
RUN_CHUNK_SIZE=${5}
|
||||
RUN_REGION_SIZE=${6}
|
||||
RUN_THREAD_COUNT=${7}
|
||||
RUN_REGION_COUNT=${8}
|
||||
RUN_OFFSET=${9}
|
||||
RUN_REGION_NOISE=${10}
|
||||
RUN_CHUNK_NOISE=${11}
|
||||
RUN_THREAD_DELAY=${12}
|
||||
RUN_FLAGS=${13}
|
||||
RUN_RESULT=${14}
|
||||
|
||||
PROFILE_KPIOS_BIN=/home/behlendo/src/zfs/scripts/profile-kpios.sh
|
||||
PROFILE_KPIOS_READY=0
|
||||
|
||||
OPROFILE_LOG=${RUN_LOG_DIR}/${RUN_ID}/oprofile.txt
|
||||
PROFILE_PID=${RUN_LOG_DIR}/${RUN_ID}/pid
|
||||
RUN_ARGS=${RUN_LOG_DIR}/${RUN_ID}/args
|
||||
|
||||
if [ "${RUN_PHASE}" != "pre" ]; then
|
||||
exit 1
|
||||
fi
|
||||
|
||||
rm -Rf ${RUN_LOG_DIR}/${RUN_ID}/
|
||||
mkdir -p ${RUN_LOG_DIR}/${RUN_ID}/
|
||||
|
||||
echo "PHASE=${RUN_PHASE}" >>${RUN_ARGS}
|
||||
echo "LOG_DIR=${RUN_LOG_DIR}" >>${RUN_ARGS}
|
||||
echo "ID=${RUN_ID}" >>${RUN_ARGS}
|
||||
echo "POOL=${RUN_POOL}" >>${RUN_ARGS}
|
||||
echo "CHUNK_SIZE=${RUN_CHUNK_SIZE}" >>${RUN_ARGS}
|
||||
echo "REGION_SIZE=${RUN_REGION_SIZE}" >>${RUN_ARGS}
|
||||
echo "THREAD_COUNT=${RUN_THREAD_COUNT}" >>${RUN_ARGS}
|
||||
echo "REGION_COUNT=${RUN_REGION_COUNT}" >>${RUN_ARGS}
|
||||
echo "OFFSET=${RUN_OFFSET}" >>${RUN_ARGS}
|
||||
echo "REGION_NOISE=${RUN_REGION_NOISE}" >>${RUN_ARGS}
|
||||
echo "CHUNK_NOISE=${RUN_CHUNK_NOISE}" >>${RUN_ARGS}
|
||||
echo "THREAD_DELAY=${RUN_THREAD_DELAY}" >>${RUN_ARGS}
|
||||
echo "FLAGS=${RUN_FLAGS}" >>${RUN_ARGS}
|
||||
echo "RESULT=${RUN_RESULT}" >>${RUN_ARGS}
|
||||
|
||||
# XXX: Oprofile support seems to be broken when I try and start
|
||||
# it via a user mode helper script, I suspect the setup is failing.
|
||||
# opcontrol --init >>${OPROFILE_LOG} 2>&1
|
||||
# opcontrol --setup --vmlinux=/boot/vmlinux >>${OPROFILE_LOG} 2>&1
|
||||
|
||||
# Start the profile script
|
||||
${PROFILE_KPIOS_BIN} ${RUN_PHASE} ${RUN_LOG_DIR} ${RUN_ID} &
|
||||
echo "$!" >${PROFILE_PID}
|
||||
|
||||
# Sleep waiting for profile script to be ready, it will
|
||||
# signal us via SIGHUP when it is ready to start profiling.
|
||||
while [ ${PROFILE_KPIOS_READY} -eq 0 ]; do
|
||||
sleep 0.1
|
||||
done
|
||||
|
||||
# opcontrol --start-daemon >>${OPROFILE_LOG} 2>&1
|
||||
# opcontrol --start >>${OPROFILE_LOG} 2>&1
|
||||
|
||||
exit 0
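The pre-run hook blocks until the background profiler signals readiness with SIGHUP. The sketch below reproduces that handshake in a single file, with a subshell standing in for profile-kpios.sh. The iteration cap is an addition so that the example cannot hang if the signal never arrives.

```sh
READY=0
trap 'READY=1' HUP           # the handler only flips a flag, as in the hook

# Stand-in for the profiler: do some setup, then signal the waiting shell.
# Inside a subshell "$$" still names the main shell, which is the process
# the real profiler reaches through ${PPID}.
( sleep 0.2; kill -s HUP $$ ) &

tries=0
while [ "$READY" -eq 0 ] && [ "$tries" -lt 50 ]; do
    sleep 0.1                # the trap runs once this sleep returns
    tries=$((tries + 1))
done
wait                         # reap the stand-in
echo "ready=${READY}"
```

Polling a flag keeps the handler trivial; the cost is up to one polling period of extra latency before the hook returns.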
|
|
@ -0,0 +1,222 @@
|
|||
#!/bin/bash
|
||||
# profile-kpios.sh
|
||||
|
||||
trap "RUN_DONE=1" SIGHUP
|
||||
|
||||
RUN_PHASE=${1}
|
||||
RUN_LOG_DIR=${2}
|
||||
RUN_ID=${3}
|
||||
RUN_DONE=0
|
||||
|
||||
POLL_INTERVAL=2.99
|
||||
|
||||
# Log these pids, the exact pid numbers will vary from system to system
|
||||
# so I harvest the pids of all the following types of processes from /proc/<pid>/
|
||||
#
|
||||
# zio_taskq/#
|
||||
# spa_zio_issue/#
|
||||
# spa_zio_intr/#
|
||||
# txg_quiesce_thr
|
||||
# txg_sync_thread
|
||||
# txg_timelimit_t
|
||||
# arc_reclaim_thr
|
||||
# l2arc_feed_thre
|
||||
# kpios_io/#
|
||||
|
||||
ZIO_TASKQ_PIDS=()
|
||||
ZIO_REQ_NUL_PIDS=()
|
||||
ZIO_IRQ_NUL_PIDS=()
|
||||
ZIO_REQ_RD_PIDS=()
|
||||
ZIO_IRQ_RD_PIDS=()
|
||||
ZIO_REQ_WR_PIDS=()
|
||||
ZIO_IRQ_WR_PIDS=()
|
||||
ZIO_REQ_FR_PIDS=()
|
||||
ZIO_IRQ_FR_PIDS=()
|
||||
ZIO_REQ_CM_PIDS=()
|
||||
ZIO_IRQ_CM_PIDS=()
|
||||
ZIO_REQ_CTL_PIDS=()
|
||||
ZIO_IRQ_CTL_PIDS=()
|
||||
|
||||
TXG_QUIESCE_PIDS=()
|
||||
TXG_SYNC_PIDS=()
|
||||
TXG_TIMELIMIT_PIDS=()
|
||||
|
||||
ARC_RECLAIM_PIDS=()
|
||||
L2ARC_FEED_PIDS=()
|
||||
|
||||
KPIOS_IO_PIDS=()
|
||||
|
||||
show_pids() {
|
||||
echo "* zio_taskq: { ${ZIO_TASKQ_PIDS[@]} } = ${#ZIO_TASKQ_PIDS[@]}"
|
||||
echo "* zio_req_nul: { ${ZIO_REQ_NUL_PIDS[@]} } = ${#ZIO_REQ_NUL_PIDS[@]}"
|
||||
echo "* zio_irq_nul: { ${ZIO_IRQ_NUL_PIDS[@]} } = ${#ZIO_IRQ_NUL_PIDS[@]}"
|
||||
echo "* zio_req_rd: { ${ZIO_REQ_RD_PIDS[@]} } = ${#ZIO_REQ_RD_PIDS[@]}"
|
||||
echo "* zio_irq_rd: { ${ZIO_IRQ_RD_PIDS[@]} } = ${#ZIO_IRQ_RD_PIDS[@]}"
|
||||
echo "* zio_req_wr: { ${ZIO_REQ_WR_PIDS[@]} } = ${#ZIO_REQ_WR_PIDS[@]}"
|
||||
echo "* zio_irq_wr: { ${ZIO_IRQ_WR_PIDS[@]} } = ${#ZIO_IRQ_WR_PIDS[@]}"
|
||||
echo "* zio_req_fr: { ${ZIO_REQ_FR_PIDS[@]} } = ${#ZIO_REQ_FR_PIDS[@]}"
|
||||
echo "* zio_irq_fr: { ${ZIO_IRQ_FR_PIDS[@]} } = ${#ZIO_IRQ_FR_PIDS[@]}"
|
||||
echo "* zio_req_cm: { ${ZIO_REQ_CM_PIDS[@]} } = ${#ZIO_REQ_CM_PIDS[@]}"
|
||||
echo "* zio_irq_cm: { ${ZIO_IRQ_CM_PIDS[@]} } = ${#ZIO_IRQ_CM_PIDS[@]}"
|
||||
echo "* zio_req_ctl: { ${ZIO_REQ_CTL_PIDS[@]} } = ${#ZIO_REQ_CTL_PIDS[@]}"
|
||||
echo "* zio_irq_ctl: { ${ZIO_IRQ_CTL_PIDS[@]} } = ${#ZIO_IRQ_CTL_PIDS[@]}"
|
||||
echo "* txg_quiesce: { ${TXG_QUIESCE_PIDS[@]} } = ${#TXG_QUIESCE_PIDS[@]}"
|
||||
echo "* txg_sync: { ${TXG_SYNC_PIDS[@]} } = ${#TXG_SYNC_PIDS[@]}"
|
||||
echo "* txg_timelimit: { ${TXG_TIMELIMIT_PIDS[@]} } = ${#TXG_TIMELIMIT_PIDS[@]}"
|
||||
echo "* arc_reclaim: { ${ARC_RECLAIM_PIDS[@]} } = ${#ARC_RECLAIM_PIDS[@]}"
|
||||
echo "* l2arc_feed: { ${L2ARC_FEED_PIDS[@]} } = ${#L2ARC_FEED_PIDS[@]}"
|
||||
echo "* kpios_io: { ${KPIOS_IO_PIDS[@]} } = ${#KPIOS_IO_PIDS[@]}"
|
||||
}
|
||||
|
||||
check_pid() {
|
||||
local PID=$1
|
||||
local NAME=$2
|
||||
local TYPE=$3
|
||||
local PIDS=( "$4" )
|
||||
local NAME_STRING=`echo ${NAME} | cut -f1 -d'/'`
|
||||
local NAME_NUMBER=`echo ${NAME} | cut -f2 -d'/'`
|
||||
|
||||
if [ "${NAME_STRING}" == "${TYPE}" ]; then
|
||||
if [ -n "${NAME_NUMBER}" ]; then
|
||||
PIDS[${NAME_NUMBER}]=${PID}
|
||||
else
|
||||
PIDS[${#PIDS[@]}]=${PID}
|
||||
|
||||
fi
|
||||
fi
|
||||
|
||||
echo "${PIDS[@]}"
|
||||
}
|
||||
|
||||
# NOTE: This whole process is crazy slow but it will do for now
|
||||
aquire_pids() {
|
||||
echo "--- Acquiring ZFS pids ---"
|
||||
|
||||
for PID in `ls /proc/ | grep '^[0-9][0-9]*$' | sort -n -u`; do
|
||||
if [ ! -e /proc/${PID}/status ]; then
|
||||
continue
|
||||
fi
|
||||
|
||||
NAME=`cat /proc/${PID}/status | head -n1 | cut -f2`
|
||||
|
||||
ZIO_TASKQ_PIDS=( `check_pid ${PID} ${NAME} "zio_taskq" \
|
||||
"$(echo "${ZIO_TASKQ_PIDS[@]}")"` )
|
||||
|
||||
ZIO_REQ_NUL_PIDS=( `check_pid ${PID} ${NAME} "zio_req_nul" \
|
||||
"$(echo "${ZIO_REQ_NUL_PIDS[@]}")"` )
|
||||
|
||||
ZIO_IRQ_NUL_PIDS=( `check_pid ${PID} ${NAME} "zio_irq_nul" \
|
||||
"$(echo "${ZIO_IRQ_NUL_PIDS[@]}")"` )
|
||||
|
||||
ZIO_REQ_RD_PIDS=( `check_pid ${PID} ${NAME} "zio_req_rd" \
|
||||
"$(echo "${ZIO_REQ_RD_PIDS[@]}")"` )
|
||||
|
||||
ZIO_IRQ_RD_PIDS=( `check_pid ${PID} ${NAME} "zio_irq_rd" \
|
||||
"$(echo "${ZIO_IRQ_RD_PIDS[@]}")"` )
|
||||
|
||||
ZIO_REQ_WR_PIDS=( `check_pid ${PID} ${NAME} "zio_req_wr" \
|
||||
"$(echo "${ZIO_REQ_WR_PIDS[@]}")"` )
|
||||
|
||||
ZIO_IRQ_WR_PIDS=( `check_pid ${PID} ${NAME} "zio_irq_wr" \
|
||||
"$(echo "${ZIO_IRQ_WR_PIDS[@]}")"` )
|
||||
|
||||
ZIO_REQ_FR_PIDS=( `check_pid ${PID} ${NAME} "zio_req_fr" \
|
||||
"$(echo "${ZIO_REQ_FR_PIDS[@]}")"` )
|
||||
|
||||
ZIO_IRQ_FR_PIDS=( `check_pid ${PID} ${NAME} "zio_irq_fr" \
|
||||
"$(echo "${ZIO_IRQ_FR_PIDS[@]}")"` )
|
||||
|
||||
ZIO_REQ_CM_PIDS=( `check_pid ${PID} ${NAME} "zio_req_cm" \
|
||||
"$(echo "${ZIO_REQ_CM_PIDS[@]}")"` )
|
||||
|
||||
ZIO_IRQ_CM_PIDS=( `check_pid ${PID} ${NAME} "zio_irq_cm" \
|
||||
"$(echo "${ZIO_IRQ_CM_PIDS[@]}")"` )
|
||||
|
||||
ZIO_REQ_CTL_PIDS=( `check_pid ${PID} ${NAME} "zio_req_ctl" \
|
||||
"$(echo "${ZIO_REQ_CTL_PIDS[@]}")"` )
|
||||
|
||||
ZIO_IRQ_CTL_PIDS=( `check_pid ${PID} ${NAME} "zio_irq_ctl" \
|
||||
"$(echo "${ZIO_IRQ_CTL_PIDS[@]}")"` )
|
||||
|
||||
TXG_QUIESCE_PIDS=( `check_pid ${PID} ${NAME} "txg_quiesce" \
|
||||
"$(echo "${TXG_QUIESCE_PIDS[@]}")"` )
|
||||
|
||||
TXG_SYNC_PIDS=( `check_pid ${PID} ${NAME} "txg_sync" \
|
||||
"$(echo "${TXG_SYNC_PIDS[@]}")"` )
|
||||
|
||||
TXG_TIMELIMIT_PIDS=( `check_pid ${PID} ${NAME} "txg_timelimit" \
|
||||
"$(echo "${TXG_TIMELIMIT_PIDS[@]}")"` )
|
||||
|
||||
ARC_RECLAIM_PIDS=( `check_pid ${PID} ${NAME} "arc_reclaim" \
|
||||
"$(echo "${ARC_RECLAIM_PIDS[@]}")"` )
|
||||
|
||||
L2ARC_FEED_PIDS=( `check_pid ${PID} ${NAME} "l2arc_feed" \
|
||||
"$(echo "${L2ARC_FEED_PIDS[@]}")"` )
|
||||
done
|
||||
|
||||
# Wait for kpios_io threads to start
|
||||
kill -s SIGHUP ${PPID}
|
||||
echo "* Waiting for kpios_io threads to start"
|
||||
while [ ${RUN_DONE} -eq 0 ]; do
|
||||
KPIOS_IO_PIDS=( `ps ax | grep kpios_io | grep -v grep | \
|
||||
sed 's/^ *//g' | cut -f1 -d' '` )
|
||||
if [ ${#KPIOS_IO_PIDS[@]} -gt 0 ]; then
|
||||
break;
|
||||
fi
|
||||
sleep 0.1
|
||||
done
|
||||
|
||||
echo "`show_pids`" >${RUN_LOG_DIR}/${RUN_ID}/pids.txt
|
||||
}
|
||||
|
||||
log_pids() {
|
||||
echo "--- Logging ZFS profile to ${RUN_LOG_DIR}/${RUN_ID}/ ---"
|
||||
ALL_PIDS=( ${ZIO_TASKQ_PIDS[@]} \
|
||||
${ZIO_REQ_NUL_PIDS[@]} \
|
||||
${ZIO_IRQ_NUL_PIDS[@]} \
|
||||
${ZIO_REQ_RD_PIDS[@]} \
|
||||
${ZIO_IRQ_RD_PIDS[@]} \
|
||||
${ZIO_REQ_WR_PIDS[@]} \
|
||||
${ZIO_IRQ_WR_PIDS[@]} \
|
||||
${ZIO_REQ_FR_PIDS[@]} \
|
||||
${ZIO_IRQ_FR_PIDS[@]} \
|
||||
${ZIO_REQ_CM_PIDS[@]} \
|
||||
${ZIO_IRQ_CM_PIDS[@]} \
|
||||
${ZIO_REQ_CTL_PIDS[@]} \
|
||||
${ZIO_IRQ_CTL_PIDS[@]} \
|
||||
${TXG_QUIESCE_PIDS[@]} \
|
||||
${TXG_SYNC_PIDS[@]} \
|
||||
${TXG_TIMELIMIT_PIDS[@]} \
|
||||
${ARC_RECLAIM_PIDS[@]} \
|
||||
${L2ARC_FEED_PIDS[@]} \
|
||||
${KPIOS_IO_PIDS[@]} )
|
||||
|
||||
while [ ${RUN_DONE} -eq 0 ]; do
|
||||
NOW=`date +%s.%N`
|
||||
LOG_PIDS="${RUN_LOG_DIR}/${RUN_ID}/pids-${NOW}"
|
||||
LOG_DISK="${RUN_LOG_DIR}/${RUN_ID}/disk-${NOW}"
|
||||
|
||||
for PID in "${ALL_PIDS[@]}"; do
|
||||
if [ -z "${PID}" ]; then
|
||||
continue;
|
||||
fi
|
||||
|
||||
if [ -e /proc/${PID}/stat ]; then
|
||||
cat /proc/${PID}/stat | head -n1 >>${LOG_PIDS}
|
||||
else
|
||||
echo "<${PID} exited>" >>${LOG_PIDS}
|
||||
fi
|
||||
done
|
||||
|
||||
cat /proc/diskstats >${LOG_DISK}
|
||||
|
||||
NOW2=`date +%s.%N`
|
||||
DELTA=`echo "${POLL_INTERVAL}-(${NOW2}-${NOW})" | bc`
|
||||
sleep ${DELTA}
|
||||
done
|
||||
}
|
||||
|
||||
aquire_pids
|
||||
log_pids
|
||||
|
||||
exit 0
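log_pids keeps a roughly fixed sampling period by subtracting the time spent logging from POLL_INTERVAL before sleeping. The sketch below shows that arithmetic using awk in place of bc, with a clamp at zero. The clamp is an addition (a negative argument would make `sleep` fail), and `remaining_sleep` is a hypothetical helper.

```sh
remaining_sleep() {
    # $1 = poll interval, $2 = start timestamp, $3 = end timestamp (seconds)
    awk -v p="$1" -v s="$2" -v e="$3" 'BEGIN {
        d = p - (e - s)
        if (d < 0) d = 0     # sampling overran the interval: do not sleep
        printf "%.2f\n", d
    }'
}

remaining_sleep 2.99 100.00 100.49    # prints: 2.50
remaining_sleep 2.99 100.00 104.00    # prints: 0.00
```

In the script this would correspond to something like `sleep $(remaining_sleep ${POLL_INTERVAL} ${NOW} ${NOW2})`.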
|
|
@ -0,0 +1,102 @@
|
|||
#!/bin/bash
|
||||
|
||||
prog=survey.sh
|
||||
. ../.script-config
|
||||
|
||||
LOG=/home/`whoami`/zpios-logs/`uname -r`/kpios-`date +%Y%m%d`/
|
||||
mkdir -p ${LOG}
|
||||
|
||||
# Apply all tunings described below to generate some best case
|
||||
# numbers for what is achievable with some more elbow grease.
|
||||
NAME="prefetch+zerocopy+checksum+pending1024+kmem"
|
||||
echo "----------------------- ${NAME} ------------------------------"
|
||||
./zpios.sh \
|
||||
"" \
|
||||
"zfs_prefetch_disable=1 zfs_vdev_max_pending=1024 zio_bulk_flags=0x100" \
|
||||
"--zerocopy" \
|
||||
${LOG}/${NAME}/ \
|
||||
"${CMDDIR}/zfs/zfs set checksum=off lustre" | \
|
||||
tee ${LOG}/${NAME}.txt
|
||||
|
||||
# Baseline number for an out of the box config with no manual tuning.
|
||||
# Ideally, we will want things to be automatically tuned and for this
|
||||
# number to approach the tweaked out results above.
|
||||
NAME="baseline"
|
||||
echo "----------------------- ${NAME} ------------------------------"
|
||||
./zpios.sh \
|
||||
"" \
|
||||
"" \
|
||||
"" \
|
||||
${LOG}/${NAME}/ | \
|
||||
tee ${LOG}/${NAME}.txt
|
||||
|
||||
# Disable ZFS's prefetching. For some reason still not clear to me
|
||||
# the current prefetching policy is quite bad for a random workload.
|
||||
# Allowing the algorithm to detect a random workload and not do anything
|
||||
# may be the way to address this issue.
|
||||
NAME="prefetch"
|
||||
echo "----------------------- ${NAME} ------------------------------"
|
||||
./zpios.sh \
|
||||
"" \
|
||||
"zfs_prefetch_disable=1" \
|
||||
"" \
|
||||
${LOG}/${NAME}/ | \
|
||||
tee ${LOG}/${NAME}.txt
|
||||
|
||||
# As expected, simulating a zerocopy IO path improves performance
|
||||
# by freeing up lots of CPU which is wasted moving data between buffers.
|
||||
NAME="zerocopy"
|
||||
echo "----------------------- ${NAME} ------------------------------"
|
||||
./zpios.sh \
|
||||
"" \
|
||||
"" \
|
||||
"--zerocopy" \
|
||||
${LOG}/${NAME}/ | \
|
||||
tee ${LOG}/${NAME}.txt
|
||||
|
||||
# Disabling checksumming should show some (if small) improvement
|
||||
# simply due to freeing up a modest amount of CPU.
|
||||
NAME="checksum"
|
||||
echo "----------------------- ${NAME} ------------------------------"
|
||||
./zpios.sh \
|
||||
"" \
|
||||
"" \
|
||||
"" \
|
||||
${LOG}/${NAME}/ \
|
||||
"${CMDDIR}/zfs/zfs set checksum=off lustre" | \
|
||||
tee ${LOG}/${NAME}.txt
|
||||
|
||||
# Increasing the pending IO depth also seems to improve things likely
|
||||
# at the expense of latency. This should be explored more because I'm
|
||||
# seeing a much bigger impact there than I would have expected. There
|
||||
# may be some low hanging fruit to be found here.
|
||||
NAME="pending"
|
||||
echo "----------------------- ${NAME} ------------------------------"
|
||||
./zpios.sh \
|
||||
"" \
|
||||
"zfs_vdev_max_pending=1024" \
|
||||
"" \
|
||||
${LOG}/${NAME}/ | \
|
||||
tee ${LOG}/${NAME}.txt
|
||||
|
||||
# To avoid memory fragmentation issues our slab implementation can be
|
||||
# based on a virtual address space. Interestingly, we take a pretty
|
||||
# substantial performance penalty for this somewhere in the low level
|
||||
# IO drivers. If we back the slab with kmem pages we see far better
|
||||
# read performance numbers at the cost of memory fragmentation and general
|
||||
# system instability due to large allocations. This may be because of
|
||||
# an optimization in the low level drivers due to the contiguous kmem
|
||||
# based memory. This needs to be explained. The good news here is that
|
||||
# with zerocopy interfaces added at the DMU layer we could guarantee
|
||||
# kmem based memory for a pool of pages.
|
||||
#
|
||||
# 0x100 = KMC_KMEM - Force kmem_* based slab
|
||||
# 0x200 = KMC_VMEM - Force vmem_* based slab
|
||||
NAME="kmem"
|
||||
echo "----------------------- ${NAME} ------------------------------"
|
||||
./zpios.sh \
|
||||
"" \
|
||||
"zio_bulk_flags=0x100" \
|
||||
"" \
|
||||
${LOG}/${NAME}/ | \
|
||||
tee ${LOG}/${NAME}.txt
|
|
@ -0,0 +1,55 @@
|
|||
#!/bin/bash
|
||||
|
||||
prog=unload-zfs.sh
|
||||
. ../.script-config
|
||||
|
||||
spl_module=${SPLBUILD}/modules/spl/spl.ko
|
||||
zlib_module=/lib/modules/${KERNELSRCVER}/kernel/lib/zlib_deflate/zlib_deflate.ko
|
||||
zavl_module=${ZFSBUILD}/lib/libavl/zavl.ko
|
||||
znvpair_module=${ZFSBUILD}/lib/libnvpair/znvpair.ko
|
||||
zport_module=${ZFSBUILD}/lib/libport/zport.ko
|
||||
zcommon_module=${ZFSBUILD}/lib/libzcommon/zcommon.ko
|
||||
zpool_module=${ZFSBUILD}/lib/libzpool/zpool.ko
|
||||
zctl_module=${ZFSBUILD}/lib/libdmu-ctl/zctl.ko
|
||||
zpios_module=${ZFSBUILD}/lib/libzpios/zpios.ko
|
||||
|
||||
die() {
|
||||
echo "${prog}: $1" >&2
|
||||
exit 1
|
||||
}
|
||||
|
||||
unload_module() {
|
||||
echo "Unloading $1"
|
||||
/sbin/rmmod $1 || die "Failed to unload $1"
|
||||
}
|
||||
|
||||
if [ $(id -u) != 0 ]; then
|
||||
die "Must run as root"
|
||||
fi
|
||||
|
||||
unload_module ${zpios_module}
|
||||
unload_module ${zctl_module}
|
||||
unload_module ${zpool_module}
|
||||
unload_module ${zcommon_module}
|
||||
unload_module ${zport_module}
|
||||
unload_module ${znvpair_module}
|
||||
unload_module ${zavl_module}
|
||||
unload_module ${zlib_module}
|
||||
|
||||
# Set DUMP=1 to generate debug logs on unload
|
||||
if [ -n "${DUMP}" ]; then
|
||||
sysctl -w kernel.spl.debug.dump=1
|
||||
# This is racy, I don't like it, but for a helper script it will do.
|
||||
SPL_LOG=`dmesg | tail -n 1 | cut -f5 -d' '`
|
||||
${SPLBUILD}/cmd/spl ${SPL_LOG} >${SPL_LOG}.log
|
||||
echo
|
||||
echo "Dumped debug log: ${SPL_LOG}.log"
|
||||
tail -n1 ${SPL_LOG}.log
|
||||
echo
|
||||
fi
|
||||
|
||||
unload_module ${spl_module}
|
||||
|
||||
echo "Successfully unloaded ZFS module stack"
|
||||
|
||||
exit 0
|
|
@ -0,0 +1,59 @@
|
|||
#!/bin/bash
|
||||
|
||||
PROG=update-zfs.sh
|
||||
ZFS_SRC=http://dlc.sun.com/osol/on/downloads/b89/on-src.tar.bz2
|
||||
|
||||
die() {
|
||||
[ -n "$SRC" ] && rm -Rf "$SRC"
|
||||
echo "${PROG}: $1" >&2
|
||||
exit 1
|
||||
}
|
||||
|
||||
DEST=`pwd`
|
||||
if [ `basename $DEST` != "scripts" ]; then
|
||||
die "Must be run from scripts directory"
|
||||
fi
|
||||
|
||||
SRC=`mktemp -d /tmp/zfs.XXXXXXXXXX`
|
||||
DEST=`dirname $DEST`
|
||||
DATE=`date +%Y%m%d%H%M%S`
|
||||
|
||||
wget $ZFS_SRC
|
||||
|
||||
echo "--- Updating ZFS source ---"
|
||||
echo
|
||||
echo "ZFS_REPO = $ZFS_REPO"
|
||||
echo "ZFS_PATCH_REPO = $ZFS_PATCH_REPO"
|
||||
echo "SRC = $SRC"
|
||||
echo "DEST = $DEST"
|
||||
|
||||
echo
|
||||
echo "--- Cloning $ZFS_REPO ---"
|
||||
cd $SRC || die "Failed to 'cd $SRC'"
|
||||
hg clone $ZFS_REPO || die "Failed to clone $ZFS_REPO"
|
||||
|
||||
echo
|
||||
echo "--- Cloning $ZFS_PATCH_REPO ---"
|
||||
hg clone $ZFS_PATCH_REPO patches || die "Failed to clone $ZFS_PATCH_REPO"
|
||||
|
||||
echo
|
||||
echo "--- Backing up existing files ---"
|
||||
echo "$DEST/zfs -> $DEST/zfs.$DATE"
|
||||
cp -Rf $DEST/zfs $DEST/zfs.$DATE || die "Failed to backup"
|
||||
echo "$DEST/zfs_patches -> $DEST/zfs_patches.$DATE"
|
||||
cp -Rf $DEST/zfs_patches $DEST/zfs_patches.$DATE || die "Failed to backup"
|
||||
|
||||
echo
|
||||
echo "--- Overwriting $DEST/zfs and $DEST/zfs_patches ---"
|
||||
find $SRC/trunk/src/ -name SConstruct -type f -print | xargs /bin/rm -f
|
||||
find $SRC/trunk/src/ -name SConscript -type f -print | xargs /bin/rm -f
|
||||
find $SRC/trunk/src/ -name '*.orig' -type f -print | xargs /bin/rm -f
|
||||
rm -f $SRC/trunk/src/myconfig.py
|
||||
cp -Rf $SRC/trunk/src/* $DEST/zfs || die "Failed to overwrite"
|
||||
cp -Rf $SRC/patches/*.patch $DEST/zfs_patches/patches/ || die "Failed to overwrite"
|
||||
cp -f $SRC/patches/series $DEST/zfs_patches/series/zfs-lustre
|
||||
|
||||
echo
|
||||
echo "--- Removing $SRC ---"
|
||||
rm -Rf $SRC
|
||||
|
|
@ -0,0 +1,110 @@
|
|||
#!/bin/bash
|
||||
|
||||
prog=zpios-jbod.sh
|
||||
. ../.script-config
|
||||
|
||||
SPL_OPTIONS=$1
|
||||
ZPOOL_OPTIONS=$2
|
||||
KPIOS_OPTIONS=$3
|
||||
PROFILE_KPIOS_LOGS=$4
|
||||
KPIOS_PRE=$5
|
||||
KPIOS_POST=$6
|
||||
|
||||
PROFILE_KPIOS_PRE=/home/behlendo/src/zfs/scripts/profile-kpios-pre.sh
|
||||
PROFILE_KPIOS_POST=/home/behlendo/src/zfs/scripts/profile-kpios-post.sh
|
||||
|
||||
echo ------------------------- ZFS TEST LOG ---------------------------------
|
||||
echo -n "Date = "; date
|
||||
echo -n "Kernel = "; uname -r
|
||||
echo ------------------------------------------------------------------------
|
||||
|
||||
echo
|
||||
./load-zfs.sh "${SPL_OPTIONS}" "${ZPOOL_OPTIONS}"
|
||||
|
||||
sysctl -w kernel.spl.debug.mask=0
|
||||
sysctl -w kernel.spl.debug.subsystem=0
|
||||
|
||||
echo ---------------------- SPL Sysctl Tunings ------------------------------
|
||||
sysctl -A | grep spl
|
||||
echo
|
||||
|
||||
echo ------------------- SPL/ZPOOL Module Tunings ---------------------------
|
||||
grep '[0-9]' /sys/module/spl/parameters/*
|
||||
grep '[0-9]' /sys/module/zpool/parameters/*
|
||||
echo
|
||||
|
||||
DEVICES="/dev/sdn /dev/sdo /dev/sdp \
|
||||
/dev/sdq /dev/sdr /dev/sds \
|
||||
/dev/sdt /dev/sdu /dev/sdv \
|
||||
/dev/sdw /dev/sdx /dev/sdy"
|
||||
|
||||
${CMDDIR}/zpool/zpool create -F lustre ${DEVICES}
|
||||
${CMDDIR}/zpool/zpool status lustre
|
||||
|
||||
if [ -n "${KPIOS_PRE}" ]; then
|
||||
${KPIOS_PRE}
|
||||
fi
|
||||
|
||||
# Usage: zpios
|
||||
# --chunksize -c =values
|
||||
# --chunksize_low -a =value
|
||||
# --chunksize_high -b =value
|
||||
# --chunksize_incr -g =value
|
||||
# --offset -o =values
|
||||
# --offset_low -m =value
|
||||
# --offset_high -q =value
|
||||
# --offset_incr -r =value
|
||||
# --regioncount -n =values
|
||||
# --regioncount_low -i =value
|
||||
# --regioncount_high -j =value
|
||||
# --regioncount_incr -k =value
|
||||
# --threadcount -t =values
|
||||
# --threadcount_low -l =value
|
||||
# --threadcount_high -h =value
|
||||
# --threadcount_incr -e =value
|
||||
# --regionsize -s =values
|
||||
# --regionsize_low -A =value
|
||||
# --regionsize_high -B =value
|
||||
# --regionsize_incr -C =value
|
||||
# --cleanup -x
|
||||
# --verify -V
|
||||
# --zerocopy -z
|
||||
# --threaddelay -T =jiffies
|
||||
# --regionnoise -I =shift
|
||||
# --chunknoise -N =bytes
|
||||
# --prerun -P =pre-command
|
||||
# --postrun -R =post-command
|
||||
# --log -G =log directory
|
||||
# --pool | --path -p =pool name
|
||||
# --load -L =dmuio
|
||||
# --help -? =this help
|
||||
# --verbose -v =increase verbosity
|
||||
# --threadcount=256,256,256,256,256 \
|
||||
|
||||
CMD="${CMDDIR}/zpios/zpios \
|
||||
--load=dmuio \
|
||||
--path=lustre \
|
||||
--chunksize=1M \
|
||||
--regionsize=4M \
|
||||
--regioncount=16384 \
|
||||
--threadcount=256 \
|
||||
--offset=4M \
|
||||
--cleanup \
|
||||
--verbose \
|
||||
--human-readable \
|
||||
${KPIOS_OPTIONS} \
|
||||
--prerun=${PROFILE_KPIOS_PRE} \
|
||||
--postrun=${PROFILE_KPIOS_POST} \
|
||||
--log=${PROFILE_KPIOS_LOGS}"
|
||||
echo
|
||||
date
|
||||
echo ${CMD}
|
||||
$CMD
|
||||
date
|
||||
|
||||
if [ -n "${KPIOS_POST}" ]; then
|
||||
${KPIOS_POST}
|
||||
fi
|
||||
|
||||
${CMDDIR}/zpool/zpool destroy lustre
|
||||
./unload-zfs.sh
|
|
@ -0,0 +1,139 @@
|
|||
#!/bin/bash
|
||||
|
||||
prog=zpios.sh
|
||||
. ../.script-config
|
||||
|
||||
SPL_OPTIONS="spl_debug_mask=0 spl_debug_subsys=0 ${1}"
|
||||
ZPOOL_OPTIONS=$2
|
||||
KPIOS_OPTIONS=$3
|
||||
PROFILE_KPIOS_LOGS=$4
|
||||
KPIOS_PRE=$5
|
||||
KPIOS_POST=$6
|
||||
|
||||
PROFILE_KPIOS_PRE=/home/behlendo/src/zfs/scripts/profile-kpios-pre.sh
|
||||
PROFILE_KPIOS_POST=/home/behlendo/src/zfs/scripts/profile-kpios-post.sh
|
||||
|
||||
echo ------------------------- ZFS TEST LOG ---------------------------------
|
||||
echo -n "Date = "; date
|
||||
echo -n "Kernel = "; uname -r
|
||||
echo ------------------------------------------------------------------------
|
||||
|
||||
echo
|
||||
./load-zfs.sh "${SPL_OPTIONS}" "${ZPOOL_OPTIONS}"
|
||||
|
||||
echo ---------------------- SPL Sysctl Tunings ------------------------------
|
||||
sysctl -A | grep spl
|
||||
echo
|
||||
|
||||
echo ------------------- SPL/ZPOOL Module Tunings ---------------------------
|
||||
if [ -d /sys/module/spl/parameters ]; then
|
||||
grep '[0-9]' /sys/module/spl/parameters/*
|
||||
grep '[0-9]' /sys/module/zpool/parameters/*
|
||||
else
|
||||
grep '[0-9]' /sys/module/spl/*
|
||||
grep '[0-9]' /sys/module/zpool/*
|
||||
fi
|
||||
echo
|
||||
|
||||
# LOCAL HACK
|
||||
if [ `hostname` = "ilc23" ]; then
|
||||
DEVICES="/dev/sdy /dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds \
|
||||
/dev/sdt /dev/sdu /dev/sdv /dev/sdw /dev/sdx"
|
||||
else
|
||||
DEVICES="/dev/hda"
|
||||
fi
|
||||
|
||||
echo "${CMDDIR}/zpool/zpool create -F lustre ${DEVICES}"
|
||||
${CMDDIR}/zpool/zpool create -F lustre ${DEVICES}
|
||||
|
||||
echo "${CMDDIR}/zpool/zpool status lustre"
|
||||
${CMDDIR}/zpool/zpool status lustre
|
||||
|
||||
echo "Waiting for /dev/kpios to come up..."
|
||||
while [ ! -c /dev/kpios ]; do
|
||||
sleep 1
|
||||
done
|
||||
|
||||
if [ -n "${KPIOS_PRE}" ]; then
|
||||
${KPIOS_PRE}
|
||||
fi
|
||||
|
||||
# Usage: zpios
|
||||
# --chunksize -c =values
|
||||
# --chunksize_low -a =value
|
||||
# --chunksize_high -b =value
|
||||
# --chunksize_incr -g =value
|
||||
# --offset -o =values
|
||||
# --offset_low -m =value
|
||||
# --offset_high -q =value
|
||||
# --offset_incr -r =value
|
||||
# --regioncount -n =values
|
||||
# --regioncount_low -i =value
|
||||
# --regioncount_high -j =value
|
||||
# --regioncount_incr -k =value
|
||||
# --threadcount -t =values
|
||||
# --threadcount_low -l =value
|
||||
# --threadcount_high -h =value
|
||||
# --threadcount_incr -e =value
|
||||
# --regionsize -s =values
|
||||
# --regionsize_low -A =value
|
||||
# --regionsize_high -B =value
|
||||
# --regionsize_incr -C =value
|
||||
# --cleanup -x
|
||||
# --verify -V
|
||||
# --zerocopy -z
|
||||
# --threaddelay -T =jiffies
|
||||
# --regionnoise -I =shift
|
||||
# --chunknoise -N =bytes
|
||||
# --prerun -P =pre-command
|
||||
# --postrun -R =post-command
|
||||
# --log -G =log directory
|
||||
# --pool | --path -p =pool name
|
||||
# --load -L =dmuio
|
||||
# --help -? =this help
|
||||
# --verbose -v =increase verbosity
|
||||
|
||||
# --prerun=${PROFILE_KPIOS_PRE} \
|
||||
# --postrun=${PROFILE_KPIOS_POST} \
|
||||
|
||||
CMD="${CMDDIR}/zpios/zpios \
|
||||
--load=dmuio \
|
||||
--path=lustre \
|
||||
--chunksize=1M \
|
||||
--regionsize=4M \
|
||||
--regioncount=16384 \
|
||||
--threadcount=256,256,256,256,256 \
|
||||
--offset=4M \
|
||||
--cleanup \
|
||||
--verbose \
|
||||
--human-readable \
|
||||
${KPIOS_OPTIONS} \
|
||||
--log=${PROFILE_KPIOS_LOGS}"
|
||||
echo
|
||||
date
|
||||
echo ${CMD}
|
||||
$CMD
|
||||
date
|
||||
|
||||
if [ -n "${KPIOS_POST}" ]; then
|
||||
${KPIOS_POST}
|
||||
fi
|
||||
|
||||
${CMDDIR}/zpool/zpool destroy lustre
|
||||
|
||||
echo ---------------------- SPL Sysctl Tunings ------------------------------
|
||||
sysctl -A | grep spl
|
||||
echo
|
||||
|
||||
echo ------------------------ KSTAT Statistics ------------------------------
|
||||
echo ARCSTATS
|
||||
cat /proc/spl/kstat/zfs/arcstats
|
||||
echo
|
||||
echo VDEV_CACHE_STATS
|
||||
cat /proc/spl/kstat/zfs/vdev_cache_stats
|
||||
echo
|
||||
echo SLAB
|
||||
cat /proc/spl/kmem/slab
|
||||
echo
|
||||
|
||||
./unload-zfs.sh
|
|
@ -0,0 +1,16 @@
|
|||
subdir-m += lib
|
||||
subdir-m += zcmd
|
||||
|
||||
all:
|
||||
# Make the exported SPL symbols available to this module. There
|
||||
# is probably a better way to do this, but this will have to do
|
||||
# for now... an option to modpost perhaps.
|
||||
cp @splsymvers@ .
|
||||
|
||||
# Kick off the kernel build system
|
||||
$(MAKE) -C @LINUX@ SUBDIRS=`pwd` @KERNELMAKE_PARAMS@ modules
|
||||
|
||||
install uninstall clean distclean maintainer-clean distdir:
|
||||
$(MAKE) -C @LINUX@ SUBDIRS=`pwd` @KERNELMAKE_PARAMS@ $@
|
||||
|
||||
check:
|
|
@ -0,0 +1,12 @@
|
|||
subdir-m += libuutil # User space util support
|
||||
subdir-m += libumem # User space memory support
|
||||
subdir-m += libzfs # User space library support
|
||||
subdir-m += libsolcompat # User space compatibility library
|
||||
|
||||
subdir-m += libzpool # Kernel DMU/SPA
|
||||
subdir-m += libdmu-ctl # Kernel control interface
|
||||
|
||||
subdir-m += libavl # Kernel + user space AVL tree support
|
||||
subdir-m += libnvpair # Kernel + user space name/value support
|
||||
subdir-m += libzcommon # Kernel + user space common support
|
||||
subdir-m += libport # Kernel + user space linux support
|
|
@ -0,0 +1,31 @@
|
|||
subdir-m += include
|
||||
DISTFILES = avl.c
|
||||
|
||||
MODULE := zavl
|
||||
LIBRARY := libavl
|
||||
|
||||
# Compile as kernel module. Needed symlinks created for all
|
||||
# k* objects created by top level configure script.
|
||||
|
||||
EXTRA_CFLAGS = @KERNELCPPFLAGS@
|
||||
EXTRA_CFLAGS += -I@LIBDIR@/libavl/include
|
||||
|
||||
obj-m := ${MODULE}.o
|
||||
|
||||
${MODULE}-objs += kavl.o # Generic AVL support
|
||||
|
||||
# Compile as shared library. There's an extra useless host program
|
||||
# here called 'zu' because it was the easiest way I could convince
|
||||
# the kernel build system to construct a user space shared library.
|
||||
|
||||
HOSTCFLAGS += @HOSTCFLAGS@
|
||||
HOSTCFLAGS += -I@LIBDIR@/libsolcompat/include
|
||||
HOSTCFLAGS += -I@LIBDIR@/libport/include
|
||||
HOSTCFLAGS += -I@LIBDIR@/libavl/include
|
||||
|
||||
hostprogs-y := zu
|
||||
always := $(hostprogs-y)
|
||||
|
||||
zu-objs := zu.o ${LIBRARY}.so
|
||||
|
||||
${LIBRARY}-objs += uavl.o
|
|
@ -0,0 +1,969 @@
|
|||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright 2006 Sun Microsystems, Inc. All rights reserved.
|
||||
* Use is subject to license terms.
|
||||
*/
|
||||
|
||||
|
||||
|
||||
|
||||
/*
|
||||
* AVL - generic AVL tree implementation for kernel use
|
||||
*
|
||||
* A complete description of AVL trees can be found in many CS textbooks.
|
||||
*
|
||||
* Here is a very brief overview. An AVL tree is a binary search tree that is
|
||||
* almost perfectly balanced. By "almost" perfectly balanced, we mean that at
|
||||
* any given node, the left and right subtrees are allowed to differ in height
|
||||
* by at most 1 level.
|
||||
*
|
||||
* This relaxation from a perfectly balanced binary tree allows doing
|
||||
* insertion and deletion relatively efficiently. Searching the tree is
|
||||
* still a fast operation, roughly O(log(N)).
|
||||
*
|
||||
* The key to insertion and deletion is a set of tree manipulations called
|
||||
* rotations, which bring unbalanced subtrees back into the semi-balanced state.
|
||||
*
|
||||
* This implementation of AVL trees has the following peculiarities:
|
||||
*
|
||||
* - The AVL specific data structures are physically embedded as fields
|
||||
* in the "using" data structures. To maintain generality the code
|
||||
* must constantly translate between "avl_node_t *" and containing
|
||||
* data structure "void *"s by adding/subtracting the avl_offset.
|
||||
*
|
||||
* - Since the AVL data is always embedded in other structures, there is
|
||||
* no locking or memory allocation in the AVL routines. This must be
|
||||
* provided for by the enclosing data structure's semantics. Typically,
|
||||
* avl_insert()/_add()/_remove()/avl_insert_here() require some kind of
|
||||
* exclusive write lock. Other operations require a read lock.
|
||||
*
|
||||
* - The implementation uses iteration instead of explicit recursion,
|
||||
* since it is intended to run on limited size kernel stacks. Since
|
||||
* there is no recursion stack present to move "up" in the tree,
|
||||
* there is an explicit "parent" link in the avl_node_t.
|
||||
*
|
||||
* - The left/right children pointers of a node are in an array.
|
||||
* In the code, variables (instead of constants) are used to represent
|
||||
* left and right indices. The implementation is written as if it only
|
||||
* dealt with left handed manipulations. By changing the value assigned
|
||||
* to "left", the code also works for right handed trees. The
|
||||
* following variables/terms are frequently used:
|
||||
*
|
||||
* int left; // 0 when dealing with left children,
|
||||
* // 1 for dealing with right children
|
||||
*
|
||||
* int left_heavy; // -1 when left subtree is taller at some node,
|
||||
* // +1 when right subtree is taller
|
||||
*
|
||||
* int right; // will be the opposite of left (0 or 1)
|
||||
* int right_heavy;// will be the opposite of left_heavy (-1 or 1)
|
||||
*
|
||||
* int direction; // 0 for "<" (ie. left child); 1 for ">" (right)
|
||||
*
|
||||
* Though it is a little more confusing to read the code, the approach
|
||||
* allows using half as much code (and hence cache footprint) for tree
|
||||
* manipulations and eliminates many conditional branches.
|
||||
*
|
||||
* - The avl_index_t is an opaque "cookie" used to find nodes at or
|
||||
* adjacent to where a new value would be inserted in the tree. The value
|
||||
* is a modified "avl_node_t *". The bottom bit (normally 0 for a
|
||||
* pointer) is set to indicate that the new node has a value greater
|
||||
* than the value of the indicated "avl_node_t *".
|
||||
*/
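The first peculiarity listed above (AVL data embedded in the "using" structure, translated by adding or subtracting avl_offset) can be sketched in isolation. This is a minimal illustration only: struct demo_node, struct demo_item and the demo_* helpers are hypothetical names that do not appear in this file, and the real translation is done by the AVL_DATA2NODE()/AVL_NODE2DATA() macros whose definitions are not shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for avl_node_t; only the embedding matters here. */
struct demo_node {
	struct demo_node *child[2];
};

/* Hypothetical "using" structure with the node embedded as a field. */
struct demo_item {
	int key;
	struct demo_node link;
};

/* Container pointer -> embedded node: add the field offset. */
static struct demo_node *
demo_data2node(void *data, size_t off)
{
	return ((struct demo_node *)((char *)data + off));
}

/* Embedded node -> container pointer: subtract the field offset. */
static void *
demo_node2data(struct demo_node *node, size_t off)
{
	return ((void *)((char *)node - off));
}

/* Round-trip one item; returns 1 when both translations agree. */
static int
demo_offset_roundtrip(void)
{
	struct demo_item item = { 42, { { NULL, NULL } } };
	size_t off = offsetof(struct demo_item, link);
	struct demo_node *n = demo_data2node(&item, off);

	return (n == &item.link && demo_node2data(n, off) == (void *)&item);
}
```

Because the tree only ever stores the offset, one set of routines can serve any structure that embeds a node, at the cost of a pointer adjustment on every entry and exit.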
|
||||
|
||||
#include <sys/types.h>
|
||||
#include <sys/param.h>
|
||||
#include <sys/debug.h>
|
||||
#include <sys/avl.h>
|
||||
#include <sys/cmn_err.h>
|
||||
|
||||
/*
|
||||
* Small arrays to translate between balance (or diff) values and child indices.
|
||||
*
|
||||
* Code that deals with binary tree data structures will randomly use
|
||||
* left and right children when examining a tree. C "if()" statements
|
||||
* which evaluate randomly suffer from very poor hardware branch prediction.
|
||||
* In this code we avoid some of the branch mispredictions by using the
|
||||
* following translation arrays. They replace random branches with an
|
||||
* additional memory reference. Since the translation arrays are both very
|
||||
* small the data should remain efficiently in cache.
|
||||
*/
|
||||
static const int avl_child2balance[2] = {-1, 1};
|
||||
static const int avl_balance2child[] = {0, 0, 1};
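As a rough illustration of how the two translation arrays replace branches, the sketch below copies their contents under demo_* names (the names and helper functions are assumptions made for this example, not part of the file). A comparator result of -1, 0 or +1 is shifted by one to index the child table, and a child index selects the balance change its parent sees when that subtree grows.

```c
#include <assert.h>

/* Same contents as the arrays in the source above, renamed for the demo. */
static const int demo_child2balance[2] = {-1, 1};
static const int demo_balance2child[] = {0, 0, 1};

/* Map a comparator result (-1, 0, +1) to a child index without a branch. */
static int
demo_pick_child(int diff)
{
	return (demo_balance2child[1 + diff]);
}

/* Balance change seen by a parent when the given child subtree grows by one. */
static int
demo_balance_delta(int which_child)
{
	return (demo_child2balance[which_child]);
}
```

For example, demo_pick_child(-1) selects the left child (index 0) and demo_balance_delta(1) reports that growth on the right moves the balance by +1.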
|
||||
|
||||
|
||||
/*
|
||||
* Walk from one node to the previous valued node (ie. an infix walk
|
||||
* towards the left). At any given node we do one of 2 things:
|
||||
*
|
||||
* - If there is a left child, go to it, then to its rightmost descendant.
|
||||
*
|
||||
* - otherwise we return thru parent nodes until we've come from a right child.
|
||||
*
|
||||
* Return Value:
|
||||
* NULL - if at the end of the nodes
|
||||
* otherwise next node
|
||||
*/
|
||||
void *
|
||||
avl_walk(avl_tree_t *tree, void *oldnode, int left)
|
||||
{
|
||||
size_t off = tree->avl_offset;
|
||||
avl_node_t *node = AVL_DATA2NODE(oldnode, off);
|
||||
int right = 1 - left;
|
||||
int was_child;
|
||||
|
||||
|
||||
/*
|
||||
* nowhere to walk to if tree is empty
|
||||
*/
|
||||
if (node == NULL)
|
||||
return (NULL);
|
||||
|
||||
/*
|
||||
* Visit the previous valued node. There are two possibilities:
|
||||
*
|
||||
* If this node has a left child, go down one left, then all
|
||||
* the way right.
|
||||
*/
|
||||
if (node->avl_child[left] != NULL) {
|
||||
for (node = node->avl_child[left];
|
||||
node->avl_child[right] != NULL;
|
||||
node = node->avl_child[right])
|
||||
;
|
||||
/*
|
||||
* Otherwise, return thru left children as far as we can.
|
||||
*/
|
||||
} else {
|
||||
for (;;) {
|
||||
was_child = AVL_XCHILD(node);
|
||||
node = AVL_XPARENT(node);
|
||||
if (node == NULL)
|
||||
return (NULL);
|
||||
if (was_child == right)
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
return (AVL_NODE2DATA(node, off));
|
||||
}
|
||||
|
||||
/*
|
||||
* Return the lowest valued node in a tree or NULL.
|
||||
* (leftmost child from root of tree)
|
||||
*/
|
||||
void *
|
||||
avl_first(avl_tree_t *tree)
|
||||
{
|
||||
avl_node_t *node;
|
||||
avl_node_t *prev = NULL;
|
||||
size_t off = tree->avl_offset;
|
||||
|
||||
for (node = tree->avl_root; node != NULL; node = node->avl_child[0])
|
||||
prev = node;
|
||||
|
||||
if (prev != NULL)
|
||||
return (AVL_NODE2DATA(prev, off));
|
||||
return (NULL);
|
||||
}
|
||||
|
||||
/*
|
||||
* Return the highest valued node in a tree or NULL.
|
||||
* (rightmost child from root of tree)
|
||||
*/
|
||||
void *
|
||||
avl_last(avl_tree_t *tree)
|
||||
{
|
||||
avl_node_t *node;
|
||||
avl_node_t *prev = NULL;
|
||||
size_t off = tree->avl_offset;
|
||||
|
||||
for (node = tree->avl_root; node != NULL; node = node->avl_child[1])
|
||||
prev = node;
|
||||
|
||||
if (prev != NULL)
|
||||
return (AVL_NODE2DATA(prev, off));
|
||||
return (NULL);
|
||||
}
|
||||
|
||||
/*
|
||||
* Access the node immediately before or after an insertion point.
|
||||
*
|
||||
* "avl_index_t" is a (avl_node_t *) with the bottom bit indicating a child
|
||||
*
|
||||
* Return value:
|
||||
* NULL: no node in the given direction
|
||||
* "void *" of the found tree node
|
||||
*/
|
||||
void *
|
||||
avl_nearest(avl_tree_t *tree, avl_index_t where, int direction)
|
||||
{
|
||||
int child = AVL_INDEX2CHILD(where);
|
||||
avl_node_t *node = AVL_INDEX2NODE(where);
|
||||
void *data;
|
||||
size_t off = tree->avl_offset;
|
||||
|
||||
if (node == NULL) {
|
||||
ASSERT(tree->avl_root == NULL);
|
||||
return (NULL);
|
||||
}
|
||||
data = AVL_NODE2DATA(node, off);
|
||||
if (child != direction)
|
||||
return (data);
|
||||
|
||||
return (avl_walk(tree, data, direction));
|
||||
}
|
||||
|
||||
|
||||
/*
|
||||
* Search for the node which contains "value". The algorithm is a
|
||||
* simple binary tree search.
|
||||
*
|
||||
* return value:
|
||||
* NULL: the value is not in the AVL tree
|
||||
* *where (if not NULL) is set to indicate the insertion point
|
||||
* "void *" of the found tree node
|
||||
*/
|
||||
void *
|
||||
avl_find(avl_tree_t *tree, void *value, avl_index_t *where)
|
||||
{
|
||||
avl_node_t *node;
|
||||
avl_node_t *prev = NULL;
|
||||
int child = 0;
|
||||
int diff;
|
||||
size_t off = tree->avl_offset;
|
||||
|
||||
for (node = tree->avl_root; node != NULL;
|
||||
node = node->avl_child[child]) {
|
||||
|
||||
prev = node;
|
||||
|
||||
diff = tree->avl_compar(value, AVL_NODE2DATA(node, off));
|
||||
ASSERT(-1 <= diff && diff <= 1);
|
||||
if (diff == 0) {
|
||||
#ifdef DEBUG
|
||||
if (where != NULL)
|
||||
*where = 0;
|
||||
#endif
|
||||
return (AVL_NODE2DATA(node, off));
|
||||
}
|
||||
child = avl_balance2child[1 + diff];
|
||||
|
||||
}
|
||||
|
||||
if (where != NULL)
|
||||
*where = AVL_MKINDEX(prev, child);
|
||||
|
||||
return (NULL);
|
||||
}
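The insertion point that avl_find() stores in *where is described earlier as a node pointer whose bottom bit records the child direction. A minimal sketch of that encoding follows, assuming node addresses are at least 2-byte aligned; demo_index_t and the demo_* helpers are hypothetical names, and the file's own AVL_MKINDEX(), AVL_INDEX2NODE() and AVL_INDEX2CHILD() macros are defined elsewhere and may differ in detail.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical cookie type: a pointer-sized integer. */
typedef uintptr_t demo_index_t;

/* Pack a parent pointer and a child direction (0 or 1) into one value. */
static demo_index_t
demo_mkindex(void *node, int child)
{
	return ((uintptr_t)node | (uintptr_t)child);
}

/* Recover the parent pointer by clearing the bottom bit. */
static void *
demo_index2node(demo_index_t x)
{
	return ((void *)(x & ~(uintptr_t)1));
}

/* Recover the child direction from the bottom bit. */
static int
demo_index2child(demo_index_t x)
{
	return ((int)(x & 1));
}

/* Round-trip a pointer and a direction; returns 1 on success. */
static int
demo_index_roundtrip(void)
{
	static long slot;	/* aligned, so bit 0 of its address is 0 */
	demo_index_t idx = demo_mkindex(&slot, 1);

	return (demo_index2node(idx) == (void *)&slot &&
	    demo_index2child(idx) == 1);
}
```

Packing both values into one word lets a failed lookup hand its result straight to an insert without a second search.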
|
||||
|
||||
|
||||
/*
|
||||
* Perform a rotation to restore balance at the subtree given by depth.
|
||||
*
|
||||
* This routine is used by both insertion and deletion. The return value
|
||||
* indicates:
|
||||
* 0 : subtree did not change height
|
||||
* !0 : subtree was reduced in height
|
||||
*
|
||||
* The code is written as if handling left rotations, right rotations are
|
||||
* symmetric and handled by swapping values of variables right/left[_heavy]
|
||||
*
|
||||
* On input balance is the "new" balance at "node". This value is either
|
||||
* -2 or +2.
|
||||
*/
|
||||
static int
|
||||
avl_rotation(avl_tree_t *tree, avl_node_t *node, int balance)
|
||||
{
|
||||
int left = !(balance < 0); /* when balance = -2, left will be 0 */
|
||||
int right = 1 - left;
|
||||
int left_heavy = balance >> 1;
|
||||
int right_heavy = -left_heavy;
|
||||
avl_node_t *parent = AVL_XPARENT(node);
|
||||
avl_node_t *child = node->avl_child[left];
|
||||
avl_node_t *cright;
|
||||
avl_node_t *gchild;
|
||||
avl_node_t *gright;
|
||||
avl_node_t *gleft;
|
||||
int which_child = AVL_XCHILD(node);
|
||||
int child_bal = AVL_XBALANCE(child);
|
||||
|
||||
/* BEGIN CSTYLED */
|
||||
/*
|
||||
* case 1 : node is overly left heavy, the left child is balanced or
|
||||
* also left heavy. This requires the following rotation.
|
||||
*
|
||||
*                 (node bal:-2)
*                  /         \
*                 /           \
*          (child bal:0 or -1)
*           /    \
*          /      \
*                cright
*
* becomes:
*
*          (child bal:1 or 0)
*           /        \
*          /          \
*                  (node bal:-1 or 0)
*                   /     \
*                  /       \
*               cright
|
||||
*
|
||||
* we detect this situation by noting that child's balance is not
|
||||
* right_heavy.
|
||||
*/
|
||||
/* END CSTYLED */
|
||||
if (child_bal != right_heavy) {
|
||||
|
||||
/*
|
||||
* compute new balance of nodes
|
||||
*
|
||||
* If child used to be left heavy (now balanced) we reduced
|
||||
* the height of this sub-tree -- used in "return...;" below
|
||||
*/
|
||||
child_bal += right_heavy; /* adjust towards right */
|
||||
|
||||
/*
|
||||
* move "cright" to be node's left child
|
||||
*/
|
||||
cright = child->avl_child[right];
|
||||
node->avl_child[left] = cright;
|
||||
if (cright != NULL) {
|
||||
AVL_SETPARENT(cright, node);
|
||||
AVL_SETCHILD(cright, left);
|
||||
}
|
||||
|
||||
/*
|
||||
* move node to be child's right child
|
||||
*/
|
||||
child->avl_child[right] = node;
|
||||
AVL_SETBALANCE(node, -child_bal);
|
||||
AVL_SETCHILD(node, right);
|
||||
AVL_SETPARENT(node, child);
|
||||
|
||||
/*
|
||||
* update the pointer into this subtree
|
||||
*/
|
||||
AVL_SETBALANCE(child, child_bal);
|
||||
AVL_SETCHILD(child, which_child);
|
||||
AVL_SETPARENT(child, parent);
|
||||
if (parent != NULL)
|
||||
parent->avl_child[which_child] = child;
|
||||
else
|
||||
tree->avl_root = child;
|
||||
|
||||
return (child_bal == 0);
|
||||
}
|
||||
|
||||
/* BEGIN CSTYLED */
|
||||
/*
|
||||
* case 2 : When node is left heavy, but child is right heavy we use
|
||||
* a different rotation.
|
||||
*
|
||||
*                 (node b:-2)
*                  /   \
*                 /     \
*                /       \
*          (child b:+1)
*           /     \
*          /       \
*               (gchild b: != 0)
*                 /  \
*                /    \
*            gleft   gright
*
* becomes:
*
*             (gchild b:0)
*              /       \
*             /         \
*            /           \
*      (child b:?)   (node b:?)
*       /  \           /   \
*      /    \         /     \
*         gleft    gright
|
||||
*
|
||||
* computing the new balances is more complicated. As an example:
|
||||
* if gchild was right_heavy, then child is now left heavy
|
||||
* else it is balanced
|
||||
*/
|
||||
/* END CSTYLED */
|
||||
gchild = child->avl_child[right];
|
||||
gleft = gchild->avl_child[left];
|
||||
gright = gchild->avl_child[right];
|
||||
|
||||
/*
|
||||
* move gright to left child of node and
|
||||
*
|
||||
* move gleft to right child of node
|
||||
*/
|
||||
node->avl_child[left] = gright;
|
||||
if (gright != NULL) {
|
||||
AVL_SETPARENT(gright, node);
|
||||
AVL_SETCHILD(gright, left);
|
||||
}
|
||||
|
||||
child->avl_child[right] = gleft;
|
||||
if (gleft != NULL) {
|
||||
AVL_SETPARENT(gleft, child);
|
||||
AVL_SETCHILD(gleft, right);
|
||||
}
|
||||
|
||||
/*
|
||||
* move child to left child of gchild and
|
||||
*
|
||||
* move node to right child of gchild and
|
||||
*
|
||||
* fixup parent of all this to point to gchild
|
||||
*/
|
||||
balance = AVL_XBALANCE(gchild);
|
||||
gchild->avl_child[left] = child;
|
||||
AVL_SETBALANCE(child, (balance == right_heavy ? left_heavy : 0));
|
||||
AVL_SETPARENT(child, gchild);
|
||||
AVL_SETCHILD(child, left);
|
||||
|
||||
gchild->avl_child[right] = node;
|
||||
AVL_SETBALANCE(node, (balance == left_heavy ? right_heavy : 0));
|
||||
AVL_SETPARENT(node, gchild);
|
||||
AVL_SETCHILD(node, right);
|
||||
|
||||
AVL_SETBALANCE(gchild, 0);
|
||||
AVL_SETPARENT(gchild, parent);
|
||||
AVL_SETCHILD(gchild, which_child);
|
||||
if (parent != NULL)
|
||||
parent->avl_child[which_child] = gchild;
|
||||
else
|
||||
tree->avl_root = gchild;
|
||||
|
||||
return (1); /* the new tree is always shorter */
|
||||
}
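The way avl_rotation() derives its four orientation variables from the incoming balance can be checked with a small standalone sketch. struct demo_orient and demo_orient_for() are hypothetical names introduced for this example. One deliberate difference: the sketch uses balance / 2 where the routine above uses balance >> 1, because right-shifting a negative value is implementation-defined in C; both give -1 for -2 on the usual arithmetic-shift compilers.

```c
#include <assert.h>

/* Hypothetical bundle of the orientation variables used by the rotation. */
struct demo_orient {
	int left;
	int right;
	int left_heavy;
	int right_heavy;
};

/* Derive the orientation from a balance of -2 or +2. */
static struct demo_orient
demo_orient_for(int balance)
{
	struct demo_orient o;

	o.left = !(balance < 0);	/* balance -2 gives left == 0 */
	o.right = 1 - o.left;
	o.left_heavy = balance / 2;	/* source uses balance >> 1 */
	o.right_heavy = -o.left_heavy;
	return (o);
}
```

With a balance of -2 the code operates on child index 0 with left_heavy == -1; with +2 every role is mirrored, which is how one body of code serves both rotation directions.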
|
||||
|
||||
|
||||
/*
|
||||
* Insert a new node into an AVL tree at the specified (from avl_find()) place.
|
||||
*
|
||||
* Newly inserted nodes are always leaf nodes in the tree, since avl_find()
|
||||
* searches out to the leaf positions. The avl_index_t indicates the node
|
||||
* which will be the parent of the new node.
|
||||
*
|
||||
* After the node is inserted, a single rotation further up the tree may
|
||||
* be necessary to maintain an acceptable AVL balance.
|
||||
*/
|
||||
void
|
||||
avl_insert(avl_tree_t *tree, void *new_data, avl_index_t where)
|
||||
{
|
||||
avl_node_t *node;
|
||||
avl_node_t *parent = AVL_INDEX2NODE(where);
|
||||
int old_balance;
|
||||
int new_balance;
|
||||
int which_child = AVL_INDEX2CHILD(where);
|
||||
size_t off = tree->avl_offset;
|
||||
|
||||
ASSERT(tree);
|
||||
#ifdef _LP64
|
||||
ASSERT(((uintptr_t)new_data & 0x7) == 0);
|
||||
#endif
|
||||
|
||||
node = AVL_DATA2NODE(new_data, off);
|
||||
|
||||
/*
|
||||
* First, add the node to the tree at the indicated position.
|
||||
*/
|
||||
++tree->avl_numnodes;
|
||||
|
||||
node->avl_child[0] = NULL;
|
||||
node->avl_child[1] = NULL;
|
||||
|
||||
AVL_SETCHILD(node, which_child);
|
||||
AVL_SETBALANCE(node, 0);
|
||||
AVL_SETPARENT(node, parent);
|
||||
if (parent != NULL) {
|
||||
ASSERT(parent->avl_child[which_child] == NULL);
|
||||
parent->avl_child[which_child] = node;
|
||||
} else {
|
||||
ASSERT(tree->avl_root == NULL);
|
||||
tree->avl_root = node;
|
||||
}
|
||||
/*
|
||||
* Now, back up the tree modifying the balance of all nodes above the
|
||||
* insertion point. If we get to a highly unbalanced ancestor, we
|
||||
* need to do a rotation. If we back out of the tree we are done.
|
||||
* If we brought any subtree into perfect balance (0), we are also done.
|
||||
*/
|
||||
for (;;) {
|
||||
node = parent;
|
||||
if (node == NULL)
|
||||
return;
|
||||
|
||||
/*
|
||||
* Compute the new balance
|
||||
*/
|
||||
old_balance = AVL_XBALANCE(node);
|
||||
new_balance = old_balance + avl_child2balance[which_child];
|
||||
|
||||
/*
|
||||
* If we introduced equal balance, then we are done immediately
|
||||
*/
|
||||
if (new_balance == 0) {
|
||||
AVL_SETBALANCE(node, 0);
|
||||
return;
|
||||
}
|
||||
|
||||
/*
|
||||
* If both old and new are not zero we went
|
||||
* from -1 to -2 (or from +1 to +2) balance, do a rotation.
|
||||
*/
|
||||
if (old_balance != 0)
|
||||
break;
|
||||
|
||||
AVL_SETBALANCE(node, new_balance);
|
||||
parent = AVL_XPARENT(node);
|
||||
which_child = AVL_XCHILD(node);
|
||||
}
|
||||
|
||||
/*
|
||||
* perform a rotation to fix the tree and return
|
||||
*/
|
||||
(void) avl_rotation(tree, node, new_balance);
|
||||
}
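The decision made at each ancestor while avl_insert() backs up the tree can be isolated into one small function. This is a sketch under stated assumptions: enum demo_step and demo_insert_step() are hypothetical names, and the local table simply repeats the contents of avl_child2balance.

```c
#include <assert.h>

/* Hypothetical outcome of examining one ancestor after an insert. */
enum demo_step { DEMO_DONE, DEMO_CONTINUE, DEMO_ROTATE };

/*
 * Given an ancestor's old balance and which child subtree grew,
 * compute the new balance and say what the walk does next.
 */
static enum demo_step
demo_insert_step(int old_balance, int which_child, int *new_balance)
{
	static const int child2balance[2] = {-1, 1};

	*new_balance = old_balance + child2balance[which_child];

	if (*new_balance == 0)
		return (DEMO_DONE);	/* subtree height unchanged */
	if (old_balance != 0)
		return (DEMO_ROTATE);	/* reached -2 or +2 */
	return (DEMO_CONTINUE);		/* now -1 or +1, keep climbing */
}
```

An ancestor that was balanced becomes slightly heavy and the walk continues; one that was heavy on the opposite side becomes balanced and the walk stops; one that was already heavy on the same side needs the single rotation.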
|
||||
|
||||
/*
|
||||
* Insert "new_data" in "tree" in the given "direction" either after or
|
||||
* before (AVL_AFTER, AVL_BEFORE) the data "here".
|
||||
*
|
||||
* Insertions can only be done at empty leaf points in the tree, therefore
|
||||
* if the given child of the node is already present we move to either
|
||||
* the AVL_PREV or AVL_NEXT and reverse the insertion direction. Since
|
||||
* every other node in the tree is a leaf, this always works.
|
||||
*
|
||||
* To help developers using this interface, we assert that the new node
|
||||
* is correctly ordered at every step of the way in DEBUG kernels.
|
||||
*/
|
||||
void
|
||||
avl_insert_here(
|
||||
avl_tree_t *tree,
|
||||
void *new_data,
|
||||
void *here,
|
||||
int direction)
|
||||
{
|
||||
avl_node_t *node;
|
||||
int child = direction; /* rely on AVL_BEFORE == 0, AVL_AFTER == 1 */
|
||||
#ifdef DEBUG
|
||||
int diff;
|
||||
#endif
|
||||
|
||||
ASSERT(tree != NULL);
|
||||
ASSERT(new_data != NULL);
|
||||
ASSERT(here != NULL);
|
||||
ASSERT(direction == AVL_BEFORE || direction == AVL_AFTER);
|
||||
|
||||
/*
|
||||
* If corresponding child of node is not NULL, go to the neighboring
|
||||
* node and reverse the insertion direction.
|
||||
*/
|
||||
node = AVL_DATA2NODE(here, tree->avl_offset);
|
||||
|
||||
#ifdef DEBUG
|
||||
diff = tree->avl_compar(new_data, here);
|
||||
ASSERT(-1 <= diff && diff <= 1);
|
||||
ASSERT(diff != 0);
|
||||
ASSERT(diff > 0 ? child == 1 : child == 0);
|
||||
#endif
|
||||
|
||||
if (node->avl_child[child] != NULL) {
|
||||
node = node->avl_child[child];
|
||||
child = 1 - child;
|
||||
while (node->avl_child[child] != NULL) {
|
||||
#ifdef DEBUG
|
||||
diff = tree->avl_compar(new_data,
|
||||
AVL_NODE2DATA(node, tree->avl_offset));
|
||||
ASSERT(-1 <= diff && diff <= 1);
|
||||
ASSERT(diff != 0);
|
||||
ASSERT(diff > 0 ? child == 1 : child == 0);
|
||||
#endif
|
||||
node = node->avl_child[child];
|
||||
}
|
||||
#ifdef DEBUG
|
||||
diff = tree->avl_compar(new_data,
|
||||
AVL_NODE2DATA(node, tree->avl_offset));
|
||||
ASSERT(-1 <= diff && diff <= 1);
|
||||
ASSERT(diff != 0);
|
||||
ASSERT(diff > 0 ? child == 1 : child == 0);
|
||||
#endif
|
||||
}
|
||||
ASSERT(node->avl_child[child] == NULL);
|
||||
|
||||
avl_insert(tree, new_data, AVL_MKINDEX(node, child));
|
||||
}
|
||||
|
||||
/*
|
||||
* Add a new node to an AVL tree.
|
||||
*/
|
||||
void
|
||||
avl_add(avl_tree_t *tree, void *new_node)
|
||||
{
|
||||
avl_index_t where;
|
||||
|
||||
/*
|
||||
* This is unfortunate. We want to call panic() here, even for
|
||||
* non-DEBUG kernels. In userland, however, we can't depend on anything
|
||||
* in libc or else the rtld build process gets confused. So, all we can
|
||||
* do in userland is resort to a normal ASSERT().
|
||||
*/
|
||||
if (avl_find(tree, new_node, &where) != NULL)
|
||||
#ifdef _KERNEL
|
||||
panic("avl_find() succeeded inside avl_add()");
|
||||
#else
|
||||
ASSERT(0);
|
||||
#endif
|
||||
avl_insert(tree, new_node, where);
|
||||
}
|
||||
|
||||
/*
|
||||
* Delete a node from the AVL tree. Deletion is similar to insertion, but
|
||||
* with 2 complications.
|
||||
*
|
||||
* First, we may be deleting an interior node. Consider the following subtree:
|
||||
*
|
||||
*      d                   c                   c
*     / \                 / \                 / \
*    b   e               b   e               b   e
*   / \                 / \                 /
*  a   c               a                   a
|
||||
*
|
||||
* When we are deleting node (d), we find and bring up an adjacent valued leaf
|
||||
* node, say (c), to take the interior node's place. In the code this is
|
||||
* handled by temporarily swapping (d) and (c) in the tree and then using
|
||||
* common code to delete (d) from the leaf position.
|
||||
*
|
||||
* Secondly, an interior deletion from a deep tree may require more than one
|
||||
* rotation to fix the balance. This is handled by moving up the tree through
|
||||
* parents and applying rotations as needed. The return value from
|
||||
* avl_rotation() is used to detect when a subtree did not change overall
|
||||
* height due to a rotation.
|
||||
*/
|
||||
void
|
||||
avl_remove(avl_tree_t *tree, void *data)
|
||||
{
|
||||
avl_node_t *delete;
|
||||
avl_node_t *parent;
|
||||
avl_node_t *node;
|
||||
avl_node_t tmp;
|
||||
int old_balance;
|
||||
int new_balance;
|
||||
int left;
|
||||
int right;
|
||||
int which_child;
|
||||
size_t off = tree->avl_offset;
|
||||
|
||||
ASSERT(tree);
|
||||
|
||||
delete = AVL_DATA2NODE(data, off);
|
||||
|
||||
/*
|
||||
* Deletion is easiest with a node that has at most 1 child.
|
||||
* We swap a node with 2 children with a sequentially valued
|
||||
* neighbor node. That node will have at most 1 child. Note this
|
||||
* has no effect on the ordering of the remaining nodes.
|
||||
*
|
||||
* As an optimization, we choose the greater neighbor if the tree
|
||||
* is right heavy, otherwise the left neighbor. This reduces the
|
||||
* number of rotations needed.
|
||||
*/
|
||||
if (delete->avl_child[0] != NULL && delete->avl_child[1] != NULL) {
|
||||
|
||||
/*
|
||||
* choose node to swap from whichever side is taller
|
||||
*/
|
||||
old_balance = AVL_XBALANCE(delete);
|
||||
left = avl_balance2child[old_balance + 1];
|
||||
right = 1 - left;
|
||||
|
||||
/*
|
||||
* get to the previous value'd node
|
||||
* (down 1 left, as far as possible right)
|
||||
*/
|
||||
for (node = delete->avl_child[left];
|
||||
node->avl_child[right] != NULL;
|
||||
node = node->avl_child[right])
|
||||
;
|
||||
|
||||
/*
|
||||
* create a temp placeholder for 'node'
|
||||
* move 'node' to delete's spot in the tree
|
||||
*/
|
||||
tmp = *node;
|
||||
|
||||
*node = *delete;
|
||||
if (node->avl_child[left] == node)
|
||||
node->avl_child[left] = &tmp;
|
||||
|
||||
parent = AVL_XPARENT(node);
|
||||
if (parent != NULL)
|
||||
parent->avl_child[AVL_XCHILD(node)] = node;
|
||||
else
|
||||
tree->avl_root = node;
|
||||
AVL_SETPARENT(node->avl_child[left], node);
|
||||
AVL_SETPARENT(node->avl_child[right], node);
|
||||
|
||||
/*
|
||||
* Put tmp where node used to be (just temporary).
|
||||
* It always has a parent and at most 1 child.
|
||||
*/
|
||||
delete = &tmp;
|
||||
parent = AVL_XPARENT(delete);
|
||||
parent->avl_child[AVL_XCHILD(delete)] = delete;
|
||||
which_child = (delete->avl_child[1] != 0);
|
||||
if (delete->avl_child[which_child] != NULL)
|
||||
AVL_SETPARENT(delete->avl_child[which_child], delete);
|
||||
}
|
||||
|
||||
|
||||
/*
|
||||
* Here we know "delete" is at least partially a leaf node. It can
|
||||
* be easily removed from the tree.
|
||||
*/
|
||||
ASSERT(tree->avl_numnodes > 0);
|
||||
--tree->avl_numnodes;
|
||||
parent = AVL_XPARENT(delete);
|
||||
which_child = AVL_XCHILD(delete);
|
||||
if (delete->avl_child[0] != NULL)
|
||||
node = delete->avl_child[0];
|
||||
else
|
||||
node = delete->avl_child[1];
|
||||
|
||||
/*
|
||||
* Connect parent directly to node (leaving out delete).
|
||||
*/
|
||||
if (node != NULL) {
|
||||
AVL_SETPARENT(node, parent);
|
||||
AVL_SETCHILD(node, which_child);
|
||||
}
|
||||
if (parent == NULL) {
|
||||
tree->avl_root = node;
|
||||
return;
|
||||
}
|
||||
parent->avl_child[which_child] = node;
|
||||
|
||||
|
||||
/*
|
||||
* Since the subtree is now shorter, begin adjusting parent balances
|
||||
* and performing any needed rotations.
|
||||
*/
|
||||
do {
|
||||
|
||||
/*
|
||||
* Move up the tree and adjust the balance
|
||||
*
|
||||
* Capture the parent and which_child values for the next
|
||||
* iteration before any rotations occur.
|
||||
*/
|
||||
node = parent;
|
||||
old_balance = AVL_XBALANCE(node);
|
||||
new_balance = old_balance - avl_child2balance[which_child];
|
||||
parent = AVL_XPARENT(node);
|
||||
which_child = AVL_XCHILD(node);
|
||||
|
||||
/*
|
||||
* If a node was in perfect balance but isn't anymore then
|
||||
* we can stop, since the height didn't change above this point
|
||||
* due to a deletion.
|
||||
*/
|
||||
if (old_balance == 0) {
|
||||
AVL_SETBALANCE(node, new_balance);
|
||||
break;
|
||||
}
|
||||
|
||||
/*
|
||||
* If the new balance is zero, we don't need to rotate
|
||||
* else
|
||||
* need a rotation to fix the balance.
|
||||
* If the rotation doesn't change the height
|
||||
* of the sub-tree we have finished adjusting.
|
||||
*/
|
||||
if (new_balance == 0)
|
||||
AVL_SETBALANCE(node, new_balance);
|
||||
else if (!avl_rotation(tree, node, new_balance))
|
||||
break;
|
||||
} while (parent != NULL);
|
||||
}
|
||||
|
||||
/*
|
||||
* initialize a new AVL tree
|
||||
*/
|
||||
void
|
||||
avl_create(avl_tree_t *tree, int (*compar) (const void *, const void *),
|
||||
size_t size, size_t offset)
|
||||
{
|
||||
ASSERT(tree);
|
||||
ASSERT(compar);
|
||||
ASSERT(size > 0);
|
||||
ASSERT(size >= offset + sizeof (avl_node_t));
|
||||
#ifdef _LP64
|
||||
ASSERT((offset & 0x7) == 0);
|
||||
#endif
|
||||
|
||||
tree->avl_compar = compar;
|
||||
tree->avl_root = NULL;
|
||||
tree->avl_numnodes = 0;
|
||||
tree->avl_size = size;
|
||||
tree->avl_offset = offset;
|
||||
}
|
||||
|
||||
/*
|
||||
* Delete a tree.
|
||||
*/
|
||||
/* ARGSUSED */
|
||||
void
|
||||
avl_destroy(avl_tree_t *tree)
|
||||
{
|
||||
ASSERT(tree);
|
||||
ASSERT(tree->avl_numnodes == 0);
|
||||
ASSERT(tree->avl_root == NULL);
|
||||
}
|
||||
|
||||
|
||||
/*
|
||||
* Return the number of nodes in an AVL tree.
|
||||
*/
|
||||
ulong_t
|
||||
avl_numnodes(avl_tree_t *tree)
|
||||
{
|
||||
ASSERT(tree);
|
||||
return (tree->avl_numnodes);
|
||||
}
|
||||
|
||||
|
||||
#define CHILDBIT (1L)
|
||||
|
||||
/*
|
||||
* Post-order tree walk used to visit all tree nodes and destroy the tree
|
||||
* in post order. This is used for destroying a tree w/o paying any cost
|
||||
* for rebalancing it.
|
||||
*
|
||||
* example:
|
||||
*
|
||||
* void *cookie = NULL;
|
||||
* my_data_t *node;
|
||||
*
|
||||
* while ((node = avl_destroy_nodes(tree, &cookie)) != NULL)
|
||||
* free(node);
|
||||
* avl_destroy(tree);
|
||||
*
|
||||
* The cookie is really an avl_node_t to the current node's parent and
|
||||
* an indication of which child you looked at last.
|
||||
*
|
||||
* On input, a cookie value of CHILDBIT indicates the tree is done.
|
||||
*/
|
||||
void *
|
||||
avl_destroy_nodes(avl_tree_t *tree, void **cookie)
|
||||
{
|
||||
avl_node_t *node;
|
||||
avl_node_t *parent;
|
||||
int child;
|
||||
void *first;
|
||||
size_t off = tree->avl_offset;
|
||||
|
||||
/*
|
||||
* Initial calls go to the first node or its right descendant.
|
||||
*/
|
||||
if (*cookie == NULL) {
|
||||
first = avl_first(tree);
|
||||
|
||||
/*
|
||||
* deal with an empty tree
|
||||
*/
|
||||
if (first == NULL) {
|
||||
*cookie = (void *)CHILDBIT;
|
||||
return (NULL);
|
||||
}
|
||||
|
||||
node = AVL_DATA2NODE(first, off);
|
||||
parent = AVL_XPARENT(node);
|
||||
goto check_right_side;
|
||||
}
|
||||
|
||||
/*
|
||||
* If there is no parent to return to we are done.
|
||||
*/
|
||||
parent = (avl_node_t *)((uintptr_t)(*cookie) & ~CHILDBIT);
|
||||
if (parent == NULL) {
|
||||
if (tree->avl_root != NULL) {
|
||||
ASSERT(tree->avl_numnodes == 1);
|
||||
tree->avl_root = NULL;
|
||||
tree->avl_numnodes = 0;
|
||||
}
|
||||
return (NULL);
|
||||
}
|
||||
|
||||
/*
|
||||
* Remove the child pointer we just visited from the parent and tree.
|
||||
*/
|
||||
child = (uintptr_t)(*cookie) & CHILDBIT;
|
||||
parent->avl_child[child] = NULL;
|
||||
ASSERT(tree->avl_numnodes > 1);
|
||||
--tree->avl_numnodes;
|
||||
|
||||
/*
|
||||
* If we just did a right child or there isn't one, go up to parent.
|
||||
*/
|
||||
if (child == 1 || parent->avl_child[1] == NULL) {
|
||||
node = parent;
|
||||
parent = AVL_XPARENT(parent);
|
||||
goto done;
|
||||
}
|
||||
|
||||
/*
|
||||
* Do parent's right child, then leftmost descendent.
|
||||
*/
|
||||
node = parent->avl_child[1];
|
||||
while (node->avl_child[0] != NULL) {
|
||||
parent = node;
|
||||
node = node->avl_child[0];
|
||||
}
|
||||
|
||||
/*
|
||||
* If here, we moved to a left child. It may have one
|
||||
* child on the right (when balance == +1).
|
||||
*/
|
||||
check_right_side:
|
||||
if (node->avl_child[1] != NULL) {
|
||||
ASSERT(AVL_XBALANCE(node) == 1);
|
||||
parent = node;
|
||||
node = node->avl_child[1];
|
||||
ASSERT(node->avl_child[0] == NULL &&
|
||||
node->avl_child[1] == NULL);
|
||||
} else {
|
||||
ASSERT(AVL_XBALANCE(node) <= 0);
|
||||
}
|
||||
|
||||
done:
|
||||
if (parent == NULL) {
|
||||
*cookie = (void *)CHILDBIT;
|
||||
ASSERT(node == tree->avl_root);
|
||||
} else {
|
||||
*cookie = (void *)((uintptr_t)parent | AVL_XCHILD(node));
|
||||
}
|
||||
|
||||
return (AVL_NODE2DATA(node, off));
|
||||
}
|
|
@ -0,0 +1 @@
|
|||
subdir-m += sys
|
|
@ -0,0 +1 @@
|
|||
DISTFILES = avl.h avl_impl.h
|
|
@ -0,0 +1,298 @@
|
|||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License, Version 1.0 only
|
||||
* (the "License"). You may not use this file except in compliance
|
||||
* with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright 2005 Sun Microsystems, Inc. All rights reserved.
|
||||
* Use is subject to license terms.
|
||||
*/
|
||||
|
||||
#ifndef _AVL_H
|
||||
#define _AVL_H
|
||||
|
||||
|
||||
|
||||
/*
|
||||
* This is a private header file. Applications should not directly include
|
||||
* this file.
|
||||
*/
|
||||
|
||||
#ifdef __cplusplus
|
||||
extern "C" {
|
||||
#endif
|
||||
|
||||
#include <sys/avl_impl.h>
|
||||
|
||||
/*
|
||||
* This is a generic implementation of AVL trees for use in the Solaris kernel.
|
||||
* The interfaces provide an efficient way of implementing an ordered set of
|
||||
* data structures.
|
||||
*
|
||||
* AVL trees provide an alternative to using an ordered linked list. Using AVL
|
||||
* trees will usually be faster, however they require more storage. An ordered
|
||||
* linked list in general requires 2 pointers in each data structure. The
|
||||
* AVL tree implementation uses 3 pointers. The following chart gives the
|
||||
* approximate performance of operations with the different approaches:
|
||||
*
|
||||
*   Operation           Link List       AVL tree
*   ---------           --------        --------
*   lookup              O(n)            O(log(n))
*
*   insert 1 node       constant        constant
*
*   delete 1 node       constant        between constant and O(log(n))
*
*   delete all nodes    O(n)            O(n)
*
*   visit the next
*   or prev node        constant        between constant and O(log(n))
|
||||
*
|
||||
*
|
||||
* The data structure nodes are anchored at an "avl_tree_t" (the equivalent
|
||||
* of a list header) and the individual nodes will have a field of
|
||||
* type "avl_node_t" (corresponding to list pointers).
|
||||
*
|
||||
* The type "avl_index_t" is used to indicate a position in the list for
|
||||
* certain calls.
|
||||
*
|
||||
* The usage scenario is generally:
|
||||
*
|
||||
* 1. Create the list/tree with: avl_create()
|
||||
*
|
||||
* followed by any mixture of:
|
||||
*
|
||||
* 2a. Insert nodes with: avl_add(), or avl_find() and avl_insert()
|
||||
*
|
||||
* 2b. Visit elements with:
|
||||
* avl_first() - returns the lowest valued node
|
||||
* avl_last() - returns the highest valued node
|
||||
* AVL_NEXT() - given a node go to next higher one
|
||||
* AVL_PREV() - given a node go to previous lower one
|
||||
*
|
||||
* 2c. Find the node with the closest value either less than or greater
|
||||
* than a given value with avl_nearest().
|
||||
*
|
||||
* 2d. Remove individual nodes from the list/tree with avl_remove().
|
||||
*
|
||||
* and finally when the list is being destroyed
|
||||
*
|
||||
* 3. Use avl_destroy_nodes() to quickly process/free up any remaining nodes.
|
||||
* Note that once you use avl_destroy_nodes(), you can no longer
|
||||
* use any routine except avl_destroy_nodes() and avl_destroy().
|
||||
*
|
||||
* 4. Use avl_destroy() to destroy the AVL tree itself.
|
||||
*
|
||||
* Any locking for multiple thread access is up to the user to provide, just
|
||||
* as is needed for any linked list implementation.
|
||||
*/
|
||||
|
||||
|
||||
/*
|
||||
* Type used for the root of the AVL tree.
|
||||
*/
|
||||
typedef struct avl_tree avl_tree_t;
|
||||
|
||||
/*
|
||||
* The data nodes in the AVL tree must have a field of this type.
|
||||
*/
|
||||
typedef struct avl_node avl_node_t;
|
||||
|
||||
/*
|
||||
* An opaque type used to locate a position in the tree where a node
|
||||
* would be inserted.
|
||||
*/
|
||||
typedef uintptr_t avl_index_t;
|
||||
|
||||
|
||||
/*
|
||||
* Direction constants used for avl_nearest().
|
||||
*/
|
||||
#define AVL_BEFORE (0)
|
||||
#define AVL_AFTER (1)
|
||||
|
||||
|
||||
|
||||
/*
|
||||
* Prototypes
|
||||
*
|
||||
* Where not otherwise mentioned, "void *" arguments are a pointer to the
|
||||
* user data structure which must contain a field of type avl_node_t.
|
||||
*
|
||||
* Also assume the user data structures looks like:
|
||||
* struct my_type {
|
||||
* ...
|
||||
* avl_node_t my_link;
|
||||
* ...
|
||||
* };
|
||||
*/
|
||||
|
||||
/*
|
||||
* Initialize an AVL tree. Arguments are:
|
||||
*
|
||||
* tree - the tree to be initialized
|
||||
* compar - function to compare two nodes, it must return exactly: -1, 0, or +1
|
||||
* -1 for <, 0 for ==, and +1 for >
|
||||
* size - the value of sizeof(struct my_type)
|
||||
* offset - the value of OFFSETOF(struct my_type, my_link)
|
||||
*/
|
||||
extern void avl_create(avl_tree_t *tree,
|
||||
int (*compar) (const void *, const void *), size_t size, size_t offset);
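The requirement that "compar" return exactly -1, 0, or +1 rules out the common shortcut of returning the difference of two keys. A minimal sketch of a conforming comparator follows; `struct my_type` here carries only a hypothetical `key` field for illustration (a real user structure would also embed the avl_node_t link), and `my_compar` is an assumed name.

```c
#include <assert.h>

/* Hypothetical user type; a real one would also embed an avl_node_t. */
struct my_type {
	int key;
};

/*
 * Compare by key and return exactly -1, 0, or +1. Returning
 * (l->key - r->key) instead could yield other values and could
 * overflow for keys of opposite sign.
 */
static int
my_compar(const void *a, const void *b)
{
	const struct my_type *l = a;
	const struct my_type *r = b;

	if (l->key < r->key)
		return (-1);
	if (l->key > r->key)
		return (1);
	return (0);
}
```

A function of this shape would be passed as the second argument to avl_create().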
|
||||
|
||||
|
||||
/*
|
||||
* Find a node with a matching value in the tree. Returns the matching node
|
||||
* found. If not found, it returns NULL and then if "where" is not NULL it sets
|
||||
* "where" for use with avl_insert() or avl_nearest().
|
||||
*
|
||||
* node - node that has the value being looked for
|
||||
* where - position for use with avl_nearest() or avl_insert(), may be NULL
|
||||
*/
|
||||
extern void *avl_find(avl_tree_t *tree, void *node, avl_index_t *where);
|
||||
|
||||
/*
|
||||
* Insert a node into the tree.
|
||||
*
|
||||
* node - the node to insert
|
||||
* where - position as returned from avl_find()
|
||||
*/
|
||||
extern void avl_insert(avl_tree_t *tree, void *node, avl_index_t where);
|
||||
|
||||
/*
|
||||
* Insert "new_data" in "tree" in the given "direction" either after
|
||||
* or before the data "here".
|
||||
*
|
||||
* This might be useful for avl clients caching recently accessed
|
||||
* data to avoid doing avl_find() again for insertion.
|
||||
*
|
||||
* new_data - new data to insert
|
||||
* here - existing node in "tree"
|
||||
* direction - either AVL_AFTER or AVL_BEFORE the data "here".
|
||||
*/
|
||||
extern void avl_insert_here(avl_tree_t *tree, void *new_data, void *here,
|
||||
int direction);
|
||||
|
||||
|
||||
/*
|
||||
* Return the first or last valued node in the tree. Will return NULL
|
||||
* if the tree is empty.
|
||||
*
|
||||
*/
|
||||
extern void *avl_first(avl_tree_t *tree);
|
||||
extern void *avl_last(avl_tree_t *tree);
|
||||
|
||||
|
||||
/*
|
||||
* Return the next or previous valued node in the tree.
|
||||
* AVL_NEXT() will return NULL if at the last node.
|
||||
* AVL_PREV() will return NULL if at the first node.
|
||||
*
|
||||
* node - the node from which the next or previous node is found
|
||||
*/
|
||||
#define AVL_NEXT(tree, node) avl_walk(tree, node, AVL_AFTER)
|
||||
#define AVL_PREV(tree, node) avl_walk(tree, node, AVL_BEFORE)
|
||||
|
||||
|
||||
/*
|
||||
* Find the node with the nearest value either greater or less than
|
||||
* the value from a previous avl_find(). Returns the node or NULL if
|
||||
* there isn't a matching one.
|
||||
*
|
||||
* where - position as returned from avl_find()
|
||||
* direction - either AVL_BEFORE or AVL_AFTER
|
||||
*
|
||||
* EXAMPLE get the greatest node that is less than a given value:
|
||||
*
|
||||
* avl_tree_t *tree;
|
||||
* struct my_data look_for_value = {....};
|
||||
* struct my_data *node;
|
||||
* struct my_data *less;
|
||||
* avl_index_t where;
|
||||
*
|
||||
* node = avl_find(tree, &look_for_value, &where);
|
||||
* if (node != NULL)
|
||||
* less = AVL_PREV(tree, node);
|
||||
* else
|
||||
* less = avl_nearest(tree, where, AVL_BEFORE);
|
||||
*/
|
||||
extern void *avl_nearest(avl_tree_t *tree, avl_index_t where, int direction);
|
||||
|
||||
|
||||
/*
|
||||
* Add a single node to the tree.
|
||||
* The node must not be in the tree, and it must not
|
||||
* compare equal to any other node already in the tree.
|
||||
*
|
||||
* node - the node to add
|
||||
*/
|
||||
extern void avl_add(avl_tree_t *tree, void *node);
|
||||
|
||||
|
||||
/*
|
||||
* Remove a single node from the tree. The node must be in the tree.
|
||||
*
|
||||
* node - the node to remove
|
||||
*/
|
||||
extern void avl_remove(avl_tree_t *tree, void *node);
|
||||
|
||||
|
||||
/*
|
||||
* Return the number of nodes in the tree
|
||||
*/
|
||||
extern ulong_t avl_numnodes(avl_tree_t *tree);
|
||||
|
||||
|
||||
/*
|
||||
* Used to destroy any remaining nodes in a tree. The cookie argument should
|
||||
* be initialized to NULL before the first call. Returns a node that has been
|
||||
* removed from the tree and may be free()'d. Returns NULL when the tree is
|
||||
* empty.
|
||||
*
|
||||
* Once you call avl_destroy_nodes(), you can only continue calling it and
|
||||
* finally avl_destroy(). No other AVL routines will be valid.
|
||||
*
|
||||
* cookie - a "void *" used to save state between calls to avl_destroy_nodes()
|
||||
*
|
||||
* EXAMPLE:
|
||||
* avl_tree_t *tree;
|
||||
* struct my_data *node;
|
||||
* void *cookie;
|
||||
*
|
||||
* cookie = NULL;
|
||||
* while ((node = avl_destroy_nodes(tree, &cookie)) != NULL)
|
||||
* free(node);
|
||||
* avl_destroy(tree);
|
||||
*/
|
||||
extern void *avl_destroy_nodes(avl_tree_t *tree, void **cookie);
|
||||
|
||||
|
||||
/*
|
||||
* Final destroy of an AVL tree. Arguments are:
|
||||
*
|
||||
* tree - the empty tree to destroy
|
||||
*/
|
||||
extern void avl_destroy(avl_tree_t *tree);
|
||||
|
||||
|
||||
|
||||
#ifdef __cplusplus
|
||||
}
|
||||
#endif
|
||||
|
||||
#endif /* _AVL_H */
|
|
@ -0,0 +1,164 @@
|
|||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License, Version 1.0 only
|
||||
* (the "License"). You may not use this file except in compliance
|
||||
* with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright 2004 Sun Microsystems, Inc. All rights reserved.
|
||||
* Use is subject to license terms.
|
||||
*/
|
||||
|
||||
#ifndef _AVL_IMPL_H
|
||||
#define _AVL_IMPL_H
|
||||
|
||||
|
||||
|
||||
/*
|
||||
* This is a private header file. Applications should not directly include
|
||||
* this file.
|
||||
*/
|
||||
|
||||
#include <sys/types.h>
|
||||
|
||||
#ifdef __cplusplus
|
||||
extern "C" {
|
||||
#endif
|
||||
|
||||
|
||||
/*
|
||||
* generic AVL tree implementation for kernel use
|
||||
*
|
||||
* There are 5 pieces of information stored for each node in an AVL tree
|
||||
*
|
||||
* pointer to less than child
|
||||
* pointer to greater than child
|
||||
* a pointer to the parent of this node
|
||||
* an indication [0/1] of which child I am of my parent
|
||||
* a "balance" (-1, 0, +1) indicating which child tree is taller
|
||||
*
|
||||
* Since they only need 3 bits, the last two fields are packed into the
|
||||
* bottom bits of the parent pointer on 64 bit machines to save on space.
|
||||
*/
|
||||
|
||||
#ifndef _LP64
|
||||
|
||||
struct avl_node {
|
||||
struct avl_node *avl_child[2]; /* left/right children */
|
||||
struct avl_node *avl_parent; /* this node's parent */
|
||||
unsigned short avl_child_index; /* my index in parent's avl_child[] */
|
||||
short avl_balance; /* balance value: -1, 0, +1 */
|
||||
};
|
||||
|
||||
#define AVL_XPARENT(n) ((n)->avl_parent)
|
||||
#define AVL_SETPARENT(n, p) ((n)->avl_parent = (p))
|
||||
|
||||
#define AVL_XCHILD(n) ((n)->avl_child_index)
|
||||
#define AVL_SETCHILD(n, c) ((n)->avl_child_index = (unsigned short)(c))
|
||||
|
||||
#define AVL_XBALANCE(n) ((n)->avl_balance)
|
||||
#define AVL_SETBALANCE(n, b) ((n)->avl_balance = (short)(b))
|
||||
|
||||
#else /* _LP64 */
|
||||
|
||||
/*
|
||||
* for 64 bit machines, avl_pcb contains parent pointer, balance and child_index
|
||||
* values packed in the following manner:
|
||||
*
|
||||
* |63                                  3|        2        |1          0 |
* |-------------------------------------|-----------------|-------------|
* |      avl_parent hi order bits       | avl_child_index | avl_balance |
* |                                     |                 |     + 1     |
* |-------------------------------------|-----------------|-------------|
|
||||
*
|
||||
*/
|
||||
struct avl_node {
|
||||
struct avl_node *avl_child[2]; /* left/right children nodes */
|
||||
uintptr_t avl_pcb; /* parent, child_index, balance */
|
||||
};
|
||||
|
||||
/*
|
||||
* macros to extract/set fields in avl_pcb
|
||||
*
|
||||
* pointer to the parent of the current node is the high order bits
|
||||
*/
|
||||
#define AVL_XPARENT(n) ((struct avl_node *)((n)->avl_pcb & ~7))
|
||||
#define AVL_SETPARENT(n, p) \
|
||||
((n)->avl_pcb = (((n)->avl_pcb & 7) | (uintptr_t)(p)))
|
||||
|
||||
/*
|
||||
* index of this node in its parent's avl_child[]: bit #2
|
||||
*/
|
||||
#define AVL_XCHILD(n) (((n)->avl_pcb >> 2) & 1)
|
||||
#define AVL_SETCHILD(n, c) \
|
||||
((n)->avl_pcb = (uintptr_t)(((n)->avl_pcb & ~4) | ((c) << 2)))
|
||||
|
||||
/*
|
||||
* balance indication for a node, lowest 2 bits. A valid balance is
|
||||
* -1, 0, or +1, and is encoded by adding 1 to the value to get the
|
||||
* unsigned values of 0, 1, 2.
|
||||
*/
|
||||
#define AVL_XBALANCE(n) ((int)(((n)->avl_pcb & 3) - 1))
|
||||
#define AVL_SETBALANCE(n, b) \
|
||||
((n)->avl_pcb = (uintptr_t)((((n)->avl_pcb & ~3) | ((b) + 1))))
|
||||
|
||||
#endif /* _LP64 */
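The 64-bit packing of parent pointer, child index, and balance into one word can be exercised on its own. The sketch below mirrors the _LP64 macros as small functions; all `pcb_*` names are hypothetical, and it assumes nodes are 8-byte aligned so that the low three bits of a node address are zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the _LP64 node layout. */
struct pcb_node {
	struct pcb_node *child[2];
	uintptr_t pcb;	/* parent | (child_index << 2) | (balance + 1) */
};

static struct pcb_node *
pcb_parent(const struct pcb_node *n)
{
	return ((struct pcb_node *)(n->pcb & ~(uintptr_t)7));
}

static void
pcb_set_parent(struct pcb_node *n, struct pcb_node *p)
{
	n->pcb = (n->pcb & 7) | (uintptr_t)p;
}

static int
pcb_child(const struct pcb_node *n)
{
	return ((int)((n->pcb >> 2) & 1));
}

static void
pcb_set_child(struct pcb_node *n, int c)
{
	n->pcb = (n->pcb & ~(uintptr_t)4) | ((uintptr_t)c << 2);
}

static int
pcb_balance(const struct pcb_node *n)
{
	return ((int)(n->pcb & 3) - 1);	/* stored as 0, 1, 2 */
}

static void
pcb_set_balance(struct pcb_node *n, int b)
{
	n->pcb = (n->pcb & ~(uintptr_t)3) | (uintptr_t)(b + 1);
}

/* Returns 1 if all three fields survive being packed together. */
static int
pcb_roundtrip(int child, int balance)
{
	static _Alignas(8) struct pcb_node parent;
	struct pcb_node n = { { NULL, NULL }, 0 };

	pcb_set_parent(&n, &parent);
	pcb_set_child(&n, child);
	pcb_set_balance(&n, balance);
	return (pcb_parent(&n) == &parent &&
	    pcb_child(&n) == child && pcb_balance(&n) == balance);
}
```

The alignment assumption is the same one avl_create() checks with its ASSERT on the low three bits of the offset under _LP64.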
|
||||
|
||||
|
||||
|
||||
/*
|
||||
* switch between a node and data pointer for a given tree
|
||||
* the value of "o" is tree->avl_offset
|
||||
*/
|
||||
#define AVL_NODE2DATA(n, o) ((void *)((uintptr_t)(n) - (o)))
|
||||
#define AVL_DATA2NODE(d, o) ((struct avl_node *)((uintptr_t)(d) + (o)))
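The node/data conversion is plain offset arithmetic on the user structure. A minimal sketch under assumed names: `struct link_sketch` stands in for avl_node_t, and `node2data`/`data2node` are hypothetical function equivalents of the two macros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for avl_node_t. */
struct link_sketch {
	void *slots[3];
};

/* Hypothetical user structure with an embedded link field. */
struct my_type {
	int key;
	struct link_sketch my_link;
};

/* From the embedded link back to the enclosing user structure. */
static void *
node2data(void *node, size_t off)
{
	return ((void *)((uintptr_t)node - off));
}

/* From the user structure to its embedded link. */
static void *
data2node(void *data, size_t off)
{
	return ((void *)((uintptr_t)data + off));
}
```

Here `off` would be `offsetof(struct my_type, my_link)`, the same value a caller hands to avl_create() as its offset argument.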
|
||||
|
||||
|
||||
|
||||
/*
|
||||
* macros used to create/access an avl_index_t
|
||||
*/
|
||||
#define AVL_INDEX2NODE(x) ((avl_node_t *)((x) & ~1))
|
||||
#define AVL_INDEX2CHILD(x) ((x) & 1)
|
||||
#define AVL_MKINDEX(n, c) ((avl_index_t)(n) | (c))
|
||||
|
||||
|
||||
/*
|
||||
* The tree structure. The fields avl_root, avl_compar, and avl_offset come
|
||||
* first since they are needed for avl_find(). We want them to fit into
|
||||
* a single 64 byte cache line to make avl_find() as fast as possible.
|
||||
*/
|
||||
struct avl_tree {
|
||||
struct avl_node *avl_root; /* root node in tree */
|
||||
int (*avl_compar)(const void *, const void *);
|
||||
size_t avl_offset;	/* offsetof(type, avl_node_t field) */
|
||||
ulong_t avl_numnodes; /* number of nodes in the tree */
|
||||
size_t avl_size; /* sizeof user type struct */
|
||||
};
|
||||
|
||||
|
||||
/*
|
||||
* This will only be used via AVL_NEXT() or AVL_PREV()
|
||||
*/
|
||||
extern void *avl_walk(struct avl_tree *, void *, int);
|
||||
|
||||
#ifdef __cplusplus
|
||||
}
|
||||
#endif
|
||||
|
||||
#endif /* _AVL_IMPL_H */
|
|
@ -0,0 +1,28 @@
|
|||
# NOTE: dctl_client.c, dctl_common.c, dctl_server.c, dctl_thrpool.c unused
|
||||
# by kernel port. Potentially they should just be removed if we don't care
|
||||
# about user space Lustre integration from this source base.
|
||||
|
||||
# NOTE: For clarity this directory should simply be renamed libzpl and
|
||||
# the full kernel implementation should be minimally stubbed out.
|
||||
|
||||
subdir-m += include
|
||||
DISTFILES = dctl_client.c dctl_common.c dctl_server.c dctl_thrpool.c
|
||||
DISTFILES += dmu_send.c rrwlock.c zfs_acl.c zfs_ctldir.c
|
||||
DISTFILES += zfs_dir.c zfs_fuid.c zfs_ioctl.c zfs_log.c zfs_replay.c
|
||||
DISTFILES += zfs_rlock.c zfs_vfsops.c zfs_vnops.c zvol.c
|
||||
|
||||
MODULE := zctl
|
||||
|
||||
EXTRA_CFLAGS = @KERNELCPPFLAGS@
|
||||
EXTRA_CFLAGS += -I@LIBDIR@/libzcommon/include
|
||||
EXTRA_CFLAGS += -I@LIBDIR@/libdmu-ctl/include
|
||||
EXTRA_CFLAGS += -I@LIBDIR@/libavl/include
|
||||
EXTRA_CFLAGS += -I@LIBDIR@/libport/include
|
||||
EXTRA_CFLAGS += -I@LIBDIR@/libnvpair/include
|
||||
|
||||
obj-m := ${MODULE}.o
|
||||
|
||||
${MODULE}-objs += zvol.o # Volume emulation interface
|
||||
${MODULE}-objs += zfs_ioctl.o # /dev/zfs ioctl interface
|
||||
${MODULE}-objs += zfs_vfsops.o
|
||||
${MODULE}-objs += dmu_send.o
|
|
@ -0,0 +1,263 @@
|
|||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License, Version 1.0 only
|
||||
* (the "License"). You may not use this file except in compliance
|
||||
* with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright 2008 Sun Microsystems, Inc. All rights reserved.
|
||||
* Use is subject to license terms.
|
||||
*/
|
||||
|
||||
#include <stdio.h>
|
||||
#include <stdlib.h>
|
||||
#include <string.h>
|
||||
#include <ftw.h>
|
||||
#include <errno.h>
|
||||
#include <unistd.h>
|
||||
#include <sys/types.h>
|
||||
#include <sys/socket.h>
|
||||
#include <sys/un.h>
|
||||
#include <sys/debug.h>
|
||||
|
||||
#include <sys/dmu_ctl.h>
|
||||
#include <sys/dmu_ctl_impl.h>
|
||||
|
||||
/*
|
||||
* Try to connect to the socket given in path.
|
||||
*
|
||||
* For nftw() convenience, returns 0 if unsuccessful, otherwise
|
||||
* returns the socket descriptor.
|
||||
*/
|
||||
static int try_connect(const char *path)
|
||||
{
|
||||
struct sockaddr_un name;
|
||||
int sock;
|
||||
|
||||
sock = socket(PF_UNIX, SOCK_STREAM, 0);
|
||||
if (sock == -1) {
|
||||
perror("socket");
|
||||
return 0;
|
||||
}
|
||||
|
||||
/*
|
||||
* The socket fd cannot be 0 otherwise nftw() will not interpret the
|
||||
* return code correctly.
|
||||
*/
|
||||
VERIFY(sock != 0);
|
||||
|
||||
name.sun_family = AF_UNIX;
|
||||
strncpy(name.sun_path, path, sizeof(name.sun_path));
|
||||
|
||||
name.sun_path[sizeof(name.sun_path) - 1] = '\0';
|
||||
|
||||
if (connect(sock, (struct sockaddr *) &name, sizeof(name)) == -1) {
|
||||
close(sock);
|
||||
return 0;
|
||||
}
|
||||
|
||||
return sock;
|
||||
}
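try_connect() copies the path with strncpy() and then forces a terminator, which silently truncates a path longer than sun_path. One possible variation, shown here only as a sketch, is to reject such a path up front; the helper name `fill_unix_addr` is hypothetical and is not part of this source.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/*
 * Hypothetical helper: fill in a UNIX-domain address for "path".
 * Returns 0 on success, -1 if the path does not fit in sun_path
 * together with its terminating '\0'.
 */
static int
fill_unix_addr(struct sockaddr_un *name, const char *path)
{
	size_t len = strlen(path);

	if (len >= sizeof (name->sun_path))
		return (-1);

	memset(name, 0, sizeof (*name));
	name->sun_family = AF_UNIX;
	memcpy(name->sun_path, path, len + 1);	/* includes the '\0' */
	return (0);
}
```

Rejecting instead of truncating avoids connecting to a different socket than the one the caller named, at the cost of an extra failure case for the caller to handle.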
|
||||
|
||||
/*
|
||||
* nftw() callback.
|
||||
*/
|
||||
static int nftw_cb(const char *fpath, const struct stat *sb, int typeflag,
|
||||
struct FTW *ftwbuf)
|
||||
{
|
||||
if (!S_ISSOCK(sb->st_mode))
|
||||
return 0;
|
||||
|
||||
if (strcmp(&fpath[ftwbuf->base], SOCKNAME) != 0)
|
||||
return 0;
|
||||
|
||||
return try_connect(fpath);
|
||||
}
|
||||
|
||||
/*
|
||||
* For convenience, if check_subdirs is true we walk the directory tree to
|
||||
* find a good socket.
|
||||
*/
|
||||
int dctlc_connect(const char *dir, boolean_t check_subdirs)
|
||||
{
|
||||
char *fpath;
|
||||
int fd;
|
||||
|
||||
if (check_subdirs)
|
||||
fd = nftw(dir, nftw_cb, 10, FTW_PHYS);
|
||||
else {
|
||||
fpath = malloc(strlen(dir) + strlen(SOCKNAME) + 2);
|
||||
if (fpath == NULL)
|
||||
return -1;
|
||||
|
||||
strcpy(fpath, dir);
|
||||
strcat(fpath, "/" SOCKNAME);
|
||||
|
||||
fd = try_connect(fpath);
|
||||
|
||||
free(fpath);
|
||||
}
|
||||
|
||||
return fd == 0 ? -1 : fd;
|
||||
}
|
||||
|
||||
void dctlc_disconnect(int fd)
|
||||
{
|
||||
(void) shutdown(fd, SHUT_RDWR);
|
||||
}
|
||||
|
||||
static int dctl_reply_copyin(int fd, dctl_cmd_t *cmd)
|
||||
{
|
||||
return dctl_send_data(fd, (void *)(uintptr_t) cmd->u.dcmd_copy.ptr,
|
||||
cmd->u.dcmd_copy.size);
|
||||
}
|
||||
|
||||
static int dctl_reply_copyinstr(int fd, dctl_cmd_t *cmd)
|
||||
{
|
||||
dctl_cmd_t reply;
|
||||
char *from;
|
||||
size_t len, buflen, to_copy;
|
||||
int error;
|
||||
|
||||
reply.dcmd_msg = DCTL_GEN_REPLY;
|
||||
|
||||
from = (char *)(uintptr_t) cmd->u.dcmd_copy.ptr;
|
||||
|
||||
buflen = cmd->u.dcmd_copy.size;
|
||||
to_copy = strnlen(from, buflen - 1);
|
||||
|
||||
reply.u.dcmd_reply.rc = from[to_copy] == '\0' ? 0 : ENAMETOOLONG;
|
||||
reply.u.dcmd_reply.size = to_copy;
|
||||
|
||||
error = dctl_send_msg(fd, &reply);
|
||||
|
||||
if (!error && to_copy > 0)
|
||||
error = dctl_send_data(fd, from, to_copy);
|
||||
|
||||
return error;
|
||||
}
|
||||
|
||||
static int dctl_reply_copyout(int fd, dctl_cmd_t *cmd)
|
||||
{
|
||||
return dctl_read_data(fd, (void *)(uintptr_t) cmd->u.dcmd_copy.ptr,
|
||||
cmd->u.dcmd_copy.size);
|
||||
}
|
||||
|
||||
static int dctl_reply_fd_read(int fd, dctl_cmd_t *cmd)
|
||||
{
|
||||
dctl_cmd_t reply;
|
||||
void *buf;
|
||||
int error;
|
||||
ssize_t rrc, size = cmd->u.dcmd_fd_io.size;
|
||||
|
||||
buf = malloc(size);
|
||||
if (buf == NULL)
|
||||
return ENOMEM;
|
||||
|
||||
rrc = read(cmd->u.dcmd_fd_io.fd, buf, size);
|
||||
|
||||
reply.dcmd_msg = DCTL_GEN_REPLY;
|
||||
reply.u.dcmd_reply.rc = rrc == -1 ? errno : 0;
|
||||
reply.u.dcmd_reply.size = rrc == -1 ? 0 : rrc;
|
||||
|
||||
error = dctl_send_msg(fd, &reply);
|
||||
|
||||
if (!error && rrc > 0)
|
||||
error = dctl_send_data(fd, buf, rrc);
|
||||
|
||||
free(buf);
|
||||
|
||||
return error;
|
||||
}
|
||||
|
||||
static int dctl_reply_fd_write(int fd, dctl_cmd_t *cmd)
|
||||
{
|
||||
dctl_cmd_t reply;
|
||||
void *buf;
|
||||
int error;
|
||||
ssize_t wrc, size = cmd->u.dcmd_fd_io.size;
|
||||
|
||||
buf = malloc(size);
|
||||
if (buf == NULL)
|
||||
return ENOMEM;
|
||||
|
||||
error = dctl_read_data(fd, buf, size);
|
||||
if (error)
|
||||
goto out;
|
||||
|
||||
wrc = write(cmd->u.dcmd_fd_io.fd, buf, size);
|
||||
|
||||
reply.dcmd_msg = DCTL_GEN_REPLY;
|
||||
reply.u.dcmd_reply.rc = wrc == -1 ? errno : 0;
|
||||
reply.u.dcmd_reply.size = wrc == -1 ? 0 : wrc;
|
||||
|
||||
error = dctl_send_msg(fd, &reply);
|
||||
|
||||
out:
|
||||
free(buf);
|
||||
|
||||
return error;
|
||||
}
|
||||
|
||||
int dctlc_ioctl(int fd, int32_t request, void *arg)
|
||||
{
|
||||
int error;
|
||||
dctl_cmd_t cmd;
|
||||
|
||||
ASSERT(fd != 0);
|
||||
|
||||
cmd.dcmd_msg = DCTL_IOCTL;
|
||||
|
||||
cmd.u.dcmd_ioctl.cmd = request;
|
||||
cmd.u.dcmd_ioctl.arg = (uintptr_t) arg;
|
||||
|
||||
error = dctl_send_msg(fd, &cmd);
|
||||
|
||||
while (!error && (error = dctl_read_msg(fd, &cmd)) == 0) {
|
||||
switch (cmd.dcmd_msg) {
|
||||
case DCTL_IOCTL_REPLY:
|
||||
error = cmd.u.dcmd_reply.rc;
|
||||
goto out;
|
||||
case DCTL_COPYIN:
|
||||
error = dctl_reply_copyin(fd, &cmd);
|
||||
break;
|
||||
case DCTL_COPYINSTR:
|
||||
error = dctl_reply_copyinstr(fd, &cmd);
|
||||
break;
|
||||
case DCTL_COPYOUT:
|
||||
error = dctl_reply_copyout(fd, &cmd);
|
||||
break;
|
||||
case DCTL_FD_READ:
|
||||
error = dctl_reply_fd_read(fd, &cmd);
|
||||
break;
|
||||
case DCTL_FD_WRITE:
|
||||
error = dctl_reply_fd_write(fd, &cmd);
|
||||
break;
|
||||
default:
|
||||
fprintf(stderr, "%s(): invalid message "
|
||||
"received.\n", __func__);
|
||||
error = EINVAL;
|
||||
goto out;
|
||||
}
|
||||
}
|
||||
|
||||
out:
|
||||
errno = error;
|
||||
return error ? -1 : 0;
|
||||
}
|
|
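The client code above depends on a convention of nftw(): the walk stops as soon as the callback returns a non-zero value, and nftw() hands that value back to its caller. This is why try_connect() reports failure as 0. The following is a minimal sketch of that convention only. The names FOUND, wanted_name, find_cb, find_named and demo_search are hypothetical and not part of this commit, and the sketch matches on a file name instead of opening a socket.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <ftw.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical sentinel: any non-zero callback result stops the walk. */
#define FOUND 42

static const char *wanted_name;	/* base name being searched for */

/* nftw() callback: return 0 to keep walking, non-zero to stop. */
static int find_cb(const char *fpath, const struct stat *sb, int typeflag,
    struct FTW *ftwbuf)
{
	(void) sb;

	if (typeflag != FTW_F)
		return 0;
	if (strcmp(&fpath[ftwbuf->base], wanted_name) != 0)
		return 0;
	return FOUND;
}

/*
 * Returns FOUND if a regular file called name exists below dir,
 * 0 if the walk finished without a match, -1 if nftw() itself failed.
 */
static int find_named(const char *dir, const char *name)
{
	assert(dir != NULL && name != NULL);
	wanted_name = name;
	return nftw(dir, find_cb, 10, FTW_PHYS);
}

/* Create a scratch directory holding one file, search it, clean up. */
static int demo_search(const char *name, const char *query)
{
	char dir[] = "/tmp/nftw_demo_XXXXXX";
	char path[256];
	FILE *f;
	int rc;

	if (mkdtemp(dir) == NULL)
		return -1;
	snprintf(path, sizeof(path), "%s/%s", dir, name);
	f = fopen(path, "w");
	if (f == NULL) {
		rmdir(dir);
		return -1;
	}
	fclose(f);

	rc = find_named(dir, query);

	unlink(path);
	rmdir(dir);
	return rc;
}
```

Because 0 means "keep walking", a callback that wants to return a useful value (a descriptor in the client code, FOUND here) has to make sure that value is never 0.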
@ -0,0 +1,109 @@
|
|||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License, Version 1.0 only
|
||||
* (the "License"). You may not use this file except in compliance
|
||||
* with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright 2008 Sun Microsystems, Inc. All rights reserved.
|
||||
* Use is subject to license terms.
|
||||
*/
|
||||
|
||||
#include <stdio.h>
|
||||
#include <errno.h>
|
||||
#include <sys/types.h>
|
||||
#include <sys/socket.h>
|
||||
|
||||
#include <sys/dmu_ctl.h>
|
||||
#include <sys/dmu_ctl_impl.h>
|
||||
|
||||
int dctl_read_msg(int fd, dctl_cmd_t *cmd)
|
||||
{
|
||||
int error;
|
||||
|
||||
/*
|
||||
* First, read only the magic number and the protocol version.
|
||||
*
|
||||
* This prevents blocking forever in case the size of dctl_cmd_t
|
||||
* shrinks in future protocol versions.
|
||||
*/
|
||||
error = dctl_read_data(fd, cmd, DCTL_CMD_HEADER_SIZE);
|
||||
|
||||
if (!error && cmd->dcmd_magic != DCTL_MAGIC) {
|
||||
fprintf(stderr, "%s(): invalid magic number\n", __func__);
|
||||
error = EIO;
|
||||
}
|
||||
|
||||
if (!error && cmd->dcmd_version != DCTL_PROTOCOL_VER) {
|
||||
fprintf(stderr, "%s(): invalid protocol version\n", __func__);
|
||||
error = ENOTSUP;
|
||||
}
|
||||
|
||||
if (error)
|
||||
return error;
|
||||
|
||||
/* Get the rest of the command */
|
||||
return dctl_read_data(fd, (caddr_t) cmd + DCTL_CMD_HEADER_SIZE,
|
||||
sizeof(dctl_cmd_t) - DCTL_CMD_HEADER_SIZE);
|
||||
}
|
||||
|
||||
int dctl_send_msg(int fd, dctl_cmd_t *cmd)
|
||||
{
|
||||
cmd->dcmd_magic = DCTL_MAGIC;
|
||||
cmd->dcmd_version = DCTL_PROTOCOL_VER;
|
||||
|
||||
return dctl_send_data(fd, cmd, sizeof(dctl_cmd_t));
|
||||
}
|
||||
|
||||
int dctl_read_data(int fd, void *ptr, size_t size)
|
||||
{
|
||||
size_t read = 0;
|
||||
size_t left = size;
|
||||
ssize_t rc;
|
||||
|
||||
while (left > 0) {
|
||||
rc = recv(fd, (caddr_t) ptr + read, left, 0);
|
||||
|
||||
/* Peer closed the connection */
|
||||
if (rc == 0)
|
||||
return ECONNRESET;
|
||||
|
||||
if (rc == -1) {
|
||||
if (errno == EINTR)
|
||||
continue;
|
||||
return errno;
|
||||
}
|
||||
|
||||
read += rc;
|
||||
left -= rc;
|
||||
}
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
int dctl_send_data(int fd, const void *ptr, size_t size)
|
||||
{
|
||||
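size_t sent = 0;
|
||||
ssize_t rc;
|
||||
|
||||
/* send() may accept fewer bytes than requested, so loop until done */
|
||||
while (sent < size) {
|
||||
rc = send(fd, (const char *) ptr + sent, size - sent, MSG_NOSIGNAL);
|
||||
if (rc == -1) {
|
||||
if (errno == EINTR)
|
||||
continue;
|
||||
return EIO;
|
||||
}
|
||||
sent += rc;
|
||||
}
|
||||
|
||||
return 0;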
|
||||
}
|
||||
|
|
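dctl_read_data() above loops because recv() on a stream socket may return fewer bytes than were requested. The sketch below shows the same pattern for both directions over a socketpair(). The names recv_all, send_all and roundtrip are hypothetical and used only for this illustration; MSG_NOSIGNAL is assumed to be available, as it is on Linux.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Read exactly size bytes; returns 0 on success or an errno value. */
static int recv_all(int fd, void *buf, size_t size)
{
	size_t done = 0;

	while (done < size) {
		ssize_t rc = recv(fd, (char *) buf + done, size - done, 0);

		if (rc == 0)
			return ECONNRESET;	/* peer closed the connection */
		if (rc == -1) {
			if (errno == EINTR)
				continue;
			return errno;
		}
		done += (size_t) rc;
	}
	return 0;
}

/* Write exactly size bytes; returns 0 on success or an errno value. */
static int send_all(int fd, const void *buf, size_t size)
{
	size_t done = 0;

	while (done < size) {
		ssize_t rc = send(fd, (const char *) buf + done, size - done,
		    MSG_NOSIGNAL);

		if (rc == -1) {
			if (errno == EINTR)
				continue;
			return errno;
		}
		done += (size_t) rc;
	}
	return 0;
}

/* Send msg in two pieces and let a single recv_all() reassemble it. */
static int roundtrip(const char *msg, char *out, size_t len)
{
	int sv[2];
	int error;
	size_t half = len / 2;

	assert(msg != NULL && out != NULL);

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
		return errno;

	error = send_all(sv[0], msg, half);
	if (!error)
		error = send_all(sv[0], msg + half, len - half);
	if (!error)
		error = recv_all(sv[1], out, len);

	close(sv[0]);
	close(sv[1]);
	return error;
}
```

The receiver does not need to know how the sender split the data: it only needs the total length, which is what the fixed-size dctl_cmd_t header provides in the protocol above.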
@ -0,0 +1,476 @@
|
|||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License, Version 1.0 only
|
||||
* (the "License"). You may not use this file except in compliance
|
||||
* with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright 2008 Sun Microsystems, Inc. All rights reserved.
|
||||
* Use is subject to license terms.
|
||||
*/
|
||||
|
||||
#include <stdio.h>
|
||||
#include <stddef.h>
|
||||
#include <stdlib.h>
|
||||
#include <string.h>
|
||||
#include <signal.h>
|
||||
#include <limits.h>
|
||||
#include <errno.h>
|
||||
#include <poll.h>
|
||||
#include <pthread.h>
|
||||
#include <unistd.h>
|
||||
#include <sys/debug.h>
|
||||
#include <sys/socket.h>
|
||||
#include <sys/stat.h>
|
||||
#include <sys/types.h>
|
||||
#include <sys/un.h>
|
||||
#include <sys/list.h>
|
||||
#include <sys/cred.h>
|
||||
|
||||
#include <sys/dmu_ctl.h>
|
||||
#include <sys/dmu_ctl_impl.h>
|
||||
|
||||
static dctl_sock_info_t ctl_sock = {
|
||||
.dsi_mtx = PTHREAD_MUTEX_INITIALIZER,
|
||||
.dsi_fd = -1
|
||||
};
|
||||
|
||||
static int dctl_create_socket_common();
|
||||
|
||||
/*
|
||||
* Routines from zfs_ioctl.c
|
||||
*/
|
||||
extern int zfs_ioctl_init();
|
||||
extern int zfs_ioctl_fini();
|
||||
extern int zfsdev_ioctl(dev_t dev, int cmd, intptr_t arg, int flag, cred_t *cr,
|
||||
int *rvalp);
|
||||
|
||||
/*
|
||||
* We can't simply put the client file descriptor in wthr_info_t because we
|
||||
* have no way of accessing it from the DMU code without extensive
|
||||
* modifications.
|
||||
*
|
||||
* Therefore each worker thread will have its own global thread-specific
|
||||
* client_fd variable.
|
||||
*/
|
||||
static __thread int client_fd = -1;
|
||||
|
||||
int dctls_copyin(const void *src, void *dest, size_t size)
|
||||
{
|
||||
dctl_cmd_t cmd;
|
||||
|
||||
VERIFY(client_fd >= 0);
|
||||
|
||||
cmd.dcmd_msg = DCTL_COPYIN;
|
||||
cmd.u.dcmd_copy.ptr = (uintptr_t) src;
|
||||
cmd.u.dcmd_copy.size = size;
|
||||
|
||||
if (dctl_send_msg(client_fd, &cmd) != 0)
|
||||
return EFAULT;
|
||||
|
||||
if (dctl_read_data(client_fd, dest, size) != 0)
|
||||
return EFAULT;
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
int dctls_copyinstr(const char *from, char *to, size_t max, size_t *len)
|
||||
{
|
||||
dctl_cmd_t msg;
|
||||
size_t copied;
|
||||
|
||||
VERIFY(client_fd >= 0);
|
||||
|
||||
if (max == 0)
|
||||
return ENAMETOOLONG;
|
||||
|
||||
msg.dcmd_msg = DCTL_COPYINSTR;
|
||||
msg.u.dcmd_copy.ptr = (uintptr_t) from;
|
||||
msg.u.dcmd_copy.size = max;
|
||||
|
||||
if (dctl_send_msg(client_fd, &msg) != 0)
|
||||
return EFAULT;
|
||||
|
||||
if (dctl_read_msg(client_fd, &msg) != 0)
|
||||
return EFAULT;
|
||||
|
||||
if (msg.dcmd_msg != DCTL_GEN_REPLY)
|
||||
return EFAULT;
|
||||
|
||||
copied = msg.u.dcmd_reply.size;
|
||||
|
||||
if (copied >= max)
|
||||
return EFAULT;
|
||||
|
||||
if (copied > 0)
|
||||
if (dctl_read_data(client_fd, to, copied) != 0)
|
||||
return EFAULT;
|
||||
|
||||
to[copied] = '\0';
|
||||
|
||||
if (len != NULL)
|
||||
*len = copied + 1;
|
||||
|
||||
return msg.u.dcmd_reply.rc;
|
||||
}
|
||||
|
||||
int dctls_copyout(const void *src, void *dest, size_t size)
|
||||
{
|
||||
dctl_cmd_t cmd;
|
||||
|
||||
VERIFY(client_fd >= 0);
|
||||
|
||||
cmd.dcmd_msg = DCTL_COPYOUT;
|
||||
cmd.u.dcmd_copy.ptr = (uintptr_t) dest;
|
||||
cmd.u.dcmd_copy.size = size;
|
||||
|
||||
if (dctl_send_msg(client_fd, &cmd) != 0)
|
||||
return EFAULT;
|
||||
|
||||
if (dctl_send_data(client_fd, src, size) != 0)
|
||||
return EFAULT;
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
int dctls_fd_read(int fd, void *buf, ssize_t len, ssize_t *residp)
|
||||
{
|
||||
dctl_cmd_t msg;
|
||||
uint64_t dsize;
|
||||
int error;
|
||||
|
||||
VERIFY(client_fd >= 0);
|
||||
|
||||
msg.dcmd_msg = DCTL_FD_READ;
|
||||
msg.u.dcmd_fd_io.fd = fd;
|
||||
msg.u.dcmd_fd_io.size = len;
|
||||
|
||||
if ((error = dctl_send_msg(client_fd, &msg)) != 0)
|
||||
return error;
|
||||
|
||||
if ((error = dctl_read_msg(client_fd, &msg)) != 0)
|
||||
return error;
|
||||
|
||||
if (msg.dcmd_msg != DCTL_GEN_REPLY)
|
||||
return EIO;
|
||||
|
||||
if (msg.u.dcmd_reply.rc != 0)
|
||||
return msg.u.dcmd_reply.rc;
|
||||
|
||||
dsize = msg.u.dcmd_reply.size;
|
||||
|
||||
if (dsize > 0)
|
||||
error = dctl_read_data(client_fd, buf, dsize);
|
||||
|
||||
*residp = len - dsize;
|
||||
|
||||
return error;
|
||||
}
|
||||
|
||||
int dctls_fd_write(int fd, const void *src, ssize_t len)
|
||||
{
|
||||
dctl_cmd_t msg;
|
||||
int error;
|
||||
|
||||
VERIFY(client_fd >= 0);
|
||||
|
||||
msg.dcmd_msg = DCTL_FD_WRITE;
|
||||
msg.u.dcmd_fd_io.fd = fd;
|
||||
msg.u.dcmd_fd_io.size = len;
|
||||
|
||||
error = dctl_send_msg(client_fd, &msg);
|
||||
|
||||
if (!error)
|
||||
error = dctl_send_data(client_fd, src, len);
|
||||
|
||||
if (!error)
|
||||
error = dctl_read_msg(client_fd, &msg);
|
||||
|
||||
if (error)
|
||||
return error;
|
||||
|
||||
if (msg.dcmd_msg != DCTL_GEN_REPLY)
|
||||
return EIO;
|
||||
|
||||
if (msg.u.dcmd_reply.rc != 0)
|
||||
return msg.u.dcmd_reply.rc;
|
||||
|
||||
/*
|
||||
* We have to do this because the original upstream code
|
||||
* does not check if residp == len.
|
||||
*/
|
||||
if (msg.u.dcmd_reply.size != len)
|
||||
return EIO;
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
/* Handle a new connection */
|
||||
static void dctl_handle_conn(int sock_fd)
|
||||
{
|
||||
dctl_cmd_t cmd;
|
||||
dev_t dev = { 0 };
|
||||
int rc;
|
||||
|
||||
client_fd = sock_fd;
|
||||
|
||||
while (dctl_read_msg(sock_fd, &cmd) == 0) {
|
||||
if (cmd.dcmd_msg != DCTL_IOCTL) {
|
||||
fprintf(stderr, "%s(): unexpected message type.\n",
|
||||
__func__);
|
||||
break;
|
||||
}
|
||||
|
||||
rc = zfsdev_ioctl(dev, cmd.u.dcmd_ioctl.cmd,
|
||||
(intptr_t) cmd.u.dcmd_ioctl.arg, 0, NULL, NULL);
|
||||
|
||||
cmd.dcmd_msg = DCTL_IOCTL_REPLY;
|
||||
cmd.u.dcmd_reply.rc = rc;
|
||||
|
||||
if (dctl_send_msg(sock_fd, &cmd) != 0)
|
||||
break;
|
||||
}
|
||||
close(sock_fd);
|
||||
|
||||
client_fd = -1;
|
||||
}
|
||||
|
||||
/* Main worker thread loop */
|
||||
static void *dctl_thread(void *arg)
|
||||
{
|
||||
wthr_info_t *thr = arg;
|
||||
struct pollfd fds[1];
|
||||
|
||||
fds[0].events = POLLIN;
|
||||
|
||||
pthread_mutex_lock(&ctl_sock.dsi_mtx);
|
||||
|
||||
while (!thr->wthr_exit) {
|
||||
/* Clean up dead threads */
|
||||
dctl_thr_join();
|
||||
|
||||
/* The file descriptor might change during the thread's lifetime */
|
||||
fds[0].fd = ctl_sock.dsi_fd;
|
||||
|
||||
/* Poll socket with 1-second timeout */
|
||||
int rc = poll(fds, 1, 1000);
|
||||
if (rc == 0 || (rc == -1 && errno == EINTR))
|
||||
continue;
|
||||
|
||||
/* Recheck the exit flag */
|
||||
if (thr->wthr_exit)
|
||||
break;
|
||||
|
||||
if (rc == -1) {
|
||||
/* Unknown error, let's try to recreate the socket */
|
||||
close(ctl_sock.dsi_fd);
|
||||
ctl_sock.dsi_fd = -1;
|
||||
|
||||
if (dctl_create_socket_common() != 0)
|
||||
break;
|
||||
|
||||
continue;
|
||||
}
|
||||
ASSERT(rc == 1);
|
||||
|
||||
short rev = fds[0].revents;
|
||||
if (rev == 0)
|
||||
continue;
|
||||
ASSERT(rev == POLLIN);
|
||||
|
||||
/*
|
||||
* At this point there should be a connection ready to be
|
||||
* accepted.
|
||||
*/
|
||||
/* Named conn_fd so it does not shadow the thread-local client_fd */
|
||||
int conn_fd = accept(ctl_sock.dsi_fd, NULL, NULL);
|
||||
/* Many possible errors here, we'll just retry */
|
||||
if (conn_fd == -1)
|
||||
continue;
|
||||
|
||||
/*
|
||||
* Now let's handle the request. This can take a very
|
||||
* long time (hours even), so we'll let other threads
|
||||
* handle new connections.
|
||||
*/
|
||||
pthread_mutex_unlock(&ctl_sock.dsi_mtx);
|
||||
|
||||
dctl_thr_rebalance(thr, B_FALSE);
|
||||
dctl_handle_conn(conn_fd);
|
||||
dctl_thr_rebalance(thr, B_TRUE);
|
||||
|
||||
pthread_mutex_lock(&ctl_sock.dsi_mtx);
|
||||
}
|
||||
pthread_mutex_unlock(&ctl_sock.dsi_mtx);
|
||||
|
||||
dctl_thr_die(thr);
|
||||
|
||||
return NULL;
|
||||
}
|
||||
|
||||
static int dctl_create_socket_common()
|
||||
{
|
||||
dctl_sock_info_t *s = &ctl_sock;
|
||||
size_t size;
|
||||
int error;
|
||||
|
||||
ASSERT(s->dsi_fd == -1);
|
||||
|
||||
/*
|
||||
* Unlink old socket, in case it exists.
|
||||
* We don't care about errors here.
|
||||
*/
|
||||
unlink(s->dsi_path);
|
||||
|
||||
/* Create the socket */
|
||||
s->dsi_fd = socket(PF_UNIX, SOCK_STREAM, 0);
|
||||
if (s->dsi_fd == -1) {
|
||||
error = errno;
|
||||
perror("socket");
|
||||
return error;
|
||||
}
|
||||
|
||||
s->dsi_addr.sun_family = AF_UNIX;
|
||||
|
||||
size = sizeof(s->dsi_addr.sun_path) - 1;
|
||||
strncpy(s->dsi_addr.sun_path, s->dsi_path, size);
|
||||
|
||||
s->dsi_addr.sun_path[size] = '\0';
|
||||
|
||||
if (bind(s->dsi_fd, (struct sockaddr *) &s->dsi_addr,
|
||||
sizeof(s->dsi_addr)) != 0) {
|
||||
error = errno;
|
||||
perror("bind");
|
||||
return error;
|
||||
}
|
||||
|
||||
if (listen(s->dsi_fd, LISTEN_BACKLOG) != 0) {
|
||||
error = errno;
|
||||
perror("listen");
|
||||
unlink(s->dsi_path);
|
||||
return error;
|
||||
}
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int dctl_create_socket(const char *cfg_dir)
|
||||
{
|
||||
int error;
|
||||
dctl_sock_info_t *s = &ctl_sock;
|
||||
|
||||
ASSERT(s->dsi_path == NULL);
|
||||
ASSERT(s->dsi_fd == -1);
|
||||
|
||||
size_t pathsize = strlen(cfg_dir) + strlen(SOCKNAME) + 2;
|
||||
if (pathsize > sizeof(s->dsi_addr.sun_path))
|
||||
return ENAMETOOLONG;
|
||||
|
||||
s->dsi_path = malloc(pathsize);
|
||||
if (s->dsi_path == NULL)
|
||||
return ENOMEM;
|
||||
|
||||
strcpy(s->dsi_path, cfg_dir);
|
||||
strcat(s->dsi_path, "/" SOCKNAME);
|
||||
|
||||
/*
|
||||
* For convenience, create the directory in case it doesn't exist.
|
||||
* We don't care about errors here.
|
||||
*/
|
||||
mkdir(cfg_dir, 0770);
|
||||
|
||||
error = dctl_create_socket_common();
|
||||
|
||||
if (error) {
|
||||
free(s->dsi_path);
|
||||
s->dsi_path = NULL;
|
||||
|
||||
if (s->dsi_fd != -1) {
|
||||
close(s->dsi_fd);
|
||||
s->dsi_fd = -1;
|
||||
}
|
||||
}
|
||||
|
||||
return error;
|
||||
}
|
||||
|
||||
static void dctl_destroy_socket()
|
||||
{
|
||||
dctl_sock_info_t *s = &ctl_sock;
|
||||
|
||||
ASSERT(s->dsi_path != NULL);
|
||||
ASSERT(s->dsi_fd != -1);
|
||||
|
||||
close(s->dsi_fd);
|
||||
s->dsi_fd = -1;
|
||||
|
||||
unlink(s->dsi_path);
|
||||
free(s->dsi_path);
|
||||
s->dsi_path = NULL;
|
||||
}
|
||||
|
||||
/*
|
||||
* Initialize the DMU userspace control interface.
|
||||
* This should be called after kernel_init().
|
||||
*
|
||||
* Note that we very rarely have more than a couple of simultaneous
|
||||
* lzfs/lzpool connections. Since the thread pool grows automatically when all
|
||||
* threads are busy, a good value for min_thr and max_free_thr is 2.
|
||||
*/
|
||||
int dctl_server_init(const char *cfg_dir, int min_thr, int max_free_thr)
|
||||
{
|
||||
int error;
|
||||
|
||||
ASSERT(min_thr > 0);
|
||||
ASSERT(max_free_thr >= min_thr);
|
||||
|
||||
error = zfs_ioctl_init();
|
||||
if (error)
|
||||
return error;
|
||||
|
||||
error = dctl_create_socket(cfg_dir);
|
||||
if (error) {
|
||||
(void) zfs_ioctl_fini();
|
||||
return error;
|
||||
}
|
||||
|
||||
error = dctl_thr_pool_create(min_thr, max_free_thr, dctl_thread);
|
||||
if (error) {
|
||||
(void) zfs_ioctl_fini();
|
||||
dctl_destroy_socket();
|
||||
return error;
|
||||
}
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
/*
|
||||
* Terminate control interface.
|
||||
* This should be called after closing all objsets, but before calling
|
||||
* kernel_fini().
|
||||
* May return EBUSY if the SPA is busy.
|
||||
*
|
||||
* Thread pool destruction can take a while due to poll()
|
||||
* timeout or due to a thread being busy (e.g. a backup is being taken).
|
||||
*/
|
||||
int dctl_server_fini()
|
||||
{
|
||||
dctl_thr_pool_stop();
|
||||
dctl_destroy_socket();
|
||||
|
||||
return zfs_ioctl_fini();
|
||||
}
|
|
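The server file above keeps the per-connection descriptor in a `__thread` variable so that code deep inside the DMU can reach it without extra parameters. The sketch below illustrates how such a variable behaves, assuming a compiler with `__thread` support (GCC or Clang). The names current_fd, deep_helper, worker_arg, worker and run_workers are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <pthread.h>

/* Each thread gets its own copy, initialised to -1. */
static __thread int current_fd = -1;

/* Stand-in for code that has no access to the worker's arguments. */
static int deep_helper(void)
{
	return current_fd;
}

struct worker_arg {
	int fd_in;	/* value this worker stores in its private copy */
	int fd_seen;	/* value deep_helper() observed in this worker */
};

static void *worker(void *p)
{
	struct worker_arg *arg = p;

	/* A fresh thread starts with the initial value. */
	assert(current_fd == -1);

	current_fd = arg->fd_in;
	arg->fd_seen = deep_helper();
	current_fd = -1;
	return NULL;
}

/* Run two workers; return 0 if neither saw the other's value. */
static int run_workers(void)
{
	struct worker_arg a = { 10, -1 }, b = { 20, -1 };
	pthread_t ta, tb;

	if (pthread_create(&ta, NULL, worker, &a) != 0)
		return -1;
	if (pthread_create(&tb, NULL, worker, &b) != 0) {
		pthread_join(ta, NULL);
		return -1;
	}
	pthread_join(ta, NULL);
	pthread_join(tb, NULL);

	/* The calling thread's copy was never touched. */
	if (current_fd != -1)
		return -1;
	return (a.fd_seen == 10 && b.fd_seen == 20) ? 0 : -1;
}
```

The trade-off is the one the source comment describes: the value behaves like a global inside each thread, so no function signatures change, but any code that reads it must run on the worker thread that set it.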
@ -0,0 +1,253 @@
|
|||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License, Version 1.0 only
|
||||
* (the "License"). You may not use this file except in compliance
|
||||
* with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright 2008 Sun Microsystems, Inc. All rights reserved.
|
||||
* Use is subject to license terms.
|
||||
*/
|
||||
|
||||
#include <stdlib.h>
|
||||
#include <stddef.h>
|
||||
#include <time.h>
|
||||
#include <pthread.h>
|
||||
#include <errno.h>
|
||||
#include <sys/list.h>
|
||||
#include <sys/debug.h>
|
||||
|
||||
#include <sys/dmu_ctl.h>
|
||||
#include <sys/dmu_ctl_impl.h>
|
||||
|
||||
static dctl_thr_info_t thr_pool = {
|
||||
.dti_mtx = PTHREAD_MUTEX_INITIALIZER
|
||||
};
|
||||
|
||||
/*
|
||||
* Create n threads.
|
||||
* Callers must acquire thr_pool.dti_mtx first.
|
||||
*/
|
||||
static int dctl_thr_create(int n)
|
||||
{
|
||||
dctl_thr_info_t *p = &thr_pool;
|
||||
int error;
|
||||
|
||||
for (int i = 0; i < n; i++) {
|
||||
wthr_info_t *thr = malloc(sizeof(wthr_info_t));
|
||||
if (thr == NULL)
|
||||
return ENOMEM;
|
||||
|
||||
thr->wthr_exit = B_FALSE;
|
||||
thr->wthr_free = B_TRUE;
|
||||
|
||||
error = pthread_create(&thr->wthr_id, NULL, p->dti_thr_func,
|
||||
thr);
|
||||
if (error) {
|
||||
free(thr);
|
||||
return error;
|
||||
}
|
||||
|
||||
p->dti_free++;
|
||||
|
||||
list_insert_tail(&p->dti_list, thr);
|
||||
}
|
||||
return 0;
|
||||
}
|
||||
|
||||
/*
|
||||
* Mark the thread as dead.
|
||||
* Must be called right before exiting the main thread function.
|
||||
*/
|
||||
void dctl_thr_die(wthr_info_t *thr)
|
||||
{
|
||||
dctl_thr_info_t *p = &thr_pool;
|
||||
|
||||
thr->wthr_exit = B_TRUE;
|
||||
dctl_thr_rebalance(thr, B_FALSE);
|
||||
|
||||
pthread_mutex_lock(&p->dti_mtx);
|
||||
|
||||
list_remove(&p->dti_list, thr);
|
||||
list_insert_tail(&p->dti_join_list, thr);
|
||||
|
||||
pthread_mutex_unlock(&p->dti_mtx);
|
||||
}
|
||||
|
||||
/*
|
||||
* Clean up dead threads.
|
||||
*/
|
||||
void dctl_thr_join()
|
||||
{
|
||||
dctl_thr_info_t *p = &thr_pool;
|
||||
wthr_info_t *thr;
|
||||
|
||||
pthread_mutex_lock(&p->dti_mtx);
|
||||
|
||||
while ((thr = list_head(&p->dti_join_list))) {
|
||||
list_remove(&p->dti_join_list, thr);
|
||||
|
||||
ASSERT(!pthread_equal(thr->wthr_id, pthread_self()));
|
||||
|
||||
/*
|
||||
* This should not block because all the threads
|
||||
* on this list should have died already.
|
||||
*
|
||||
* pthread_join() can only return an error if
|
||||
* we made a programming mistake.
|
||||
*/
|
||||
VERIFY(pthread_join(thr->wthr_id, NULL) == 0);
|
||||
|
||||
ASSERT(thr->wthr_exit);
|
||||
ASSERT(!thr->wthr_free);
|
||||
|
||||
free(thr);
|
||||
}
|
||||
|
||||
pthread_mutex_unlock(&p->dti_mtx);
|
||||
}
|
||||
|
||||
/*
|
||||
* Adjust the number of free threads in the pool and the thread status.
|
||||
*
|
||||
* Callers must acquire thr_pool.dti_mtx first.
|
||||
*/
|
||||
static void dctl_thr_adjust_free(wthr_info_t *thr, boolean_t set_free)
|
||||
{
|
||||
dctl_thr_info_t *p = &thr_pool;
|
||||
|
||||
ASSERT(p->dti_free >= 0);
|
||||
|
||||
if (!thr->wthr_free && set_free)
|
||||
p->dti_free++;
|
||||
else if (thr->wthr_free && !set_free)
|
||||
p->dti_free--;
|
||||
|
||||
ASSERT(p->dti_free >= 0);
|
||||
|
||||
thr->wthr_free = set_free;
|
||||
}
|
||||
|
||||
/*
|
||||
* Rebalance threads. Also adjusts the free status of the thread.
|
||||
* Will set the thread exit flag if the number of free threads is above
|
||||
* the limit.
|
||||
*/
|
||||
void dctl_thr_rebalance(wthr_info_t *thr, boolean_t set_free)
|
||||
{
|
||||
dctl_thr_info_t *p = &thr_pool;
|
||||
|
||||
pthread_mutex_lock(&p->dti_mtx);
|
||||
|
||||
if (p->dti_exit || p->dti_free > p->dti_max_free)
|
||||
thr->wthr_exit = B_TRUE;
|
||||
|
||||
if (thr->wthr_exit)
|
||||
set_free = B_FALSE;
|
||||
|
||||
dctl_thr_adjust_free(thr, set_free);
|
||||
|
||||
if (!p->dti_exit && p->dti_free == 0)
|
||||
dctl_thr_create(1);
|
||||
|
||||
pthread_mutex_unlock(&p->dti_mtx);
|
||||
}
|
||||
|
||||
/*
|
||||
* Stop the thread pool.
|
||||
*
|
||||
* This can take a while since it actually waits for all threads to exit.
|
||||
*/
|
||||
void dctl_thr_pool_stop()
|
||||
{
|
||||
dctl_thr_info_t *p = &thr_pool;
|
||||
wthr_info_t *thr;
|
||||
struct timespec ts;
|
||||
|
||||
pthread_mutex_lock(&p->dti_mtx);
|
||||
|
||||
ASSERT(!p->dti_exit);
|
||||
p->dti_exit = B_TRUE;
|
||||
|
||||
/* Let's flag the threads first */
|
||||
thr = list_head(&p->dti_list);
|
||||
while (thr != NULL) {
|
||||
thr->wthr_exit = B_TRUE;
|
||||
dctl_thr_adjust_free(thr, B_FALSE);
|
||||
|
||||
thr = list_next(&p->dti_list, thr);
|
||||
}
|
||||
|
||||
pthread_mutex_unlock(&p->dti_mtx);
|
||||
|
||||
/* Now let's wait for them to exit */
|
||||
ts.tv_sec = 0;
|
||||
ts.tv_nsec = 50000000; /* 50ms */
|
||||
do {
|
||||
nanosleep(&ts, NULL);
|
||||
|
||||
pthread_mutex_lock(&p->dti_mtx);
|
||||
thr = list_head(&p->dti_list);
|
||||
pthread_mutex_unlock(&p->dti_mtx);
|
||||
|
||||
dctl_thr_join();
|
||||
} while(thr != NULL);
|
||||
|
||||
ASSERT(p->dti_free == 0);
|
||||
|
||||
ASSERT(list_is_empty(&p->dti_list));
|
||||
ASSERT(list_is_empty(&p->dti_join_list));
|
||||
|
||||
list_destroy(&p->dti_list);
|
||||
list_destroy(&p->dti_join_list);
|
||||
}
|
||||
|
||||
/*
|
||||
* Create thread pool.
|
||||
*
|
||||
* If at least one thread creation fails, it will stop all previous
|
||||
* threads and return a non-zero value.
|
||||
*/
|
||||
int dctl_thr_pool_create(int min_thr, int max_free_thr,
|
||||
thr_func_t *thr_func)
|
||||
{
|
||||
int error;
|
||||
dctl_thr_info_t *p = &thr_pool;
|
||||
|
||||
ASSERT(p->dti_free == 0);
|
||||
|
||||
/* Initialize global variables */
|
||||
p->dti_min = min_thr;
|
||||
p->dti_max_free = max_free_thr;
|
||||
p->dti_exit = B_FALSE;
|
||||
p->dti_thr_func = thr_func;
|
||||
|
||||
list_create(&p->dti_list, sizeof(wthr_info_t), offsetof(wthr_info_t,
|
||||
wthr_node));
|
||||
list_create(&p->dti_join_list, sizeof(wthr_info_t),
|
||||
offsetof(wthr_info_t, wthr_node));
|
||||
|
||||
pthread_mutex_lock(&p->dti_mtx);
|
||||
error = dctl_thr_create(min_thr);
|
||||
pthread_mutex_unlock(&p->dti_mtx);
|
||||
|
||||
if (error)
|
||||
dctl_thr_pool_stop();
|
||||
|
||||
return error;
|
||||
}
|
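The bookkeeping in dctl_thr_rebalance() above can be hard to follow because it both trims surplus idle workers and grows the pool when no idle worker is left. The following thread-free model reproduces only the order of those decisions. The structures pool_model and worker_model and the function model_rebalance are hypothetical, and the spawned counter stands in for the call to dctl_thr_create(1).

```c
#include <assert.h>
#include <stdbool.h>

struct pool_model {
	int free;	/* idle workers */
	int max_free;	/* idle workers tolerated before trimming */
	bool exiting;	/* pool shutdown requested */
	int spawned;	/* workers this model asked to create */
};

struct worker_model {
	bool is_free;
	bool must_exit;
};

/* Same decision order as dctl_thr_rebalance(), without locks or threads. */
static void model_rebalance(struct pool_model *p, struct worker_model *w,
    bool set_free)
{
	assert(p->free >= 0);

	/* Too many idle workers, or shutdown: this worker should leave. */
	if (p->exiting || p->free > p->max_free)
		w->must_exit = true;

	/* A worker that is leaving is never counted as free. */
	if (w->must_exit)
		set_free = false;

	if (!w->is_free && set_free)
		p->free++;
	else if (w->is_free && !set_free)
		p->free--;
	w->is_free = set_free;

	/* No idle worker left: ask for one more, which starts out free. */
	if (!p->exiting && p->free == 0) {
		p->spawned++;
		p->free++;
	}
}
```

In this model a pool that starts with two idle workers and max_free set to 2 grows by one worker when both become busy, and the surplus is trimmed the next time a worker passes through the rebalance step with more than max_free workers idle.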
File diff suppressed because it is too large
|
@ -0,0 +1 @@
|
|||
subdir-m += sys
|
|
@ -0,0 +1 @@
|
|||
DISTFILES = dmu_ctl.h dmu_ctl_impl.h
|
|
@ -0,0 +1,71 @@
|
|||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License, Version 1.0 only
|
||||
* (the "License"). You may not use this file except in compliance
|
||||
* with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright 2008 Sun Microsystems, Inc. All rights reserved.
|
||||
* Use is subject to license terms.
|
||||
*/
|
||||
|
||||
#ifndef _SYS_DMU_CTL_H
|
||||
#define _SYS_DMU_CTL_H
|
||||
|
||||
#include <sys/types.h>
|
||||
|
||||
/* Default directory where the clients search for sockets to connect */
|
||||
#define DMU_CTL_DEFAULT_DIR "/var/run/zfs/udmu"
|
||||
|
||||
/*
|
||||
* These functions are called by the server process.
|
||||
*
|
||||
* kernel_init() must be called before dctl_server_init().
|
||||
* kernel_fini() must not be called before dctl_server_fini().
|
||||
*
|
||||
* All objsets must be closed and object references be released before calling
|
||||
* dctl_server_fini(), otherwise it will return EBUSY.
|
||||
*
|
||||
* Note: On Solaris, it is highly recommended to either catch or ignore the
|
||||
* SIGPIPE signal, otherwise the server process will die if the client is
|
||||
* killed.
|
||||
*/
|
||||
int dctl_server_init(const char *cfg_dir, int min_threads,
|
||||
int max_free_threads);
|
||||
int dctl_server_fini();
|
||||
|
||||
/*
|
||||
* The following functions are called by the DMU from the server process context
|
||||
* (in the worker threads).
|
||||
*/
|
||||
int dctls_copyin(const void *src, void *dest, size_t size);
|
||||
int dctls_copyinstr(const char *from, char *to, size_t max,
|
||||
size_t *len);
|
||||
int dctls_copyout(const void *src, void *dest, size_t size);
|
||||
int dctls_fd_read(int fd, void *buf, ssize_t len, ssize_t *residp);
|
||||
int dctls_fd_write(int fd, const void *src, ssize_t len);
|
||||
|
||||
/*
|
||||
* These functions are called by the client process (libzfs).
|
||||
*/
|
||||
int dctlc_connect(const char *dir, boolean_t check_subdirs);
|
||||
void dctlc_disconnect(int fd);
|
||||
|
||||
int dctlc_ioctl(int fd, int32_t request, void *arg);
|
||||
|
||||
#endif
|
|
@ -0,0 +1,144 @@
|
|||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License, Version 1.0 only
|
||||
* (the "License"). You may not use this file except in compliance
|
||||
* with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright 2008 Sun Microsystems, Inc. All rights reserved.
|
||||
* Use is subject to license terms.
|
||||
*/
|
||||
|
||||
#ifndef _SYS_DMU_CTL_IMPL_H
|
||||
#define _SYS_DMU_CTL_IMPL_H
|
||||
|
||||
#include <sys/list.h>
|
||||
#include <sys/types.h>
|
||||
#include <sys/socket.h>
|
||||
#include <sys/un.h>
|
||||
#include <pthread.h>
|
||||
|
||||
#define SOCKNAME "dmu_socket"
|
||||
|
||||
#define DCTL_PROTOCOL_VER 1
|
||||
#define DCTL_MAGIC 0xdc71b1070c01dc71ULL
|
||||
|
||||
/* Message types */
|
||||
enum {
|
||||
DCTL_IOCTL,
|
||||
DCTL_IOCTL_REPLY,
|
||||
DCTL_COPYIN,
|
||||
DCTL_COPYINSTR,
|
||||
DCTL_COPYOUT,
|
||||
DCTL_FD_READ,
|
||||
DCTL_FD_WRITE,
|
||||
DCTL_GEN_REPLY /* generic reply */
|
||||
};
|
||||
|
||||
/* On-the-wire message */
|
||||
typedef struct dctl_cmd {
|
||||
uint64_t dcmd_magic;
|
||||
int8_t dcmd_version;
|
||||
int8_t dcmd_msg;
|
||||
uint8_t dcmd_pad[6];
|
||||
union {
|
||||
struct dcmd_ioctl {
|
||||
uint64_t arg;
|
||||
int32_t cmd;
|
||||
uint8_t pad[4];
|
||||
} dcmd_ioctl;
|
||||
|
||||
struct dcmd_copy_req {
|
||||
uint64_t ptr;
|
||||
uint64_t size;
|
||||
} dcmd_copy;
|
||||
|
||||
struct dcmd_fd_req {
|
||||
int64_t size;
|
||||
int32_t fd;
|
||||
uint8_t pad[4];
|
||||
} dcmd_fd_io;
|
||||
|
||||
struct dcmd_reply {
|
||||
uint64_t size; /* used by reply to DCTL_COPYINSTR,
|
||||
DCTL_FD_READ and DCTL_FD_WRITE */
|
||||
int32_t rc; /* return code */
|
||||
uint8_t pad[4];
|
||||
} dcmd_reply;
|
||||
} u;
|
||||
} dctl_cmd_t;
|
||||
|
||||
#define DCTL_CMD_HEADER_SIZE (sizeof(uint64_t) + sizeof(uint8_t))
|
||||
|
||||
/*
|
||||
* The following definitions are only used by the server code.
|
||||
*/
|
||||
|
||||
#define LISTEN_BACKLOG 5
|
||||
|
||||
/* Worker thread data */
|
||||
typedef struct wthr_info {
|
||||
list_node_t wthr_node;
|
||||
pthread_t wthr_id;
|
||||
boolean_t wthr_exit; /* termination flag */
|
||||
boolean_t wthr_free;
|
||||
} wthr_info_t;
|
||||
|
||||
/* Control socket data */
|
||||
typedef struct dctl_sock_info {
|
||||
pthread_mutex_t dsi_mtx;
|
||||
char *dsi_path;
|
||||
struct sockaddr_un dsi_addr;
|
||||
int dsi_fd;
|
||||
} dctl_sock_info_t;
|
||||
|
||||
typedef void *thr_func_t(void *);
|
||||
|
||||
/* Thread pool data */
|
||||
typedef struct dctl_thr_info {
|
||||
thr_func_t *dti_thr_func;
|
||||
|
||||
pthread_mutex_t dti_mtx; /* protects the thread lists and dti_free */
|
||||
list_t dti_list; /* list of threads in the thread pool */
|
||||
list_t dti_join_list; /* list of threads that are waiting to be
|
||||
joined */
|
||||
int dti_free; /* number of free worker threads */
|
||||
|
||||
int dti_min;
|
||||
int dti_max_free;
|
||||
|
||||
boolean_t dti_exit; /* global termination flag */
|
||||
} dctl_thr_info_t;
|
||||
|
||||
/* Messaging functions */
|
||||
int dctl_read_msg(int fd, dctl_cmd_t *cmd);
|
||||
int dctl_send_msg(int fd, dctl_cmd_t *cmd);
|
||||
|
||||
int dctl_read_data(int fd, void *ptr, size_t size);
|
||||
int dctl_send_data(int fd, const void *ptr, size_t size);
|
||||
|
||||
/* Thread pool functions */
|
||||
int dctl_thr_pool_create(int min_thr, int max_free_thr,
|
||||
thr_func_t *thr_func);
|
||||
void dctl_thr_pool_stop(void);
|
||||
|
||||
void dctl_thr_join(void);
|
||||
void dctl_thr_die(wthr_info_t *thr);
|
||||
void dctl_thr_rebalance(wthr_info_t *thr, boolean_t set_free);
|
||||
|
||||
#endif
|
|
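The on-the-wire message declared in the header above has a fixed 16-byte prefix (magic, version, message type, padding) followed by a 16-byte union. The following is a minimal sketch, assuming a receiver checks the magic and version before trusting the payload. The `sketch_` names and the return codes are hypothetical and do not come from the source; only the field layout and the two constants mirror it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PROTOCOL_VER 1                    /* mirrors DCTL_PROTOCOL_VER */
#define SKETCH_MAGIC        0xdc71b1070c01dc71ULL /* mirrors DCTL_MAGIC */

/* Same field layout as dctl_cmd_t; the names are illustrative only. */
typedef struct sketch_cmd {
	uint64_t magic;
	int8_t   version;
	int8_t   msg;
	uint8_t  pad[6];
	union {
		struct { uint64_t arg;  int32_t cmd; uint8_t pad[4]; } ioctl;
		struct { uint64_t ptr;  uint64_t size; } copy;
		struct { int64_t  size; int32_t fd;  uint8_t pad[4]; } fd_io;
		struct { uint64_t size; int32_t rc;  uint8_t pad[4]; } reply;
	} u;
} sketch_cmd_t;

/* The explicit padding keeps the message at 32 bytes on common ABIs. */
_Static_assert(sizeof (sketch_cmd_t) == 32, "unexpected message size");

/* Zero the message (including padding) and fill in the fixed prefix. */
void
sketch_cmd_init(sketch_cmd_t *c, int8_t msg)
{
	memset(c, 0, sizeof (*c));
	c->magic = SKETCH_MAGIC;
	c->version = SKETCH_PROTOCOL_VER;
	c->msg = msg;
}

/* Hypothetical check: 0 if usable, -1 on bad magic, -2 on version mismatch. */
int
sketch_cmd_check(const sketch_cmd_t *c)
{
	if (c->magic != SKETCH_MAGIC)
		return (-1);
	if (c->version != SKETCH_PROTOCOL_VER)
		return (-2);
	return (0);
}
```

A sender would call `sketch_cmd_init()` and fill one union member. A receiver would run `sketch_cmd_check()` before dispatching on `msg`.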
@ -0,0 +1,249 @@
|
|||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright 2007 Sun Microsystems, Inc. All rights reserved.
|
||||
* Use is subject to license terms.
|
||||
*/
|
||||
|
||||
#pragma ident "@(#)rrwlock.c 1.1 07/10/24 SMI"
|
||||
|
||||
#include <sys/refcount.h>
|
||||
#include <sys/rrwlock.h>
|
||||
|
||||
/*
|
||||
* This file contains the implementation of a re-entrant read
|
||||
* reader/writer lock (aka "rrwlock").
|
||||
*
|
||||
* This is a normal reader/writer lock with the additional feature
|
||||
* of allowing threads who have already obtained a read lock to
|
||||
* re-enter another read lock (re-entrant read) - even if there are
|
||||
* waiting writers.
|
||||
*
|
||||
* Callers who have not obtained a read lock give waiting writers priority.
|
||||
*
|
||||
* The rrwlock_t lock does not allow re-entrant writers, nor does it
|
||||
* allow a re-entrant mix of reads and writes (that is, it does not
|
||||
* allow a caller who has already obtained a read lock to be able to
|
||||
* then grab a write lock without first dropping all read locks, and
|
||||
* vice versa).
|
||||
*
|
||||
* The rrwlock_t uses tsd (thread specific data) to keep a list of
|
||||
* nodes (rrw_node_t), where each node keeps track of which specific
|
||||
* lock (rrw_node_t::rn_rrl) the thread has grabbed. Since re-entering
|
||||
* should be rare, a thread that grabs multiple reads on the same rrwlock_t
|
||||
* will store multiple rrw_node_ts of the same 'rn_rrl'. Nodes on the
|
||||
* tsd list can represent a different rrwlock_t. This allows a thread
|
||||
* to enter multiple and unique rrwlock_ts for read locks at the same time.
|
||||
*
|
||||
* Since using tsd exposes some overhead, the rrwlock_t only needs to
|
||||
* keep tsd data when writers are waiting. If no writers are waiting, then
|
||||
* a reader just bumps the anonymous read count (rr_anon_rcount) - no tsd
|
||||
* is needed. Once a writer attempts to grab the lock, readers then
|
||||
* keep tsd data and bump the linked readers count (rr_linked_rcount).
|
||||
*
|
||||
* If there are waiting writers and there are anonymous readers, then a
|
||||
* reader doesn't know if it is a re-entrant lock. But since it may be one,
|
||||
* we allow the read to proceed (otherwise it could deadlock). Since once
|
||||
* waiting writers are active, readers no longer bump the anonymous count,
|
||||
* the anonymous readers will eventually flush themselves out. At this point,
|
||||
* readers will be able to tell if they are a re-entrant lock (have a
|
||||
* rrw_node_t entry for the lock) or not. If they are a re-entrant lock, then
|
||||
* we must let them proceed. If they are not, then the reader blocks for the
|
||||
* waiting writers. Hence, we do not starve writers.
|
||||
*/
|
||||
|
||||
/* global key for TSD */
|
||||
uint_t rrw_tsd_key;
|
||||
|
||||
typedef struct rrw_node {
|
||||
struct rrw_node *rn_next;
|
||||
rrwlock_t *rn_rrl;
|
||||
} rrw_node_t;
|
||||
|
||||
static rrw_node_t *
|
||||
rrn_find(rrwlock_t *rrl)
|
||||
{
|
||||
rrw_node_t *rn;
|
||||
|
||||
if (refcount_count(&rrl->rr_linked_rcount) == 0)
|
||||
return (NULL);
|
||||
|
||||
for (rn = tsd_get(rrw_tsd_key); rn != NULL; rn = rn->rn_next) {
|
||||
if (rn->rn_rrl == rrl)
|
||||
return (rn);
|
||||
}
|
||||
return (NULL);
|
||||
}
|
||||
|
||||
/*
|
||||
* Add a node to the head of the singly linked list.
|
||||
*/
|
||||
static void
|
||||
rrn_add(rrwlock_t *rrl)
|
||||
{
|
||||
rrw_node_t *rn;
|
||||
|
||||
rn = kmem_alloc(sizeof (*rn), KM_SLEEP);
|
||||
rn->rn_rrl = rrl;
|
||||
rn->rn_next = tsd_get(rrw_tsd_key);
|
||||
VERIFY(tsd_set(rrw_tsd_key, rn) == 0);
|
||||
}
|
||||
|
||||
/*
|
||||
* If a node is found for 'rrl', then remove the node from this
|
||||
* thread's list and return TRUE; otherwise return FALSE.
|
||||
*/
|
||||
static boolean_t
|
||||
rrn_find_and_remove(rrwlock_t *rrl)
|
||||
{
|
||||
rrw_node_t *rn;
|
||||
rrw_node_t *prev = NULL;
|
||||
|
||||
if (refcount_count(&rrl->rr_linked_rcount) == 0)
|
||||
return (B_FALSE);
|
||||
|
||||
for (rn = tsd_get(rrw_tsd_key); rn != NULL; rn = rn->rn_next) {
|
||||
if (rn->rn_rrl == rrl) {
|
||||
if (prev)
|
||||
prev->rn_next = rn->rn_next;
|
||||
else
|
||||
VERIFY(tsd_set(rrw_tsd_key, rn->rn_next) == 0);
|
||||
kmem_free(rn, sizeof (*rn));
|
||||
return (B_TRUE);
|
||||
}
|
||||
prev = rn;
|
||||
}
|
||||
return (B_FALSE);
|
||||
}
|
||||
|
||||
void
|
||||
rrw_init(rrwlock_t *rrl)
|
||||
{
|
||||
mutex_init(&rrl->rr_lock, NULL, MUTEX_DEFAULT, NULL);
|
||||
cv_init(&rrl->rr_cv, NULL, CV_DEFAULT, NULL);
|
||||
rrl->rr_writer = NULL;
|
||||
refcount_create(&rrl->rr_anon_rcount);
|
||||
refcount_create(&rrl->rr_linked_rcount);
|
||||
rrl->rr_writer_wanted = B_FALSE;
|
||||
}
|
||||
|
||||
void
|
||||
rrw_destroy(rrwlock_t *rrl)
|
||||
{
|
||||
mutex_destroy(&rrl->rr_lock);
|
||||
cv_destroy(&rrl->rr_cv);
|
||||
ASSERT(rrl->rr_writer == NULL);
|
||||
refcount_destroy(&rrl->rr_anon_rcount);
|
||||
refcount_destroy(&rrl->rr_linked_rcount);
|
||||
}
|
||||
|
||||
static void
|
||||
rrw_enter_read(rrwlock_t *rrl, void *tag)
|
||||
{
|
||||
mutex_enter(&rrl->rr_lock);
|
||||
ASSERT(rrl->rr_writer != curthread);
|
||||
ASSERT(refcount_count(&rrl->rr_anon_rcount) >= 0);
|
||||
|
||||
while (rrl->rr_writer || (rrl->rr_writer_wanted &&
|
||||
refcount_is_zero(&rrl->rr_anon_rcount) &&
|
||||
rrn_find(rrl) == NULL))
|
||||
cv_wait(&rrl->rr_cv, &rrl->rr_lock);
|
||||
|
||||
if (rrl->rr_writer_wanted) {
|
||||
/* may or may not be a re-entrant enter */
|
||||
rrn_add(rrl);
|
||||
(void) refcount_add(&rrl->rr_linked_rcount, tag);
|
||||
} else {
|
||||
(void) refcount_add(&rrl->rr_anon_rcount, tag);
|
||||
}
|
||||
ASSERT(rrl->rr_writer == NULL);
|
||||
mutex_exit(&rrl->rr_lock);
|
||||
}
|
||||
|
||||
static void
|
||||
rrw_enter_write(rrwlock_t *rrl)
|
||||
{
|
||||
mutex_enter(&rrl->rr_lock);
|
||||
ASSERT(rrl->rr_writer != curthread);
|
||||
|
||||
while (refcount_count(&rrl->rr_anon_rcount) > 0 ||
|
||||
refcount_count(&rrl->rr_linked_rcount) > 0 ||
|
||||
rrl->rr_writer != NULL) {
|
||||
rrl->rr_writer_wanted = B_TRUE;
|
||||
cv_wait(&rrl->rr_cv, &rrl->rr_lock);
|
||||
}
|
||||
rrl->rr_writer_wanted = B_FALSE;
|
||||
rrl->rr_writer = curthread;
|
||||
mutex_exit(&rrl->rr_lock);
|
||||
}
|
||||
|
||||
void
|
||||
rrw_enter(rrwlock_t *rrl, krw_t rw, void *tag)
|
||||
{
|
||||
if (rw == RW_READER)
|
||||
rrw_enter_read(rrl, tag);
|
||||
else
|
||||
rrw_enter_write(rrl);
|
||||
}
|
||||
|
||||
void
|
||||
rrw_exit(rrwlock_t *rrl, void *tag)
|
||||
{
|
||||
mutex_enter(&rrl->rr_lock);
|
||||
ASSERT(!refcount_is_zero(&rrl->rr_anon_rcount) ||
|
||||
!refcount_is_zero(&rrl->rr_linked_rcount) ||
|
||||
rrl->rr_writer != NULL);
|
||||
|
||||
if (rrl->rr_writer == NULL) {
|
||||
if (rrn_find_and_remove(rrl)) {
|
||||
if (refcount_remove(&rrl->rr_linked_rcount, tag) == 0)
|
||||
cv_broadcast(&rrl->rr_cv);
|
||||
|
||||
} else {
|
||||
if (refcount_remove(&rrl->rr_anon_rcount, tag) == 0)
|
||||
cv_broadcast(&rrl->rr_cv);
|
||||
}
|
||||
} else {
|
||||
ASSERT(rrl->rr_writer == curthread);
|
||||
ASSERT(refcount_is_zero(&rrl->rr_anon_rcount) &&
|
||||
refcount_is_zero(&rrl->rr_linked_rcount));
|
||||
rrl->rr_writer = NULL;
|
||||
cv_broadcast(&rrl->rr_cv);
|
||||
}
|
||||
mutex_exit(&rrl->rr_lock);
|
||||
}
|
||||
|
||||
boolean_t
|
||||
rrw_held(rrwlock_t *rrl, krw_t rw)
|
||||
{
|
||||
boolean_t held;
|
||||
|
||||
mutex_enter(&rrl->rr_lock);
|
||||
if (rw == RW_WRITER) {
|
||||
held = (rrl->rr_writer == curthread);
|
||||
} else {
|
||||
held = (!refcount_is_zero(&rrl->rr_anon_rcount) ||
|
||||
!refcount_is_zero(&rrl->rr_linked_rcount));
|
||||
}
|
||||
mutex_exit(&rrl->rr_lock);
|
||||
|
||||
return (held);
|
||||
}
|
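The admission rules in rrw_enter_read() and rrw_enter_write() can be studied without threads. The following is a minimal single-threaded sketch, assuming the lock state is reduced to two flags and two plain counters. The `rrw_model` type and its functions are hypothetical. They only restate the wait conditions from the code above and do not replace the mutex and condition variable.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, lock-free model of the rrwlock_t state. */
struct rrw_model {
	bool writer_held;   /* rr_writer != NULL */
	bool writer_wanted; /* rr_writer_wanted */
	int  anon_rcount;   /* rr_anon_rcount */
	int  linked_rcount; /* rr_linked_rcount */
};

/*
 * Would a reader wait in the rrw_enter_read() loop?
 * has_node stands for "rrn_find(rrl) != NULL", i.e. this thread
 * already holds a tracked read lock on the same rrwlock.
 */
bool
rrw_model_reader_waits(const struct rrw_model *m, bool has_node)
{
	return (m->writer_held ||
	    (m->writer_wanted && m->anon_rcount == 0 && !has_node));
}

/* An admitted reader is tracked only once a writer is waiting. */
void
rrw_model_reader_enter(struct rrw_model *m)
{
	if (m->writer_wanted)
		m->linked_rcount++;
	else
		m->anon_rcount++;
}

/* A writer waits for every reader and any current writer to leave. */
bool
rrw_model_writer_waits(const struct rrw_model *m)
{
	return (m->anon_rcount > 0 || m->linked_rcount > 0 ||
	    m->writer_held);
}
```

The model makes the starvation argument concrete: while anonymous readers remain, any reader is admitted. Once they drain, only threads that already have a node get past a waiting writer.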
|
@ -0,0 +1,968 @@
|
|||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright 2008 Sun Microsystems, Inc. All rights reserved.
|
||||
* Use is subject to license terms.
|
||||
*/
|
||||
|
||||
#pragma ident "@(#)zfs_dir.c 1.25 08/04/27 SMI"
|
||||
|
||||
#include <sys/types.h>
|
||||
#include <sys/param.h>
|
||||
#include <sys/time.h>
|
||||
#include <sys/systm.h>
|
||||
#include <sys/sysmacros.h>
|
||||
#include <sys/resource.h>
|
||||
#include <sys/vfs.h>
|
||||
#include <sys/vnode.h>
|
||||
#include <sys/file.h>
|
||||
#include <sys/mode.h>
|
||||
#include <sys/kmem.h>
|
||||
#include <sys/uio.h>
|
||||
#include <sys/pathname.h>
|
||||
#include <sys/cmn_err.h>
|
||||
#include <sys/errno.h>
|
||||
#include <sys/stat.h>
|
||||
#include <sys/unistd.h>
|
||||
#include <sys/sunddi.h>
|
||||
#include <sys/random.h>
|
||||
#include <sys/policy.h>
|
||||
#include <sys/zfs_dir.h>
|
||||
#include <sys/zfs_acl.h>
|
||||
#include <sys/fs/zfs.h>
|
||||
#include "fs/fs_subr.h"
|
||||
#include <sys/zap.h>
|
||||
#include <sys/dmu.h>
|
||||
#include <sys/atomic.h>
|
||||
#include <sys/zfs_ctldir.h>
|
||||
#include <sys/zfs_fuid.h>
|
||||
#include <sys/dnlc.h>
|
||||
#include <sys/extdirent.h>
|
||||
|
||||
/*
|
||||
* zfs_match_find() is used by zfs_dirent_lock() to perform zap lookups
|
||||
* of names after deciding which is the appropriate lookup interface.
|
||||
*/
|
||||
static int
|
||||
zfs_match_find(zfsvfs_t *zfsvfs, znode_t *dzp, char *name, boolean_t exact,
|
||||
boolean_t update, int *deflags, pathname_t *rpnp, uint64_t *zoid)
|
||||
{
|
||||
int error;
|
||||
|
||||
if (zfsvfs->z_norm) {
|
||||
matchtype_t mt = MT_FIRST;
|
||||
boolean_t conflict = B_FALSE;
|
||||
size_t bufsz = 0;
|
||||
char *buf = NULL;
|
||||
|
||||
if (rpnp) {
|
||||
buf = rpnp->pn_buf;
|
||||
bufsz = rpnp->pn_bufsize;
|
||||
}
|
||||
if (exact)
|
||||
mt = MT_EXACT;
|
||||
/*
|
||||
* In the non-mixed case we only expect there would ever
|
||||
* be one match, but we need to use the normalizing lookup.
|
||||
*/
|
||||
error = zap_lookup_norm(zfsvfs->z_os, dzp->z_id, name, 8, 1,
|
||||
zoid, mt, buf, bufsz, &conflict);
|
||||
if (!error && deflags)
|
||||
*deflags = conflict ? ED_CASE_CONFLICT : 0;
|
||||
} else {
|
||||
error = zap_lookup(zfsvfs->z_os, dzp->z_id, name, 8, 1, zoid);
|
||||
}
|
||||
*zoid = ZFS_DIRENT_OBJ(*zoid);
|
||||
|
||||
if (error == ENOENT && update)
|
||||
dnlc_update(ZTOV(dzp), name, DNLC_NO_VNODE);
|
||||
|
||||
return (error);
|
||||
}
|
||||
|
||||
/*
|
||||
* Lock a directory entry. A dirlock on <dzp, name> protects that name
|
||||
* in dzp's directory zap object. As long as you hold a dirlock, you can
|
||||
* assume two things: (1) dzp cannot be reaped, and (2) no other thread
|
||||
* can change the zap entry for (i.e. link or unlink) this name.
|
||||
*
|
||||
* Input arguments:
|
||||
* dzp - znode for directory
|
||||
* name - name of entry to lock
|
||||
* flag - ZNEW: if the entry already exists, fail with EEXIST.
|
||||
* ZEXISTS: if the entry does not exist, fail with ENOENT.
|
||||
* ZSHARED: allow concurrent access with other ZSHARED callers.
|
||||
* ZXATTR: we want dzp's xattr directory
|
||||
* ZCILOOK: On a mixed sensitivity file system,
|
||||
* this lookup should be case-insensitive.
|
||||
* ZCIEXACT: On a purely case-insensitive file system,
|
||||
* this lookup should be case-sensitive.
|
||||
* ZRENAMING: we are locking for renaming, force narrow locks
|
||||
*
|
||||
* Output arguments:
|
||||
* zpp - pointer to the znode for the entry (NULL if there isn't one)
|
||||
* dlpp - pointer to the dirlock for this entry (NULL on error)
|
||||
* direntflags - (case-insensitive lookup only)
|
||||
* flags if multiple case-sensitive matches exist in directory
|
||||
* realpnp - (case-insensitive lookup only)
|
||||
* actual name matched within the directory
|
||||
*
|
||||
* Return value: 0 on success or errno on failure.
|
||||
*
|
||||
* NOTE: Always checks for, and rejects, '.' and '..'.
|
||||
* NOTE: For case-insensitive file systems we take wide locks (see below),
|
||||
* but return znode pointers to a single match.
|
||||
*/
|
||||
int
|
||||
zfs_dirent_lock(zfs_dirlock_t **dlpp, znode_t *dzp, char *name, znode_t **zpp,
|
||||
int flag, int *direntflags, pathname_t *realpnp)
|
||||
{
|
||||
zfsvfs_t *zfsvfs = dzp->z_zfsvfs;
|
||||
zfs_dirlock_t *dl;
|
||||
boolean_t update;
|
||||
boolean_t exact;
|
||||
uint64_t zoid;
|
||||
vnode_t *vp = NULL;
|
||||
int error = 0;
|
||||
int cmpflags;
|
||||
|
||||
*zpp = NULL;
|
||||
*dlpp = NULL;
|
||||
|
||||
/*
|
||||
* Verify that we are not trying to lock '.', '..', or '.zfs'
|
||||
*/
|
||||
if (name[0] == '.' &&
|
||||
(name[1] == '\0' || (name[1] == '.' && name[2] == '\0')) ||
|
||||
zfs_has_ctldir(dzp) && strcmp(name, ZFS_CTLDIR_NAME) == 0)
|
||||
return (EEXIST);
|
||||
|
||||
/*
|
||||
* Case sensitivity and normalization preferences are set when
|
||||
* the file system is created. These are stored in the
|
||||
* zfsvfs->z_case and zfsvfs->z_norm fields. These choices
|
||||
* affect what vnodes can be cached in the DNLC, how we
|
||||
* perform zap lookups, and the "width" of our dirlocks.
|
||||
*
|
||||
* A normal dirlock locks a single name. Note that with
|
||||
* normalization a name can be composed multiple ways, but
|
||||
* when normalized, these names all compare equal. A wide
|
||||
* dirlock locks multiple names. We need these when the file
|
||||
* system is supporting mixed-mode access. It is sometimes
|
||||
* necessary to lock all case permutations of file name at
|
||||
* once so that simultaneous case-insensitive/case-sensitive
|
||||
* behaves as rationally as possible.
|
||||
*/
|
||||
|
||||
/*
|
||||
* Decide if exact matches should be requested when performing
|
||||
* a zap lookup on file systems supporting case-insensitive
|
||||
* access.
|
||||
*/
|
||||
exact =
|
||||
((zfsvfs->z_case == ZFS_CASE_INSENSITIVE) && (flag & ZCIEXACT)) ||
|
||||
((zfsvfs->z_case == ZFS_CASE_MIXED) && !(flag & ZCILOOK));
|
||||
|
||||
/*
|
||||
* Only look in or update the DNLC if we are looking for the
|
||||
* name on a file system that does not require normalization
|
||||
* or case folding. We can also look there if we happen to be
|
||||
* on a non-normalizing, mixed sensitivity file system IF we
|
||||
* are looking for the exact name.
|
||||
*
|
||||
* Maybe can add TO-UPPERed version of name to dnlc in ci-only
|
||||
* case for performance improvement?
|
||||
*/
|
||||
update = !zfsvfs->z_norm ||
|
||||
((zfsvfs->z_case == ZFS_CASE_MIXED) &&
|
||||
!(zfsvfs->z_norm & ~U8_TEXTPREP_TOUPPER) && !(flag & ZCILOOK));
|
||||
|
||||
/*
|
||||
* ZRENAMING indicates we are in a situation where we should
|
||||
* take narrow locks regardless of the file system's
|
||||
* preferences for normalizing and case folding. This will
|
||||
* prevent us deadlocking trying to grab the same wide lock
|
||||
* twice if the two names happen to be case-insensitive
|
||||
* matches.
|
||||
*/
|
||||
if (flag & ZRENAMING)
|
||||
cmpflags = 0;
|
||||
else
|
||||
cmpflags = zfsvfs->z_norm;
|
||||
|
||||
/*
|
||||
* Wait until there are no locks on this name.
|
||||
*/
|
||||
rw_enter(&dzp->z_name_lock, RW_READER);
|
||||
mutex_enter(&dzp->z_lock);
|
||||
for (;;) {
|
||||
if (dzp->z_unlinked) {
|
||||
mutex_exit(&dzp->z_lock);
|
||||
rw_exit(&dzp->z_name_lock);
|
||||
return (ENOENT);
|
||||
}
|
||||
for (dl = dzp->z_dirlocks; dl != NULL; dl = dl->dl_next) {
|
||||
if ((u8_strcmp(name, dl->dl_name, 0, cmpflags,
|
||||
U8_UNICODE_LATEST, &error) == 0) || error != 0)
|
||||
break;
|
||||
}
|
||||
if (error != 0) {
|
||||
mutex_exit(&dzp->z_lock);
|
||||
rw_exit(&dzp->z_name_lock);
|
||||
return (ENOENT);
|
||||
}
|
||||
if (dl == NULL) {
|
||||
/*
|
||||
* Allocate a new dirlock and add it to the list.
|
||||
*/
|
||||
dl = kmem_alloc(sizeof (zfs_dirlock_t), KM_SLEEP);
|
||||
cv_init(&dl->dl_cv, NULL, CV_DEFAULT, NULL);
|
||||
dl->dl_name = name;
|
||||
dl->dl_sharecnt = 0;
|
||||
dl->dl_namesize = 0;
|
||||
dl->dl_dzp = dzp;
|
||||
dl->dl_next = dzp->z_dirlocks;
|
||||
dzp->z_dirlocks = dl;
|
||||
break;
|
||||
}
|
||||
if ((flag & ZSHARED) && dl->dl_sharecnt != 0)
|
||||
break;
|
||||
cv_wait(&dl->dl_cv, &dzp->z_lock);
|
||||
}
|
||||
|
||||
if ((flag & ZSHARED) && ++dl->dl_sharecnt > 1 && dl->dl_namesize == 0) {
|
||||
/*
|
||||
* We're the second shared reference to dl. Make a copy of
|
||||
* dl_name in case the first thread goes away before we do.
|
||||
* Note that we initialize the new name before storing its
|
||||
* pointer into dl_name, because the first thread may load
|
||||
* dl->dl_name at any time. He'll either see the old value,
|
||||
* which is his, or the new shared copy; either is OK.
|
||||
*/
|
||||
dl->dl_namesize = strlen(dl->dl_name) + 1;
|
||||
name = kmem_alloc(dl->dl_namesize, KM_SLEEP);
|
||||
bcopy(dl->dl_name, name, dl->dl_namesize);
|
||||
dl->dl_name = name;
|
||||
}
|
||||
|
||||
mutex_exit(&dzp->z_lock);
|
||||
|
||||
/*
|
||||
* We have a dirlock on the name. (Note that it is the dirlock,
|
||||
* not the dzp's z_lock, that protects the name in the zap object.)
|
||||
* See if there's an object by this name; if so, put a hold on it.
|
||||
*/
|
||||
if (flag & ZXATTR) {
|
||||
zoid = dzp->z_phys->zp_xattr;
|
||||
error = (zoid == 0 ? ENOENT : 0);
|
||||
} else {
|
||||
if (update)
|
||||
vp = dnlc_lookup(ZTOV(dzp), name);
|
||||
if (vp == DNLC_NO_VNODE) {
|
||||
VN_RELE(vp);
|
||||
error = ENOENT;
|
||||
} else if (vp) {
|
||||
if (flag & ZNEW) {
|
||||
zfs_dirent_unlock(dl);
|
||||
VN_RELE(vp);
|
||||
return (EEXIST);
|
||||
}
|
||||
*dlpp = dl;
|
||||
*zpp = VTOZ(vp);
|
||||
return (0);
|
||||
} else {
|
||||
error = zfs_match_find(zfsvfs, dzp, name, exact,
|
||||
update, direntflags, realpnp, &zoid);
|
||||
}
|
||||
}
|
||||
if (error) {
|
||||
if (error != ENOENT || (flag & ZEXISTS)) {
|
||||
zfs_dirent_unlock(dl);
|
||||
return (error);
|
||||
}
|
||||
} else {
|
||||
if (flag & ZNEW) {
|
||||
zfs_dirent_unlock(dl);
|
||||
return (EEXIST);
|
||||
}
|
||||
error = zfs_zget(zfsvfs, zoid, zpp);
|
||||
if (error) {
|
||||
zfs_dirent_unlock(dl);
|
||||
return (error);
|
||||
}
|
||||
if (!(flag & ZXATTR) && update)
|
||||
dnlc_update(ZTOV(dzp), name, ZTOV(*zpp));
|
||||
}
|
||||
|
||||
*dlpp = dl;
|
||||
|
||||
return (0);
|
||||
}
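The `exact` and `update` decisions made near the top of zfs_dirent_lock() depend only on the file system's case mode, its normalization flags, and the caller's lookup flags. The following sketch isolates them as pure functions. The enum, the `SK_` flag values, and the function names are assumptions for illustration. The real values of ZCILOOK, ZCIEXACT, and U8_TEXTPREP_TOUPPER are defined elsewhere and are not shown in this commit excerpt.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for zfsvfs->z_case values (hypothetical numbering). */
enum sketch_case { SK_CASE_SENSITIVE, SK_CASE_INSENSITIVE, SK_CASE_MIXED };

/* Stand-ins for the lookup flags and the to-upper normalization bit. */
#define SK_ZCILOOK  0x1
#define SK_ZCIEXACT 0x2
#define SK_TOUPPER  0x1 /* plays the role of U8_TEXTPREP_TOUPPER */

/* Should the zap lookup demand an exact (case-sensitive) match? */
bool
sketch_exact(enum sketch_case cs, int flag)
{
	return ((cs == SK_CASE_INSENSITIVE && (flag & SK_ZCIEXACT)) ||
	    (cs == SK_CASE_MIXED && !(flag & SK_ZCILOOK)));
}

/*
 * May the DNLC be consulted and updated?  Yes when no normalization is
 * in effect, or on a mixed-sensitivity file system whose only
 * normalization is case folding and the caller wants the exact name.
 */
bool
sketch_dnlc_ok(enum sketch_case cs, int norm, int flag)
{
	return (norm == 0 ||
	    (cs == SK_CASE_MIXED && !(norm & ~SK_TOUPPER) &&
	    !(flag & SK_ZCILOOK)));
}
```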
|
||||
|
||||
/*
|
||||
* Unlock this directory entry and wake anyone who was waiting for it.
|
||||
*/
|
||||
void
|
||||
zfs_dirent_unlock(zfs_dirlock_t *dl)
|
||||
{
|
||||
znode_t *dzp = dl->dl_dzp;
|
||||
zfs_dirlock_t **prev_dl, *cur_dl;
|
||||
|
||||
mutex_enter(&dzp->z_lock);
|
||||
rw_exit(&dzp->z_name_lock);
|
||||
if (dl->dl_sharecnt > 1) {
|
||||
dl->dl_sharecnt--;
|
||||
mutex_exit(&dzp->z_lock);
|
||||
return;
|
||||
}
|
||||
prev_dl = &dzp->z_dirlocks;
|
||||
while ((cur_dl = *prev_dl) != dl)
|
||||
prev_dl = &cur_dl->dl_next;
|
||||
*prev_dl = dl->dl_next;
|
||||
cv_broadcast(&dl->dl_cv);
|
||||
mutex_exit(&dzp->z_lock);
|
||||
|
||||
if (dl->dl_namesize != 0)
|
||||
kmem_free(dl->dl_name, dl->dl_namesize);
|
||||
cv_destroy(&dl->dl_cv);
|
||||
kmem_free(dl, sizeof (*dl));
|
||||
}
|
||||
|
||||
/*
|
||||
* Look up an entry in a directory.
|
||||
*
|
||||
* NOTE: '.' and '..' are handled as special cases because
|
||||
* no directory entries are actually stored for them. If this is
|
||||
* the root of a filesystem, then '.zfs' is also treated as a
|
||||
* special pseudo-directory.
|
||||
*/
|
||||
int
|
||||
zfs_dirlook(znode_t *dzp, char *name, vnode_t **vpp, int flags,
|
||||
int *deflg, pathname_t *rpnp)
|
||||
{
|
||||
zfs_dirlock_t *dl;
|
||||
znode_t *zp;
|
||||
int error = 0;
|
||||
|
||||
if (name[0] == 0 || (name[0] == '.' && name[1] == 0)) {
|
||||
*vpp = ZTOV(dzp);
|
||||
VN_HOLD(*vpp);
|
||||
} else if (name[0] == '.' && name[1] == '.' && name[2] == 0) {
|
||||
zfsvfs_t *zfsvfs = dzp->z_zfsvfs;
|
||||
/*
|
||||
* If we are a snapshot mounted under .zfs, return
|
||||
* the vp for the snapshot directory.
|
||||
*/
|
||||
if (dzp->z_phys->zp_parent == dzp->z_id &&
|
||||
zfsvfs->z_parent != zfsvfs) {
|
||||
error = zfsctl_root_lookup(zfsvfs->z_parent->z_ctldir,
|
||||
"snapshot", vpp, NULL, 0, NULL, kcred,
|
||||
NULL, NULL, NULL);
|
||||
return (error);
|
||||
}
|
||||
rw_enter(&dzp->z_parent_lock, RW_READER);
|
||||
error = zfs_zget(zfsvfs, dzp->z_phys->zp_parent, &zp);
|
||||
if (error == 0)
|
||||
*vpp = ZTOV(zp);
|
||||
rw_exit(&dzp->z_parent_lock);
|
||||
} else if (zfs_has_ctldir(dzp) && strcmp(name, ZFS_CTLDIR_NAME) == 0) {
|
||||
*vpp = zfsctl_root(dzp);
|
||||
} else {
|
||||
int zf;
|
||||
|
||||
zf = ZEXISTS | ZSHARED;
|
||||
if (flags & FIGNORECASE)
|
||||
zf |= ZCILOOK;
|
||||
|
||||
error = zfs_dirent_lock(&dl, dzp, name, &zp, zf, deflg, rpnp);
|
||||
if (error == 0) {
|
||||
*vpp = ZTOV(zp);
|
||||
zfs_dirent_unlock(dl);
|
||||
dzp->z_zn_prefetch = B_TRUE; /* enable prefetching */
|
||||
}
|
||||
rpnp = NULL;
|
||||
}
|
||||
|
||||
if ((flags & FIGNORECASE) && rpnp && !error)
|
||||
(void) strlcpy(rpnp->pn_buf, name, rpnp->pn_bufsize);
|
||||
|
||||
return (error);
|
||||
}
|
||||
|
||||
static char *
|
||||
zfs_unlinked_hexname(char namebuf[17], uint64_t x)
|
||||
{
|
||||
char *name = &namebuf[16];
|
||||
const char digits[16] = "0123456789abcdef";
|
||||
|
||||
*name = '\0';
|
||||
do {
|
||||
*--name = digits[x & 0xf];
|
||||
x >>= 4;
|
||||
} while (x != 0);
|
||||
|
||||
return (name);
|
||||
}
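zfs_unlinked_hexname() fills the buffer from the end, so the returned pointer generally points into the middle of `namebuf` rather than at its start. The standalone copy below (renamed `sketch_hexname`, a hypothetical name) can be compiled in user space to see that behavior. It differs from the original only in declaring the digit table with room for its terminator.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Write x as lowercase hex, without leading zeros, into the tail of
 * namebuf and return a pointer to the first digit.  17 bytes hold the
 * 16 digits of a full 64-bit value plus the terminating NUL.
 */
char *
sketch_hexname(char namebuf[17], uint64_t x)
{
	static const char digits[] = "0123456789abcdef";
	char *name = &namebuf[16];

	*name = '\0';
	do {
		*--name = digits[x & 0xf];
		x >>= 4;
	} while (x != 0);

	return (name);
}
```

Callers must use the returned pointer, not `namebuf` itself, since the bytes before the first digit are left uninitialized.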
|
||||
|
||||
/*
|
||||
* unlinked Set (formerly known as the "delete queue") Error Handling
|
||||
*
|
||||
* When dealing with the unlinked set, we dmu_tx_hold_zap(), but we
|
||||
* don't specify the name of the entry that we will be manipulating. We
|
||||
* also fib and say that we won't be adding any new entries to the
|
||||
* unlinked set, even though we might (this is to lower the minimum file
|
||||
* size that can be deleted in a full filesystem). So on the small
|
||||
* chance that the unlinked set is using a fat zap (i.e. has more than
|
||||
* 2000 entries), we *may* not pre-read a block that's needed.
|
||||
* Therefore it is remotely possible for some of the assertions
|
||||
* regarding the unlinked set below to fail due to i/o error. On a
|
||||
* nondebug system, this will result in the space being leaked.
|
||||
*/
|
||||
void
|
||||
zfs_unlinked_add(znode_t *zp, dmu_tx_t *tx)
|
||||
{
|
||||
zfsvfs_t *zfsvfs = zp->z_zfsvfs;
|
||||
char obj_name[17];
|
||||
int error;
|
||||
|
||||
ASSERT(zp->z_unlinked);
|
||||
ASSERT3U(zp->z_phys->zp_links, ==, 0);
|
||||
|
||||
error = zap_add(zfsvfs->z_os, zfsvfs->z_unlinkedobj,
|
||||
zfs_unlinked_hexname(obj_name, zp->z_id), 8, 1, &zp->z_id, tx);
|
||||
ASSERT3U(error, ==, 0);
|
||||
}
|
||||
|
||||
/*
|
||||
* Clean up any znodes that had no links when we either crashed or
|
||||
* (force) umounted the file system.
|
||||
*/
|
||||
void
|
||||
zfs_unlinked_drain(zfsvfs_t *zfsvfs)
|
||||
{
|
||||
zap_cursor_t zc;
|
||||
zap_attribute_t zap;
|
||||
dmu_object_info_t doi;
|
||||
znode_t *zp;
|
||||
int error;
|
||||
|
||||
/*
|
||||
* Iterate over the contents of the unlinked set.
|
||||
*/
|
||||
for (zap_cursor_init(&zc, zfsvfs->z_os, zfsvfs->z_unlinkedobj);
|
||||
zap_cursor_retrieve(&zc, &zap) == 0;
|
||||
zap_cursor_advance(&zc)) {
|
||||
|
||||
/*
|
||||
* See what kind of object we have in list
|
||||
*/
|
||||
|
||||
error = dmu_object_info(zfsvfs->z_os,
|
||||
zap.za_first_integer, &doi);
|
||||
if (error != 0)
|
||||
continue;
|
||||
|
||||
ASSERT((doi.doi_type == DMU_OT_PLAIN_FILE_CONTENTS) ||
|
||||
(doi.doi_type == DMU_OT_DIRECTORY_CONTENTS));
|
||||
/*
|
||||
* We need to re-mark these list entries for deletion,
|
||||
* so we pull them back into core and set zp->z_unlinked.
|
||||
*/
|
||||
error = zfs_zget(zfsvfs, zap.za_first_integer, &zp);
|
||||
|
||||
/*
|
||||
* We may pick up znodes that are already marked for deletion.
|
||||
* This could happen during the purge of an extended attribute
|
||||
* directory. All we need to do is skip over them, since they
|
||||
* are already in the system marked z_unlinked.
|
||||
*/
|
||||
if (error != 0)
|
||||
continue;
|
||||
|
||||
zp->z_unlinked = B_TRUE;
|
||||
VN_RELE(ZTOV(zp));
|
||||
}
|
||||
zap_cursor_fini(&zc);
|
||||
}
|
||||
|
||||
/*
|
||||
* Delete the entire contents of a directory. Return a count
|
||||
* of the number of entries that could not be deleted. If we encounter
|
||||
* an error, return a count of at least one so that the directory stays
|
||||
* in the unlinked set.
|
||||
*
|
||||
* NOTE: this function assumes that the directory is inactive,
|
||||
* so there is no need to lock its entries before deletion.
|
||||
* Also, it assumes the directory contents is *only* regular
|
||||
* files.
|
||||
*/
|
||||
static int
|
||||
zfs_purgedir(znode_t *dzp)
|
||||
{
|
||||
zap_cursor_t zc;
|
||||
zap_attribute_t zap;
|
||||
znode_t *xzp;
|
||||
dmu_tx_t *tx;
|
||||
zfsvfs_t *zfsvfs = dzp->z_zfsvfs;
|
||||
zfs_dirlock_t dl;
|
||||
int skipped = 0;
|
||||
int error;
|
||||
|
||||
for (zap_cursor_init(&zc, zfsvfs->z_os, dzp->z_id);
|
||||
(error = zap_cursor_retrieve(&zc, &zap)) == 0;
|
||||
zap_cursor_advance(&zc)) {
|
||||
error = zfs_zget(zfsvfs,
|
||||
ZFS_DIRENT_OBJ(zap.za_first_integer), &xzp);
|
||||
if (error) {
|
||||
skipped += 1;
|
||||
continue;
|
||||
}
|
||||
|
||||
ASSERT((ZTOV(xzp)->v_type == VREG) ||
|
||||
(ZTOV(xzp)->v_type == VLNK));
|
||||
|
||||
tx = dmu_tx_create(zfsvfs->z_os);
|
||||
dmu_tx_hold_bonus(tx, dzp->z_id);
|
||||
dmu_tx_hold_zap(tx, dzp->z_id, FALSE, zap.za_name);
|
||||
dmu_tx_hold_bonus(tx, xzp->z_id);
|
||||
dmu_tx_hold_zap(tx, zfsvfs->z_unlinkedobj, FALSE, NULL);
|
||||
error = dmu_tx_assign(tx, TXG_WAIT);
|
||||
if (error) {
|
||||
dmu_tx_abort(tx);
|
||||
VN_RELE(ZTOV(xzp));
|
||||
skipped += 1;
|
||||
continue;
|
||||
}
|
||||
bzero(&dl, sizeof (dl));
|
||||
dl.dl_dzp = dzp;
|
||||
dl.dl_name = zap.za_name;
|
||||
|
||||
error = zfs_link_destroy(&dl, xzp, tx, 0, NULL);
|
||||
if (error)
|
||||
skipped += 1;
|
||||
dmu_tx_commit(tx);
|
||||
|
||||
VN_RELE(ZTOV(xzp));
|
||||
}
|
||||
zap_cursor_fini(&zc);
|
||||
if (error != ENOENT)
|
||||
skipped += 1;
|
||||
return (skipped);
|
||||
}
|
||||
|
||||
void
|
||||
zfs_rmnode(znode_t *zp)
|
||||
{
|
||||
zfsvfs_t *zfsvfs = zp->z_zfsvfs;
|
||||
objset_t *os = zfsvfs->z_os;
|
||||
znode_t *xzp = NULL;
|
||||
char obj_name[17];
|
||||
dmu_tx_t *tx;
|
||||
uint64_t acl_obj;
|
||||
int error;
|
||||
|
||||
ASSERT(ZTOV(zp)->v_count == 0);
|
||||
ASSERT(zp->z_phys->zp_links == 0);
|
||||
|
||||
/*
|
||||
* If this is an attribute directory, purge its contents.
|
||||
*/
|
||||
if (ZTOV(zp)->v_type == VDIR && (zp->z_phys->zp_flags & ZFS_XATTR)) {
|
||||
if (zfs_purgedir(zp) != 0) {
|
||||
/*
|
||||
* Not enough space to delete some xattrs.
|
||||
* Leave it on the unlinked set.
|
||||
*/
|
||||
zfs_znode_dmu_fini(zp);
|
||||
zfs_znode_free(zp);
|
||||
return;
|
||||
}
|
||||
}
|
||||
|
||||
/*
|
||||
* If the file has extended attributes, we're going to unlink
|
||||
* the xattr dir.
|
||||
*/
|
||||
if (zp->z_phys->zp_xattr) {
|
||||
error = zfs_zget(zfsvfs, zp->z_phys->zp_xattr, &xzp);
|
||||
ASSERT(error == 0);
|
||||
}
|
||||
|
||||
acl_obj = zp->z_phys->zp_acl.z_acl_extern_obj;
|
||||
|
||||
/*
|
||||
* Set up the transaction.
|
||||
*/
|
||||
tx = dmu_tx_create(os);
|
||||
dmu_tx_hold_free(tx, zp->z_id, 0, DMU_OBJECT_END);
|
||||
dmu_tx_hold_zap(tx, zfsvfs->z_unlinkedobj, FALSE, NULL);
|
||||
if (xzp) {
|
||||
dmu_tx_hold_bonus(tx, xzp->z_id);
|
||||
dmu_tx_hold_zap(tx, zfsvfs->z_unlinkedobj, TRUE, NULL);
|
||||
}
|
||||
if (acl_obj)
|
||||
dmu_tx_hold_free(tx, acl_obj, 0, DMU_OBJECT_END);
|
||||
error = dmu_tx_assign(tx, TXG_WAIT);
|
||||
if (error) {
|
||||
/*
|
||||
* Not enough space to delete the file. Leave it in the
|
||||
* unlinked set, leaking it until the fs is remounted (at
|
||||
* which point we'll call zfs_unlinked_drain() to process it).
|
||||
*/
|
||||
dmu_tx_abort(tx);
|
||||
zfs_znode_dmu_fini(zp);
|
||||
zfs_znode_free(zp);
|
||||
goto out;
|
||||
}
|
||||
|
||||
if (xzp) {
|
||||
dmu_buf_will_dirty(xzp->z_dbuf, tx);
|
||||
mutex_enter(&xzp->z_lock);
|
||||
xzp->z_unlinked = B_TRUE; /* mark xzp for deletion */
|
||||
xzp->z_phys->zp_links = 0; /* no more links to it */
|
||||
mutex_exit(&xzp->z_lock);
|
||||
zfs_unlinked_add(xzp, tx);
|
||||
}
|
||||
|
||||
/* Remove this znode from the unlinked set */
|
||||
error = zap_remove(os, zfsvfs->z_unlinkedobj,
|
||||
zfs_unlinked_hexname(obj_name, zp->z_id), tx);
|
||||
ASSERT3U(error, ==, 0);
|
||||
|
||||
zfs_znode_delete(zp, tx);
|
||||
|
||||
dmu_tx_commit(tx);
|
||||
out:
|
||||
if (xzp)
|
||||
VN_RELE(ZTOV(xzp));
|
||||
}
|
||||
|
||||
static uint64_t
|
||||
zfs_dirent(znode_t *zp)
|
||||
{
|
||||
uint64_t de = zp->z_id;
|
||||
if (zp->z_zfsvfs->z_version >= ZPL_VERSION_DIRENT_TYPE)
|
||||
de |= IFTODT((zp)->z_phys->zp_mode) << 60;
|
||||
return (de);
|
||||
}
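
Illustrative sketch, not part of this commit: zfs_dirent() above stores the object number in a 64-bit directory-entry value and, on new enough file system versions, ORs the file type into the bits starting at bit 60. The stand-alone version below shows that packing and the matching unpacking. All `sketch_`/`SKETCH_` names are hypothetical, and treating the low 60 bits as the object number is an assumption made for illustration (the real on-disk layout may reserve more bits).

```c
#include <stdint.h>

/* Hypothetical stand-ins; the real macros come from system headers. */
#define	SKETCH_IFMT		0170000u	/* file-type bits of a mode */
#define	SKETCH_IFTODT(mode)	(((mode) & SKETCH_IFMT) >> 12)
#define	SKETCH_TYPE_SHIFT	60

/*
 * Pack an object number and, optionally, the file type derived from mode.
 * store_type mirrors the version check in zfs_dirent().
 */
static uint64_t
sketch_dirent_pack(uint64_t obj, uint64_t mode, int store_type)
{
	uint64_t de = obj;

	if (store_type)
		de |= (uint64_t)SKETCH_IFTODT(mode) << SKETCH_TYPE_SHIFT;
	return (de);
}

/* Recover the 4-bit type stored in the top bits. */
static unsigned
sketch_dirent_type(uint64_t de)
{
	return ((unsigned)(de >> SKETCH_TYPE_SHIFT));
}

/* Recover the object number (assumption: everything below bit 60). */
static uint64_t
sketch_dirent_obj(uint64_t de)
{
	return (de & ((UINT64_C(1) << SKETCH_TYPE_SHIFT) - 1));
}
```

With a directory mode such as 040755 the stored type is 4, and with a regular-file mode such as 0100644 it is 8, which matches the usual DT_DIR and DT_REG values.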
|
||||
|
||||
/*
|
||||
* Link zp into dl. Can only fail if zp has been unlinked.
|
||||
*/
|
||||
int
|
||||
zfs_link_create(zfs_dirlock_t *dl, znode_t *zp, dmu_tx_t *tx, int flag)
|
||||
{
|
||||
znode_t *dzp = dl->dl_dzp;
|
||||
vnode_t *vp = ZTOV(zp);
|
||||
uint64_t value;
|
||||
int zp_is_dir = (vp->v_type == VDIR);
|
||||
int error;
|
||||
|
||||
dmu_buf_will_dirty(zp->z_dbuf, tx);
|
||||
mutex_enter(&zp->z_lock);
|
||||
|
||||
if (!(flag & ZRENAMING)) {
|
||||
if (zp->z_unlinked) { /* no new links to unlinked zp */
|
||||
ASSERT(!(flag & (ZNEW | ZEXISTS)));
|
||||
mutex_exit(&zp->z_lock);
|
||||
return (ENOENT);
|
||||
}
|
||||
zp->z_phys->zp_links++;
|
||||
}
|
||||
zp->z_phys->zp_parent = dzp->z_id; /* dzp is now zp's parent */
|
||||
|
||||
if (!(flag & ZNEW))
|
||||
zfs_time_stamper_locked(zp, STATE_CHANGED, tx);
|
||||
mutex_exit(&zp->z_lock);
|
||||
|
||||
dmu_buf_will_dirty(dzp->z_dbuf, tx);
|
||||
mutex_enter(&dzp->z_lock);
|
||||
dzp->z_phys->zp_size++; /* one dirent added */
|
||||
dzp->z_phys->zp_links += zp_is_dir; /* ".." link from zp */
|
||||
zfs_time_stamper_locked(dzp, CONTENT_MODIFIED, tx);
|
||||
mutex_exit(&dzp->z_lock);
|
||||
|
||||
value = zfs_dirent(zp);
|
||||
error = zap_add(zp->z_zfsvfs->z_os, dzp->z_id, dl->dl_name,
|
||||
8, 1, &value, tx);
|
||||
ASSERT(error == 0);
|
||||
|
||||
dnlc_update(ZTOV(dzp), dl->dl_name, vp);
|
||||
|
||||
return (0);
|
||||
}
|
||||
|
||||
/*
|
||||
* Unlink zp from dl, and mark zp for deletion if this was the last link.
|
||||
* Can fail if zp is a mount point (EBUSY) or a non-empty directory (EEXIST).
|
||||
* If 'unlinkedp' is NULL, we put unlinked znodes on the unlinked list.
|
||||
* If it's non-NULL, we use it to indicate whether the znode needs deletion,
|
||||
* and it's the caller's job to do it.
|
||||
*/
|
||||
int
|
||||
zfs_link_destroy(zfs_dirlock_t *dl, znode_t *zp, dmu_tx_t *tx, int flag,
|
||||
boolean_t *unlinkedp)
|
||||
{
|
||||
znode_t *dzp = dl->dl_dzp;
|
||||
vnode_t *vp = ZTOV(zp);
|
||||
int zp_is_dir = (vp->v_type == VDIR);
|
||||
boolean_t unlinked = B_FALSE;
|
||||
int error;
|
||||
|
||||
dnlc_remove(ZTOV(dzp), dl->dl_name);
|
||||
|
||||
if (!(flag & ZRENAMING)) {
|
||||
dmu_buf_will_dirty(zp->z_dbuf, tx);
|
||||
|
||||
if (vn_vfswlock(vp)) /* prevent new mounts on zp */
|
||||
return (EBUSY);
|
||||
|
||||
if (vn_ismntpt(vp)) { /* don't remove mount point */
|
||||
vn_vfsunlock(vp);
|
||||
return (EBUSY);
|
||||
}
|
||||
|
||||
mutex_enter(&zp->z_lock);
|
||||
if (zp_is_dir && !zfs_dirempty(zp)) { /* dir not empty */
|
||||
mutex_exit(&zp->z_lock);
|
||||
vn_vfsunlock(vp);
|
||||
return (EEXIST);
|
||||
}
|
||||
if (zp->z_phys->zp_links <= zp_is_dir) {
|
||||
zfs_panic_recover("zfs: link count on %s is %u, "
|
||||
"should be at least %u",
|
||||
zp->z_vnode->v_path ? zp->z_vnode->v_path :
|
||||
"<unknown>", (int)zp->z_phys->zp_links,
|
||||
zp_is_dir + 1);
|
||||
zp->z_phys->zp_links = zp_is_dir + 1;
|
||||
}
|
||||
if (--zp->z_phys->zp_links == zp_is_dir) {
|
||||
zp->z_unlinked = B_TRUE;
|
||||
zp->z_phys->zp_links = 0;
|
||||
unlinked = B_TRUE;
|
||||
} else {
|
||||
zfs_time_stamper_locked(zp, STATE_CHANGED, tx);
|
||||
}
|
||||
mutex_exit(&zp->z_lock);
|
||||
vn_vfsunlock(vp);
|
||||
}
|
||||
|
||||
dmu_buf_will_dirty(dzp->z_dbuf, tx);
|
||||
mutex_enter(&dzp->z_lock);
|
||||
dzp->z_phys->zp_size--; /* one dirent removed */
|
||||
dzp->z_phys->zp_links -= zp_is_dir; /* ".." link from zp */
|
||||
zfs_time_stamper_locked(dzp, CONTENT_MODIFIED, tx);
|
||||
mutex_exit(&dzp->z_lock);
|
||||
|
||||
if (zp->z_zfsvfs->z_norm) {
|
||||
if (((zp->z_zfsvfs->z_case == ZFS_CASE_INSENSITIVE) &&
|
||||
(flag & ZCIEXACT)) ||
|
||||
((zp->z_zfsvfs->z_case == ZFS_CASE_MIXED) &&
|
||||
!(flag & ZCILOOK)))
|
||||
error = zap_remove_norm(zp->z_zfsvfs->z_os,
|
||||
dzp->z_id, dl->dl_name, MT_EXACT, tx);
|
||||
else
|
||||
error = zap_remove_norm(zp->z_zfsvfs->z_os,
|
||||
dzp->z_id, dl->dl_name, MT_FIRST, tx);
|
||||
} else {
|
||||
error = zap_remove(zp->z_zfsvfs->z_os,
|
||||
dzp->z_id, dl->dl_name, tx);
|
||||
}
|
||||
ASSERT(error == 0);
|
||||
|
||||
if (unlinkedp != NULL)
|
||||
*unlinkedp = unlinked;
|
||||
else if (unlinked)
|
||||
zfs_unlinked_add(zp, tx);
|
||||
|
||||
return (0);
|
||||
}
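
Illustrative sketch, not part of this commit: the link-count handling in zfs_link_destroy() can be read as a small rule. A directory carries one extra link for its own "." entry, so it has lost its last real link when the count drops to 1, while a plain file has lost it when the count drops to 0. The function below restates that rule in isolation. `sketch_drop_link` is a hypothetical name; locking, timestamps and the panic message are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Drop one link from a node.  is_dir is 1 for directories and 0 otherwise.
 * Returns true when the node just lost its last real link and should be
 * marked for deletion; in that case the count is forced to 0.
 */
static bool
sketch_drop_link(uint64_t *links, int is_dir)
{
	uint64_t floor = (uint64_t)is_dir;

	/* Repair an impossible count first, as the original does. */
	if (*links <= floor)
		*links = floor + 1;

	if (--*links == floor) {
		*links = 0;
		return (true);
	}
	return (false);
}
```

The repair step means a corrupted count of 0 is treated as "one link left", so the entry is still removed and the node is queued for deletion rather than underflowing.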
|
||||
|
||||
/*
 * Indicate whether the directory is empty.  Works with or without z_lock
 * held, but can only be considered a hint in the latter case.  Returns true
 * if only "." and ".." remain and there's no work in progress.
 */
|
||||
boolean_t
|
||||
zfs_dirempty(znode_t *dzp)
|
||||
{
|
||||
return (dzp->z_phys->zp_size == 2 && dzp->z_dirlocks == 0);
|
||||
}
|
||||
|
||||
int
|
||||
zfs_make_xattrdir(znode_t *zp, vattr_t *vap, vnode_t **xvpp, cred_t *cr)
|
||||
{
|
||||
zfsvfs_t *zfsvfs = zp->z_zfsvfs;
|
||||
znode_t *xzp;
|
||||
dmu_tx_t *tx;
|
||||
int error;
|
||||
zfs_fuid_info_t *fuidp = NULL;
|
||||
|
||||
*xvpp = NULL;
|
||||
|
||||
if ((error = zfs_zaccess(zp, ACE_WRITE_NAMED_ATTRS, 0, B_FALSE, cr)) != 0)
|
||||
return (error);
|
||||
|
||||
tx = dmu_tx_create(zfsvfs->z_os);
|
||||
dmu_tx_hold_bonus(tx, zp->z_id);
|
||||
dmu_tx_hold_zap(tx, DMU_NEW_OBJECT, FALSE, NULL);
|
||||
if (IS_EPHEMERAL(crgetuid(cr)) || IS_EPHEMERAL(crgetgid(cr))) {
|
||||
if (zfsvfs->z_fuid_obj == 0) {
|
||||
dmu_tx_hold_bonus(tx, DMU_NEW_OBJECT);
|
||||
dmu_tx_hold_write(tx, DMU_NEW_OBJECT, 0,
|
||||
FUID_SIZE_ESTIMATE(zfsvfs));
|
||||
dmu_tx_hold_zap(tx, MASTER_NODE_OBJ, FALSE, NULL);
|
||||
} else {
|
||||
dmu_tx_hold_bonus(tx, zfsvfs->z_fuid_obj);
|
||||
dmu_tx_hold_write(tx, zfsvfs->z_fuid_obj, 0,
|
||||
FUID_SIZE_ESTIMATE(zfsvfs));
|
||||
}
|
||||
}
|
||||
error = dmu_tx_assign(tx, zfsvfs->z_assign);
|
||||
if (error) {
|
||||
if (error == ERESTART && zfsvfs->z_assign == TXG_NOWAIT)
|
||||
dmu_tx_wait(tx);
|
||||
dmu_tx_abort(tx);
|
||||
return (error);
|
||||
}
|
||||
zfs_mknode(zp, vap, tx, cr, IS_XATTR, &xzp, 0, NULL, &fuidp);
|
||||
ASSERT(xzp->z_phys->zp_parent == zp->z_id);
|
||||
dmu_buf_will_dirty(zp->z_dbuf, tx);
|
||||
zp->z_phys->zp_xattr = xzp->z_id;
|
||||
|
||||
(void) zfs_log_create(zfsvfs->z_log, tx, TX_MKXATTR, zp,
|
||||
xzp, "", NULL, fuidp, vap);
|
||||
if (fuidp)
|
||||
zfs_fuid_info_free(fuidp);
|
||||
dmu_tx_commit(tx);
|
||||
|
||||
*xvpp = ZTOV(xzp);
|
||||
|
||||
return (0);
|
||||
}
|
||||
|
||||
/*
 * Return a vnode for the extended attribute directory for zp.
 * If the directory does not already exist, it is created only when
 * CREATE_XATTR_DIR is set in flags; otherwise ENOENT is returned.
 *
 *	IN:	zp	- znode to obtain attribute directory from
 *		cr	- credentials of caller
 *		flags	- flags from the VOP_LOOKUP call
 *
 *	OUT:	xvpp	- pointer to extended attribute vnode
 *
 *	RETURN:	0 on success
 *		error number on failure
 */
|
||||
int
|
||||
zfs_get_xattrdir(znode_t *zp, vnode_t **xvpp, cred_t *cr, int flags)
|
||||
{
|
||||
zfsvfs_t *zfsvfs = zp->z_zfsvfs;
|
||||
znode_t *xzp;
|
||||
zfs_dirlock_t *dl;
|
||||
vattr_t va;
|
||||
int error;
|
||||
top:
|
||||
error = zfs_dirent_lock(&dl, zp, "", &xzp, ZXATTR, NULL, NULL);
|
||||
if (error)
|
||||
return (error);
|
||||
|
||||
if (xzp != NULL) {
|
||||
*xvpp = ZTOV(xzp);
|
||||
zfs_dirent_unlock(dl);
|
||||
return (0);
|
||||
}
|
||||
|
||||
ASSERT(zp->z_phys->zp_xattr == 0);
|
||||
|
||||
if (!(flags & CREATE_XATTR_DIR)) {
|
||||
zfs_dirent_unlock(dl);
|
||||
return (ENOENT);
|
||||
}
|
||||
|
||||
if (zfsvfs->z_vfs->vfs_flag & VFS_RDONLY) {
|
||||
zfs_dirent_unlock(dl);
|
||||
return (EROFS);
|
||||
}
|
||||
|
||||
/*
|
||||
* The ability to 'create' files in an attribute
|
||||
* directory comes from the write_xattr permission on the base file.
|
||||
*
|
||||
* The ability to 'search' an attribute directory requires
|
||||
* read_xattr permission on the base file.
|
||||
*
|
||||
* Once in a directory the ability to read/write attributes
|
||||
* is controlled by the permissions on the attribute file.
|
||||
*/
|
||||
va.va_mask = AT_TYPE | AT_MODE | AT_UID | AT_GID;
|
||||
va.va_type = VDIR;
|
||||
va.va_mode = S_IFDIR | S_ISVTX | 0777;
|
||||
zfs_fuid_map_ids(zp, cr, &va.va_uid, &va.va_gid);
|
||||
|
||||
error = zfs_make_xattrdir(zp, &va, xvpp, cr);
|
||||
zfs_dirent_unlock(dl);
|
||||
|
||||
if (error == ERESTART && zfsvfs->z_assign == TXG_NOWAIT) {
|
||||
/* NB: we already did dmu_tx_wait() if necessary */
|
||||
goto top;
|
||||
}
|
||||
|
||||
return (error);
|
||||
}
|
||||
|
||||
/*
|
||||
* Decide whether it is okay to remove within a sticky directory.
|
||||
*
|
||||
* In sticky directories, write access is not sufficient;
|
||||
* you can remove entries from a directory only if:
|
||||
*
|
||||
* you own the directory,
|
||||
* you own the entry,
|
||||
* the entry is a plain file and you have write access,
|
||||
* or you are privileged (checked in secpolicy...).
|
||||
*
|
||||
* The function returns 0 if remove access is granted.
|
||||
*/
|
||||
int
|
||||
zfs_sticky_remove_access(znode_t *zdp, znode_t *zp, cred_t *cr)
|
||||
{
|
||||
uid_t uid;
|
||||
uid_t downer;
|
||||
uid_t fowner;
|
||||
zfsvfs_t *zfsvfs = zdp->z_zfsvfs;
|
||||
|
||||
if (zdp->z_zfsvfs->z_assign >= TXG_INITIAL) /* ZIL replay */
|
||||
return (0);
|
||||
|
||||
if ((zdp->z_phys->zp_mode & S_ISVTX) == 0)
|
||||
return (0);
|
||||
|
||||
downer = zfs_fuid_map_id(zfsvfs, zdp->z_phys->zp_uid, cr, ZFS_OWNER);
|
||||
fowner = zfs_fuid_map_id(zfsvfs, zp->z_phys->zp_uid, cr, ZFS_OWNER);
|
||||
|
||||
if ((uid = crgetuid(cr)) == downer || uid == fowner ||
|
||||
(ZTOV(zp)->v_type == VREG &&
|
||||
zfs_zaccess(zp, ACE_WRITE_DATA, 0, B_FALSE, cr) == 0))
|
||||
return (0);
|
||||
else
|
||||
return (secpolicy_vnode_remove(cr));
|
||||
}
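
Illustrative sketch, not part of this commit: the sticky-directory rules listed in the comment above zfs_sticky_remove_access() reduce to a short chain of checks. The version below takes the already-resolved facts as plain fields so the decision can be read on its own. The struct and function names are hypothetical, the privilege flag stands in for the secpolicy call, and the ZIL replay shortcut is omitted. Unlike the original, which returns 0 when access is granted, this sketch returns a boolean.

```c
#include <stdbool.h>

/* Hypothetical, flattened view of the inputs the check consults. */
typedef struct sketch_sticky_req {
	bool		dir_is_sticky;		/* S_ISVTX set on the directory */
	unsigned	caller_uid;
	unsigned	dir_owner;
	unsigned	entry_owner;
	bool		entry_is_regular;	/* entry is a plain file */
	bool		caller_can_write;	/* write access to the entry */
	bool		caller_privileged;	/* stands in for secpolicy */
} sketch_sticky_req_t;

/* Returns true when removing the entry is allowed. */
static bool
sketch_sticky_remove_ok(const sketch_sticky_req_t *r)
{
	/* Without the sticky bit, ordinary directory write access decides. */
	if (!r->dir_is_sticky)
		return (true);

	/* Owner of the directory or of the entry may always remove it. */
	if (r->caller_uid == r->dir_owner || r->caller_uid == r->entry_owner)
		return (true);

	/* A plain file the caller can write may also be removed. */
	if (r->entry_is_regular && r->caller_can_write)
		return (true);

	/* Otherwise only a privileged caller gets through. */
	return (r->caller_privileged);
}
```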
|
|
@ -0,0 +1,688 @@
|
|||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright 2008 Sun Microsystems, Inc. All rights reserved.
|
||||
* Use is subject to license terms.
|
||||
*/
|
||||
|
||||
#pragma ident "@(#)zfs_fuid.c 1.5 08/01/31 SMI"
|
||||
|
||||
#include <sys/zfs_context.h>
|
||||
#include <sys/sunddi.h>
|
||||
#include <sys/dmu.h>
|
||||
#include <sys/avl.h>
|
||||
#include <sys/zap.h>
|
||||
#include <sys/refcount.h>
|
||||
#include <sys/nvpair.h>
|
||||
#ifdef _KERNEL
|
||||
#include <sys/kidmap.h>
|
||||
#include <sys/sid.h>
|
||||
#include <sys/zfs_vfsops.h>
|
||||
#include <sys/zfs_znode.h>
|
||||
#endif
|
||||
#include <sys/zfs_fuid.h>
|
||||
|
||||
/*
 * FUID Domain table(s).
 *
 * The FUID table is stored as a packed nvlist of an array
 * of nvlists which contain an index, domain string and offset.
 *
 * During file system initialization the nvlist(s) are read and
 * two AVL trees are created.  One tree is keyed by the index number
 * and the other by the domain string.  Nodes are never removed from
 * the trees, but new entries may be added.  If a new entry is added then
 * the on-disk packed nvlist will also be updated.
 */
|
||||
|
||||
#define FUID_IDX "fuid_idx"
|
||||
#define FUID_DOMAIN "fuid_domain"
|
||||
#define FUID_OFFSET "fuid_offset"
|
||||
#define FUID_NVP_ARRAY "fuid_nvlist"
|
||||
|
||||
typedef struct fuid_domain {
|
||||
avl_node_t f_domnode;
|
||||
avl_node_t f_idxnode;
|
||||
ksiddomain_t *f_ksid;
|
||||
uint64_t f_idx;
|
||||
} fuid_domain_t;
|
||||
|
||||
/*
|
||||
* Compare two indexes.
|
||||
*/
|
||||
static int
|
||||
idx_compare(const void *arg1, const void *arg2)
|
||||
{
|
||||
const fuid_domain_t *node1 = arg1;
|
||||
const fuid_domain_t *node2 = arg2;
|
||||
|
||||
if (node1->f_idx < node2->f_idx)
|
||||
return (-1);
|
||||
else if (node1->f_idx > node2->f_idx)
|
||||
return (1);
|
||||
return (0);
|
||||
}
|
||||
|
||||
/*
|
||||
* Compare two domain strings.
|
||||
*/
|
||||
static int
|
||||
domain_compare(const void *arg1, const void *arg2)
|
||||
{
|
||||
const fuid_domain_t *node1 = arg1;
|
||||
const fuid_domain_t *node2 = arg2;
|
||||
int val;
|
||||
|
||||
val = strcmp(node1->f_ksid->kd_name, node2->f_ksid->kd_name);
|
||||
if (val == 0)
|
||||
return (0);
|
||||
return (val > 0 ? 1 : -1);
|
||||
}
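
Illustrative sketch, not part of this commit: both comparators above return exactly -1, 0 or 1 rather than an arbitrary negative or positive number, which is why domain_compare() normalizes the result of strcmp(). The stand-alone pair below follows the same convention on a simplified record and works with the standard qsort(). The `sketch_` names and the two-field struct are hypothetical; the real nodes also embed AVL linkage and a ksiddomain_t pointer.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for fuid_domain_t. */
typedef struct sketch_domain {
	uint64_t	idx;
	const char	*name;
} sketch_domain_t;

/* Order by index; compare explicitly to avoid overflow from subtraction. */
static int
sketch_idx_cmp(const void *a, const void *b)
{
	const sketch_domain_t *x = a;
	const sketch_domain_t *y = b;

	if (x->idx < y->idx)
		return (-1);
	if (x->idx > y->idx)
		return (1);
	return (0);
}

/* Order by domain string, clamping strcmp()'s result to -1, 0 or 1. */
static int
sketch_name_cmp(const void *a, const void *b)
{
	const sketch_domain_t *x = a;
	const sketch_domain_t *y = b;
	int val = strcmp(x->name, y->name);

	return ((val > 0) - (val < 0));
}
```

Keeping one record reachable through two orderings is what lets the table be searched either by index or by domain string.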
|
||||
|
||||
/*
 * Load the initial FUID domain and index trees.  This function is used by
 * both the kernel and zdb.
 */
|
||||
uint64_t
|
||||
zfs_fuid_table_load(objset_t *os, uint64_t fuid_obj, avl_tree_t *idx_tree,
|
||||
avl_tree_t *domain_tree)
|
||||
{
|
||||
dmu_buf_t *db;
|
||||
uint64_t fuid_size;
|
||||
|
||||
avl_create(idx_tree, idx_compare,
|
||||
sizeof (fuid_domain_t), offsetof(fuid_domain_t, f_idxnode));
|
||||
avl_create(domain_tree, domain_compare,
|
||||
sizeof (fuid_domain_t), offsetof(fuid_domain_t, f_domnode));
|
||||
|
||||
VERIFY(0 == dmu_bonus_hold(os, fuid_obj, FTAG, &db));
|
||||
fuid_size = *(uint64_t *)db->db_data;
|
||||
dmu_buf_rele(db, FTAG);
|
||||
|
||||
if (fuid_size) {
|
||||
nvlist_t **fuidnvp;
|
||||
nvlist_t *nvp = NULL;
|
||||
uint_t count;
|
||||
char *packed;
|
||||
int i;
|
||||
|
||||
packed = kmem_alloc(fuid_size, KM_SLEEP);
|
||||
VERIFY(dmu_read(os, fuid_obj, 0, fuid_size, packed) == 0);
|
||||
VERIFY(nvlist_unpack(packed, fuid_size,
|
||||
&nvp, 0) == 0);
|
||||
VERIFY(nvlist_lookup_nvlist_array(nvp, FUID_NVP_ARRAY,
|
||||
&fuidnvp, &count) == 0);
|
||||
|
||||
for (i = 0; i != count; i++) {
|
||||
fuid_domain_t *domnode;
|
||||
char *domain;
|
||||
uint64_t idx;
|
||||
|
||||
VERIFY(nvlist_lookup_string(fuidnvp[i], FUID_DOMAIN,
|
||||
&domain) == 0);
|
||||
VERIFY(nvlist_lookup_uint64(fuidnvp[i], FUID_IDX,
|
||||
&idx) == 0);
|
||||
|
||||
domnode = kmem_alloc(sizeof (fuid_domain_t), KM_SLEEP);
|
||||
|
||||
domnode->f_idx = idx;
|
||||
domnode->f_ksid = ksid_lookupdomain(domain);
|
||||
avl_add(idx_tree, domnode);
|
||||
avl_add(domain_tree, domnode);
|
||||
}
|
||||
nvlist_free(nvp);
|
||||
kmem_free(packed, fuid_size);
|
||||
}
|
||||
return (fuid_size);
|
||||
}
|
||||
|
||||
void
|
||||
zfs_fuid_table_destroy(avl_tree_t *idx_tree, avl_tree_t *domain_tree)
|
||||
{
|
||||
fuid_domain_t *domnode;
|
||||
void *cookie;
|
||||
|
||||
cookie = NULL;
|
||||
while ((domnode = avl_destroy_nodes(domain_tree, &cookie)) != NULL)
|
||||
ksiddomain_rele(domnode->f_ksid);
|
||||
|
||||
avl_destroy(domain_tree);
|
||||
cookie = NULL;
|
||||
while ((domnode = avl_destroy_nodes(idx_tree, &cookie)) != NULL)
|
||||
kmem_free(domnode, sizeof (fuid_domain_t));
|
||||
avl_destroy(idx_tree);
|
||||
}
|
||||
|
||||
char *
|
||||
zfs_fuid_idx_domain(avl_tree_t *idx_tree, uint32_t idx)
|
||||
{
|
||||
fuid_domain_t searchnode, *findnode;
|
||||
avl_index_t loc;
|
||||
|
||||
searchnode.f_idx = idx;
|
||||
|
||||
findnode = avl_find(idx_tree, &searchnode, &loc);
|
||||
|
||||
return (findnode->f_ksid->kd_name);
|
||||
}
|
||||
|
||||
#ifdef _KERNEL
|
||||
/*
|
||||
* Load the fuid table(s) into memory.
|
||||
*/
|
||||
static void
|
||||
zfs_fuid_init(zfsvfs_t *zfsvfs, dmu_tx_t *tx)
|
||||
{
|
||||
int error = 0;
|
||||
|
||||
rw_enter(&zfsvfs->z_fuid_lock, RW_WRITER);
|
||||
|
||||
if (zfsvfs->z_fuid_loaded) {
|
||||
rw_exit(&zfsvfs->z_fuid_lock);
|
||||
return;
|
||||
}
|
||||
|
||||
if (zfsvfs->z_fuid_obj == 0) {
|
||||
|
||||
/* first make sure we need to allocate object */
|
||||
|
||||
error = zap_lookup(zfsvfs->z_os, MASTER_NODE_OBJ,
|
||||
ZFS_FUID_TABLES, 8, 1, &zfsvfs->z_fuid_obj);
|
||||
if (error == ENOENT && tx != NULL) {
|
||||
zfsvfs->z_fuid_obj = dmu_object_alloc(zfsvfs->z_os,
|
||||
DMU_OT_FUID, 1 << 14, DMU_OT_FUID_SIZE,
|
||||
sizeof (uint64_t), tx);
|
||||
VERIFY(zap_add(zfsvfs->z_os, MASTER_NODE_OBJ,
|
||||
ZFS_FUID_TABLES, sizeof (uint64_t), 1,
|
||||
&zfsvfs->z_fuid_obj, tx) == 0);
|
||||
}
|
||||
}
|
||||
|
||||
zfsvfs->z_fuid_size = zfs_fuid_table_load(zfsvfs->z_os,
|
||||
zfsvfs->z_fuid_obj, &zfsvfs->z_fuid_idx, &zfsvfs->z_fuid_domain);
|
||||
|
||||
zfsvfs->z_fuid_loaded = B_TRUE;
|
||||
rw_exit(&zfsvfs->z_fuid_lock);
|
||||
}
|
||||
|
||||
/*
|
||||
* Query domain table for a given domain.
|
||||
*
|
||||
* If domain isn't found it is added to AVL trees and
|
||||
* the results are pushed out to disk.
|
||||
*/
|
||||
int
|
||||
zfs_fuid_find_by_domain(zfsvfs_t *zfsvfs, const char *domain, char **retdomain,
|
||||
dmu_tx_t *tx)
|
||||
{
|
||||
fuid_domain_t searchnode, *findnode;
|
||||
avl_index_t loc;
|
||||
|
||||
	/*
	 * If this is the dummy "nobody" domain, return an index of 0
	 * to cause the created FUID to be a standard POSIX id
	 * for the user nobody.
	 */
|
||||
if (domain[0] == '\0') {
|
||||
*retdomain = "";
|
||||
return (0);
|
||||
}
|
||||
|
||||
searchnode.f_ksid = ksid_lookupdomain(domain);
|
||||
if (retdomain) {
|
||||
*retdomain = searchnode.f_ksid->kd_name;
|
||||
}
|
||||
if (!zfsvfs->z_fuid_loaded)
|
||||
zfs_fuid_init(zfsvfs, tx);
|
||||
|
||||
rw_enter(&zfsvfs->z_fuid_lock, RW_READER);
|
||||
findnode = avl_find(&zfsvfs->z_fuid_domain, &searchnode, &loc);
|
||||
rw_exit(&zfsvfs->z_fuid_lock);
|
||||
|
||||
if (findnode) {
|
||||
ksiddomain_rele(searchnode.f_ksid);
|
||||
return (findnode->f_idx);
|
||||
} else {
|
||||
fuid_domain_t *domnode;
|
||||
nvlist_t *nvp;
|
||||
nvlist_t **fuids;
|
||||
uint64_t retidx;
|
||||
size_t nvsize = 0;
|
||||
char *packed;
|
||||
dmu_buf_t *db;
|
||||
int i = 0;
|
||||
|
||||
domnode = kmem_alloc(sizeof (fuid_domain_t), KM_SLEEP);
|
||||
domnode->f_ksid = searchnode.f_ksid;
|
||||
|
||||
rw_enter(&zfsvfs->z_fuid_lock, RW_WRITER);
|
||||
retidx = domnode->f_idx = avl_numnodes(&zfsvfs->z_fuid_idx) + 1;
|
||||
|
||||
avl_add(&zfsvfs->z_fuid_domain, domnode);
|
||||
avl_add(&zfsvfs->z_fuid_idx, domnode);
|
||||
/*
|
||||
* Now resync the on-disk nvlist.
|
||||
*/
|
||||
VERIFY(nvlist_alloc(&nvp, NV_UNIQUE_NAME, KM_SLEEP) == 0);
|
||||
|
||||
domnode = avl_first(&zfsvfs->z_fuid_domain);
|
||||
fuids = kmem_alloc(retidx * sizeof (void *), KM_SLEEP);
|
||||
while (domnode) {
|
||||
VERIFY(nvlist_alloc(&fuids[i],
|
||||
NV_UNIQUE_NAME, KM_SLEEP) == 0);
|
||||
VERIFY(nvlist_add_uint64(fuids[i], FUID_IDX,
|
||||
domnode->f_idx) == 0);
|
||||
VERIFY(nvlist_add_uint64(fuids[i],
|
||||
FUID_OFFSET, 0) == 0);
|
||||
VERIFY(nvlist_add_string(fuids[i++], FUID_DOMAIN,
|
||||
domnode->f_ksid->kd_name) == 0);
|
||||
domnode = AVL_NEXT(&zfsvfs->z_fuid_domain, domnode);
|
||||
}
|
||||
VERIFY(nvlist_add_nvlist_array(nvp, FUID_NVP_ARRAY,
|
||||
fuids, retidx) == 0);
|
||||
for (i = 0; i != retidx; i++)
|
||||
nvlist_free(fuids[i]);
|
||||
kmem_free(fuids, retidx * sizeof (void *));
|
||||
VERIFY(nvlist_size(nvp, &nvsize, NV_ENCODE_XDR) == 0);
|
||||
packed = kmem_alloc(nvsize, KM_SLEEP);
|
||||
VERIFY(nvlist_pack(nvp, &packed, &nvsize,
|
||||
NV_ENCODE_XDR, KM_SLEEP) == 0);
|
||||
nvlist_free(nvp);
|
||||
zfsvfs->z_fuid_size = nvsize;
|
||||
dmu_write(zfsvfs->z_os, zfsvfs->z_fuid_obj, 0,
|
||||
zfsvfs->z_fuid_size, packed, tx);
|
||||
kmem_free(packed, zfsvfs->z_fuid_size);
|
||||
VERIFY(0 == dmu_bonus_hold(zfsvfs->z_os, zfsvfs->z_fuid_obj,
|
||||
FTAG, &db));
|
||||
dmu_buf_will_dirty(db, tx);
|
||||
*(uint64_t *)db->db_data = zfsvfs->z_fuid_size;
|
||||
dmu_buf_rele(db, FTAG);
|
||||
|
||||
rw_exit(&zfsvfs->z_fuid_lock);
|
||||
return (retidx);
|
||||
}
|
||||
}
|
||||
|
||||
/*
 * Query the domain table by index, returning the domain string.
 *
 * Returns a pointer to the domain string held by an AVL node.
 */
|
||||
static char *
|
||||
zfs_fuid_find_by_idx(zfsvfs_t *zfsvfs, uint32_t idx)
|
||||
{
|
||||
char *domain;
|
||||
|
||||
if (idx == 0 || !zfsvfs->z_use_fuids)
|
||||
return (NULL);
|
||||
|
||||
if (!zfsvfs->z_fuid_loaded)
|
||||
zfs_fuid_init(zfsvfs, NULL);
|
||||
|
||||
rw_enter(&zfsvfs->z_fuid_lock, RW_READER);
|
||||
domain = zfs_fuid_idx_domain(&zfsvfs->z_fuid_idx, idx);
|
||||
rw_exit(&zfsvfs->z_fuid_lock);
|
||||
|
||||
ASSERT(domain);
|
||||
return (domain);
|
||||
}
|
||||
|
||||
void
|
||||
zfs_fuid_map_ids(znode_t *zp, cred_t *cr, uid_t *uidp, uid_t *gidp)
|
||||
{
|
||||
*uidp = zfs_fuid_map_id(zp->z_zfsvfs, zp->z_phys->zp_uid,
|
||||
cr, ZFS_OWNER);
|
||||
*gidp = zfs_fuid_map_id(zp->z_zfsvfs, zp->z_phys->zp_gid,
|
||||
cr, ZFS_GROUP);
|
||||
}
|
||||
|
||||
uid_t
|
||||
zfs_fuid_map_id(zfsvfs_t *zfsvfs, uint64_t fuid,
|
||||
cred_t *cr, zfs_fuid_type_t type)
|
||||
{
|
||||
uint32_t index = FUID_INDEX(fuid);
|
||||
char *domain;
|
||||
uid_t id;
|
||||
|
||||
if (index == 0)
|
||||
return (fuid);
|
||||
|
||||
domain = zfs_fuid_find_by_idx(zfsvfs, index);
|
||||
ASSERT(domain != NULL);
|
||||
|
||||
if (type == ZFS_OWNER || type == ZFS_ACE_USER) {
|
||||
(void) kidmap_getuidbysid(crgetzone(cr), domain,
|
||||
FUID_RID(fuid), &id);
|
||||
} else {
|
||||
(void) kidmap_getgidbysid(crgetzone(cr), domain,
|
||||
FUID_RID(fuid), &id);
|
||||
}
|
||||
return (id);
|
||||
}
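
Illustrative sketch, not part of this commit: zfs_fuid_map_id() above relies on FUID_INDEX() and FUID_RID() to split a 64-bit FUID, and treats an index of 0 as "this is already a plain POSIX id". Those macros are defined in a header that is not shown here, so the layout below (domain-table index in the high 32 bits, RID in the low 32 bits) is an assumption, and the `sketch_` function names are hypothetical.

```c
#include <stdint.h>

/* Assumed layout: index in the high 32 bits, RID in the low 32 bits. */
static uint64_t
sketch_fuid_encode(uint32_t idx, uint32_t rid)
{
	return (((uint64_t)idx << 32) | rid);
}

/* Domain-table index; 0 means no domain is attached. */
static uint32_t
sketch_fuid_index(uint64_t fuid)
{
	return ((uint32_t)(fuid >> 32));
}

/* Relative id within the domain, or the POSIX id when the index is 0. */
static uint32_t
sketch_fuid_rid(uint64_t fuid)
{
	return ((uint32_t)(fuid & 0xffffffffu));
}

/* An index of 0 marks a value that can be used directly as a uid or gid. */
static int
sketch_fuid_is_posix(uint64_t fuid)
{
	return (sketch_fuid_index(fuid) == 0);
}
```

Under this layout every ordinary 32-bit uid or gid is already a valid FUID with index 0, which is consistent with the early return at the top of zfs_fuid_map_id().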
|
||||
|
||||
/*
 * Add a FUID node to the list of FUIDs being created for this
 * ACL.
 *
 * If the ACL has multiple domains, then keep only one copy of each unique
 * domain.
 */
|
||||
static void
|
||||
zfs_fuid_node_add(zfs_fuid_info_t **fuidpp, const char *domain, uint32_t rid,
|
||||
uint64_t idx, uint64_t id, zfs_fuid_type_t type)
|
||||
{
|
||||
zfs_fuid_t *fuid;
|
||||
zfs_fuid_domain_t *fuid_domain;
|
||||
zfs_fuid_info_t *fuidp;
|
||||
uint64_t fuididx;
|
||||
boolean_t found = B_FALSE;
|
||||
|
||||
if (*fuidpp == NULL)
|
||||
*fuidpp = zfs_fuid_info_alloc();
|
||||
|
||||
fuidp = *fuidpp;
|
||||
/*
|
||||
* First find fuid domain index in linked list
|
||||
*
|
||||
* If one isn't found then create an entry.
|
||||
*/
|
||||
|
||||
for (fuididx = 1, fuid_domain = list_head(&fuidp->z_domains);
|
||||
fuid_domain; fuid_domain = list_next(&fuidp->z_domains,
|
||||
fuid_domain), fuididx++) {
|
||||
if (idx == fuid_domain->z_domidx) {
|
||||
found = B_TRUE;
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
if (!found) {
|
||||
fuid_domain = kmem_alloc(sizeof (zfs_fuid_domain_t), KM_SLEEP);
|
||||
fuid_domain->z_domain = domain;
|
||||
fuid_domain->z_domidx = idx;
|
||||
list_insert_tail(&fuidp->z_domains, fuid_domain);
|
||||
fuidp->z_domain_str_sz += strlen(domain) + 1;
|
||||
fuidp->z_domain_cnt++;
|
||||
}
|
||||
|
||||
if (type == ZFS_ACE_USER || type == ZFS_ACE_GROUP) {
|
||||
/*
|
||||
* Now allocate fuid entry and add it on the end of the list
|
||||
*/
|
||||
|
||||
fuid = kmem_alloc(sizeof (zfs_fuid_t), KM_SLEEP);
|
||||
fuid->z_id = id;
|
||||
fuid->z_domidx = idx;
|
||||
fuid->z_logfuid = FUID_ENCODE(fuididx, rid);
|
||||
|
||||
list_insert_tail(&fuidp->z_fuids, fuid);
|
||||
fuidp->z_fuid_cnt++;
|
||||
} else {
|
||||
if (type == ZFS_OWNER)
|
||||
fuidp->z_fuid_owner = FUID_ENCODE(fuididx, rid);
|
||||
else
|
||||
fuidp->z_fuid_group = FUID_ENCODE(fuididx, rid);
|
||||
}
|
||||
}
|
||||
|
||||
/*
 * Create a file system FUID, based on information in the user's cred.
 */
|
||||
uint64_t
|
||||
zfs_fuid_create_cred(zfsvfs_t *zfsvfs, zfs_fuid_type_t type,
|
||||
dmu_tx_t *tx, cred_t *cr, zfs_fuid_info_t **fuidp)
|
||||
{
|
||||
uint64_t idx;
|
||||
ksid_t *ksid;
|
||||
uint32_t rid;
|
||||
char *kdomain;
|
||||
const char *domain;
|
||||
uid_t id;
|
||||
|
||||
VERIFY(type == ZFS_OWNER || type == ZFS_GROUP);
|
||||
|
||||
if (type == ZFS_OWNER)
|
||||
id = crgetuid(cr);
|
||||
else
|
||||
id = crgetgid(cr);
|
||||
|
||||
if (!zfsvfs->z_use_fuids || !IS_EPHEMERAL(id))
|
||||
return ((uint64_t)id);
|
||||
|
||||
ksid = crgetsid(cr, (type == ZFS_OWNER) ? KSID_OWNER : KSID_GROUP);
|
||||
|
||||
VERIFY(ksid != NULL);
|
||||
rid = ksid_getrid(ksid);
|
||||
domain = ksid_getdomain(ksid);
|
||||
|
||||
idx = zfs_fuid_find_by_domain(zfsvfs, domain, &kdomain, tx);
|
||||
|
||||
zfs_fuid_node_add(fuidp, kdomain, rid, idx, id, type);
|
||||
|
||||
return (FUID_ENCODE(idx, rid));
|
||||
}
|
||||
|
||||
/*
|
||||
* Create a file system FUID for an ACL ace
|
||||
* or a chown/chgrp of the file.
|
||||
* This is similar to zfs_fuid_create_cred, except that
|
||||
* we can't find the domain + rid information in the
|
||||
* cred. Instead we have to query Winchester for the
|
||||
* domain and rid.
|
||||
*
|
||||
* During replay operations the domain+rid information is
|
||||
* found in the zfs_fuid_info_t that the replay code has
|
||||
* attached to the zfsvfs of the file system.
|
||||
*/
|
||||
uint64_t
|
||||
zfs_fuid_create(zfsvfs_t *zfsvfs, uint64_t id, cred_t *cr,
|
||||
zfs_fuid_type_t type, dmu_tx_t *tx, zfs_fuid_info_t **fuidpp)
|
||||
{
|
||||
const char *domain;
|
||||
char *kdomain;
|
||||
uint32_t fuid_idx = FUID_INDEX(id);
|
||||
uint32_t rid;
|
||||
idmap_stat status;
|
||||
uint64_t idx;
|
||||
boolean_t is_replay = (zfsvfs->z_assign >= TXG_INITIAL);
|
||||
zfs_fuid_t *zfuid = NULL;
|
||||
zfs_fuid_info_t *fuidp;
|
||||
|
||||
/*
|
||||
* If POSIX ID, or entry is already a FUID then
|
||||
* just return the id
|
||||
*
|
||||
* We may also be handed an already FUID'ized id via
|
||||
* chmod.
|
||||
*/
|
||||
|
||||
if (!zfsvfs->z_use_fuids || !IS_EPHEMERAL(id) || fuid_idx != 0)
|
||||
return (id);
|
||||
|
||||
if (is_replay) {
|
||||
fuidp = zfsvfs->z_fuid_replay;
|
||||
|
||||
/*
|
||||
* If we are passed an ephemeral id, but no
|
||||
* fuid_info was logged then return NOBODY.
|
||||
* This is most likely a result of idmap service
|
||||
* not being available.
|
||||
*/
|
||||
if (fuidp == NULL)
|
||||
return (UID_NOBODY);
|
||||
|
||||
switch (type) {
|
||||
case ZFS_ACE_USER:
|
||||
case ZFS_ACE_GROUP:
|
||||
zfuid = list_head(&fuidp->z_fuids);
|
||||
rid = FUID_RID(zfuid->z_logfuid);
|
||||
idx = FUID_INDEX(zfuid->z_logfuid);
|
||||
break;
|
||||
case ZFS_OWNER:
|
||||
rid = FUID_RID(fuidp->z_fuid_owner);
|
||||
idx = FUID_INDEX(fuidp->z_fuid_owner);
|
||||
break;
|
||||
case ZFS_GROUP:
|
||||
rid = FUID_RID(fuidp->z_fuid_group);
|
||||
idx = FUID_INDEX(fuidp->z_fuid_group);
|
||||
break;
|
||||
}
|
||||
domain = fuidp->z_domain_table[idx -1];
|
||||
} else {
|
||||
if (type == ZFS_OWNER || type == ZFS_ACE_USER)
|
||||
status = kidmap_getsidbyuid(crgetzone(cr), id,
|
||||
&domain, &rid);
|
||||
else
|
||||
status = kidmap_getsidbygid(crgetzone(cr), id,
|
||||
&domain, &rid);
|
||||
|
||||
if (status != 0) {
|
||||
/*
|
||||
* When returning nobody we will need to
|
||||
* make a dummy fuid table entry for logging
|
||||
* purposes.
|
||||
*/
|
||||
rid = UID_NOBODY;
|
||||
domain = "";
|
||||
}
|
||||
}
|
||||
|
||||
idx = zfs_fuid_find_by_domain(zfsvfs, domain, &kdomain, tx);
|
||||
|
||||
if (!is_replay)
|
||||
zfs_fuid_node_add(fuidpp, kdomain, rid, idx, id, type);
|
||||
else if (zfuid != NULL) {
|
||||
list_remove(&fuidp->z_fuids, zfuid);
|
||||
kmem_free(zfuid, sizeof (zfs_fuid_t));
|
||||
}
|
||||
return (FUID_ENCODE(idx, rid));
|
||||
}
|
||||
|
||||
void
|
||||
zfs_fuid_destroy(zfsvfs_t *zfsvfs)
|
||||
{
|
||||
rw_enter(&zfsvfs->z_fuid_lock, RW_WRITER);
|
||||
if (!zfsvfs->z_fuid_loaded) {
|
||||
rw_exit(&zfsvfs->z_fuid_lock);
|
||||
return;
|
||||
}
|
||||
zfs_fuid_table_destroy(&zfsvfs->z_fuid_idx, &zfsvfs->z_fuid_domain);
|
||||
rw_exit(&zfsvfs->z_fuid_lock);
|
||||
}
|
||||
|
||||
/*
|
||||
* Allocate zfs_fuid_info for tracking FUIDs created during
|
||||
* zfs_mknode, VOP_SETATTR() or VOP_SETSECATTR()
|
||||
*/
|
||||
zfs_fuid_info_t *
|
||||
zfs_fuid_info_alloc(void)
|
||||
{
|
||||
zfs_fuid_info_t *fuidp;
|
||||
|
||||
fuidp = kmem_zalloc(sizeof (zfs_fuid_info_t), KM_SLEEP);
|
||||
list_create(&fuidp->z_domains, sizeof (zfs_fuid_domain_t),
|
||||
offsetof(zfs_fuid_domain_t, z_next));
|
||||
list_create(&fuidp->z_fuids, sizeof (zfs_fuid_t),
|
||||
offsetof(zfs_fuid_t, z_next));
|
||||
return (fuidp);
|
||||
}
|
||||
|
||||
/*
|
||||
* Release all memory associated with zfs_fuid_info_t
|
||||
*/
|
||||
void
|
||||
zfs_fuid_info_free(zfs_fuid_info_t *fuidp)
|
||||
{
|
||||
zfs_fuid_t *zfuid;
|
||||
zfs_fuid_domain_t *zdomain;
|
||||
|
||||
while ((zfuid = list_head(&fuidp->z_fuids)) != NULL) {
|
||||
list_remove(&fuidp->z_fuids, zfuid);
|
||||
kmem_free(zfuid, sizeof (zfs_fuid_t));
|
||||
}
|
||||
|
||||
if (fuidp->z_domain_table != NULL)
|
||||
kmem_free(fuidp->z_domain_table,
|
||||
(sizeof (char **)) * fuidp->z_domain_cnt);
|
||||
|
||||
while ((zdomain = list_head(&fuidp->z_domains)) != NULL) {
|
||||
list_remove(&fuidp->z_domains, zdomain);
|
||||
kmem_free(zdomain, sizeof (zfs_fuid_domain_t));
|
||||
}
|
||||
|
||||
kmem_free(fuidp, sizeof (zfs_fuid_info_t));
|
||||
}
|
||||
|
||||
/*
 * Check to see if id is a group member.  If the cred
 * has ksid info then the sidlist is checked first,
 * and if the id is still not found then POSIX groups are checked.
 *
 * Will use a straight FUID compare when possible.
 */
|
||||
boolean_t
|
||||
zfs_groupmember(zfsvfs_t *zfsvfs, uint64_t id, cred_t *cr)
|
||||
{
|
||||
ksid_t *ksid = crgetsid(cr, KSID_GROUP);
|
||||
uid_t gid;
|
||||
|
||||
if (ksid) {
|
||||
int i;
|
||||
ksid_t *ksid_groups;
|
||||
ksidlist_t *ksidlist = crgetsidlist(cr);
|
||||
uint32_t idx = FUID_INDEX(id);
|
||||
uint32_t rid = FUID_RID(id);
|
||||
|
||||
ASSERT(ksidlist);
|
||||
ksid_groups = ksidlist->ksl_sids;
|
||||
|
||||
for (i = 0; i != ksidlist->ksl_nsid; i++) {
|
||||
if (idx == 0) {
|
||||
if (id != IDMAP_WK_CREATOR_GROUP_GID &&
|
||||
id == ksid_groups[i].ks_id) {
|
||||
return (B_TRUE);
|
||||
}
|
||||
} else {
|
||||
char *domain;
|
||||
|
||||
domain = zfs_fuid_find_by_idx(zfsvfs, idx);
|
||||
ASSERT(domain != NULL);
|
||||
|
||||
if (strcmp(domain,
|
||||
IDMAP_WK_CREATOR_SID_AUTHORITY) == 0)
|
||||
return (B_FALSE);
|
||||
|
||||
if ((strcmp(domain,
|
||||
ksid_groups[i].ks_domain->kd_name) == 0) &&
|
||||
rid == ksid_groups[i].ks_rid)
|
||||
return (B_TRUE);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
	/*
	 * Not found in ksidlist, check POSIX groups.
	 */
|
||||
gid = zfs_fuid_map_id(zfsvfs, id, cr, ZFS_GROUP);
|
||||
return (groupmember(gid, cr));
|
||||
}
|
||||
#endif
|
File diff suppressed because it is too large
|
@ -0,0 +1,693 @@
|
|||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright 2008 Sun Microsystems, Inc. All rights reserved.
|
||||
* Use is subject to license terms.
|
||||
*/
|
||||
|
||||
#pragma ident "@(#)zfs_log.c 1.13 08/04/09 SMI"
|
||||
|
||||
#include <sys/types.h>
|
||||
#include <sys/param.h>
|
||||
#include <sys/systm.h>
|
||||
#include <sys/sysmacros.h>
|
||||
#include <sys/cmn_err.h>
|
||||
#include <sys/kmem.h>
|
||||
#include <sys/thread.h>
|
||||
#include <sys/file.h>
|
||||
#include <sys/vfs.h>
|
||||
#include <sys/zfs_znode.h>
|
||||
#include <sys/zfs_dir.h>
|
||||
#include <sys/zil.h>
|
||||
#include <sys/zil_impl.h>
|
||||
#include <sys/byteorder.h>
|
||||
#include <sys/policy.h>
|
||||
#include <sys/stat.h>
|
||||
#include <sys/mode.h>
|
||||
#include <sys/acl.h>
|
||||
#include <sys/dmu.h>
|
||||
#include <sys/spa.h>
|
||||
#include <sys/zfs_fuid.h>
|
||||
#include <sys/ddi.h>
|
||||
|
||||
/*
|
||||
* All the functions in this file are used to construct the log entries
|
||||
* to record transactions. They allocate an intent log transaction
|
||||
* structure (itx_t) and save within it all the information necessary to
|
||||
* possibly replay the transaction. The itx is then assigned a sequence
|
||||
* number and inserted in the in-memory list anchored in the zilog.
|
||||
*/
|
||||
|
||||
int
|
||||
zfs_log_create_txtype(zil_create_t type, vsecattr_t *vsecp, vattr_t *vap)
|
||||
{
|
||||
int isxvattr = (vap->va_mask & AT_XVATTR);
|
||||
switch (type) {
|
||||
case Z_FILE:
|
||||
if (vsecp == NULL && !isxvattr)
|
||||
return (TX_CREATE);
|
||||
if (vsecp && isxvattr)
|
||||
return (TX_CREATE_ACL_ATTR);
|
||||
if (vsecp)
|
||||
return (TX_CREATE_ACL);
|
||||
else
|
||||
return (TX_CREATE_ATTR);
|
||||
/*NOTREACHED*/
|
||||
case Z_DIR:
|
||||
if (vsecp == NULL && !isxvattr)
|
||||
return (TX_MKDIR);
|
||||
if (vsecp && isxvattr)
|
||||
return (TX_MKDIR_ACL_ATTR);
|
||||
if (vsecp)
|
||||
return (TX_MKDIR_ACL);
|
||||
else
|
||||
return (TX_MKDIR_ATTR);
|
||||
case Z_XATTRDIR:
|
||||
return (TX_MKXATTR);
|
||||
}
|
||||
ASSERT(0);
|
||||
return (TX_MAX_TYPE);
|
||||
}
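The selection in zfs_log_create_txtype() comes down to two facts about the request: whether an ACL was supplied and whether extended attributes were requested. A minimal sketch of that decision for the regular-file case follows. The `sketch_` names and enum values are hypothetical stand-ins, not the real TX_* constants.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the TX_CREATE* constants; values are illustrative. */
enum sketch_txtype {
	SK_CREATE,
	SK_CREATE_ATTR,
	SK_CREATE_ACL,
	SK_CREATE_ACL_ATTR
};

/*
 * Pick the record type for a regular-file create from two inputs:
 * has_acl    - an ACL was passed in (vsecp != NULL in the original)
 * has_xvattr - extended attributes were requested (AT_XVATTR set)
 */
enum sketch_txtype
sketch_create_txtype(bool has_acl, bool has_xvattr)
{
	if (has_acl)
		return (has_xvattr ? SK_CREATE_ACL_ATTR : SK_CREATE_ACL);
	return (has_xvattr ? SK_CREATE_ATTR : SK_CREATE);
}
```

The directory case follows the same two-by-two table with the TX_MKDIR* types in place of TX_CREATE*.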
|
||||
|
||||
/*
|
||||
* Build up the log data necessary for logging xvattr_t.
|
||||
* First lr_attr_t is initialized. Following the lr_attr_t
|
||||
* is the mapsize and attribute bitmap copied from the xvattr_t.
|
||||
* Following the bitmap and bitmapsize two 64 bit words are reserved
|
||||
* for the create time which may be set. Following the create time
|
||||
* records a single 64 bit integer which has the bits to set on
|
||||
* replay for the xvattr.
|
||||
*/
|
||||
static void
|
||||
zfs_log_xvattr(lr_attr_t *lrattr, xvattr_t *xvap)
|
||||
{
|
||||
uint32_t *bitmap;
|
||||
uint64_t *attrs;
|
||||
uint64_t *crtime;
|
||||
xoptattr_t *xoap;
|
||||
void *scanstamp;
|
||||
int i;
|
||||
|
||||
xoap = xva_getxoptattr(xvap);
|
||||
ASSERT(xoap);
|
||||
|
||||
lrattr->lr_attr_masksize = xvap->xva_mapsize;
|
||||
bitmap = &lrattr->lr_attr_bitmap;
|
||||
for (i = 0; i != xvap->xva_mapsize; i++, bitmap++) {
|
||||
*bitmap = xvap->xva_reqattrmap[i];
|
||||
}
|
||||
|
||||
/* Now pack the attributes up in a single uint64_t */
|
||||
attrs = (uint64_t *)bitmap;
|
||||
crtime = attrs + 1;
|
||||
scanstamp = (caddr_t)(crtime + 2);
|
||||
*attrs = 0;
|
||||
if (XVA_ISSET_REQ(xvap, XAT_READONLY))
|
||||
*attrs |= (xoap->xoa_readonly == 0) ? 0 :
|
||||
XAT0_READONLY;
|
||||
if (XVA_ISSET_REQ(xvap, XAT_HIDDEN))
|
||||
*attrs |= (xoap->xoa_hidden == 0) ? 0 :
|
||||
XAT0_HIDDEN;
|
||||
if (XVA_ISSET_REQ(xvap, XAT_SYSTEM))
|
||||
*attrs |= (xoap->xoa_system == 0) ? 0 :
|
||||
XAT0_SYSTEM;
|
||||
if (XVA_ISSET_REQ(xvap, XAT_ARCHIVE))
|
||||
*attrs |= (xoap->xoa_archive == 0) ? 0 :
|
||||
XAT0_ARCHIVE;
|
||||
if (XVA_ISSET_REQ(xvap, XAT_IMMUTABLE))
|
||||
*attrs |= (xoap->xoa_immutable == 0) ? 0 :
|
||||
XAT0_IMMUTABLE;
|
||||
if (XVA_ISSET_REQ(xvap, XAT_NOUNLINK))
|
||||
*attrs |= (xoap->xoa_nounlink == 0) ? 0 :
|
||||
XAT0_NOUNLINK;
|
||||
if (XVA_ISSET_REQ(xvap, XAT_APPENDONLY))
|
||||
*attrs |= (xoap->xoa_appendonly == 0) ? 0 :
|
||||
XAT0_APPENDONLY;
|
||||
if (XVA_ISSET_REQ(xvap, XAT_OPAQUE))
|
||||
*attrs |= (xoap->xoa_opaque == 0) ? 0 :
|
||||
XAT0_OPAQUE;
|
||||
if (XVA_ISSET_REQ(xvap, XAT_NODUMP))
|
||||
*attrs |= (xoap->xoa_nodump == 0) ? 0 :
|
||||
XAT0_NODUMP;
|
||||
if (XVA_ISSET_REQ(xvap, XAT_AV_QUARANTINED))
|
||||
*attrs |= (xoap->xoa_av_quarantined == 0) ? 0 :
|
||||
XAT0_AV_QUARANTINED;
|
||||
if (XVA_ISSET_REQ(xvap, XAT_AV_MODIFIED))
|
||||
*attrs |= (xoap->xoa_av_modified == 0) ? 0 :
|
||||
XAT0_AV_MODIFIED;
|
||||
if (XVA_ISSET_REQ(xvap, XAT_CREATETIME))
|
||||
ZFS_TIME_ENCODE(&xoap->xoa_createtime, crtime);
|
||||
if (XVA_ISSET_REQ(xvap, XAT_AV_SCANSTAMP))
|
||||
bcopy(xoap->xoa_av_scanstamp, scanstamp, AV_SCANSTAMP_SZ);
|
||||
}
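The pointer arithmetic in zfs_log_xvattr() implies a simple byte layout: a small header, the bitmap words, one 64-bit flags word, two 64-bit words of create time, then the scanstamp. The sketch below only computes those offsets. `struct sketch_lr_attr` is a hypothetical mirror of the header, assumed to be two 32-bit fields with the second being the first bitmap word; it is not the real lr_attr_t definition.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the header that precedes the bitmap. */
struct sketch_lr_attr {
	uint32_t masksize;	/* number of 32-bit bitmap words */
	uint32_t bitmap;	/* first bitmap word; the rest follow it */
};

/* Byte offsets, measured from the start of the header. */
struct sketch_xvattr_layout {
	size_t attrs_off;	/* single 64-bit word of packed flags */
	size_t crtime_off;	/* two 64-bit words for the create time */
	size_t scanstamp_off;	/* start of the scanstamp bytes */
};

struct sketch_xvattr_layout
sketch_xvattr_offsets(uint32_t mapsize)
{
	struct sketch_xvattr_layout l;

	/* The flags word starts right after the last bitmap word. */
	l.attrs_off = offsetof(struct sketch_lr_attr, bitmap) +
	    (size_t)mapsize * sizeof (uint32_t);
	l.crtime_off = l.attrs_off + sizeof (uint64_t);
	l.scanstamp_off = l.crtime_off + 2 * sizeof (uint64_t);
	return (l);
}
```

With a three-word bitmap this gives offsets 16, 24 and 40, which matches `attrs = (uint64_t *)bitmap` after the copy loop, `crtime = attrs + 1` and `scanstamp = crtime + 2`.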
|
||||
|
||||
static void *
|
||||
zfs_log_fuid_ids(zfs_fuid_info_t *fuidp, void *start)
|
||||
{
|
||||
zfs_fuid_t *zfuid;
|
||||
uint64_t *fuidloc = start;
|
||||
|
||||
/* First copy in the ACE FUIDs */
|
||||
for (zfuid = list_head(&fuidp->z_fuids); zfuid;
|
||||
zfuid = list_next(&fuidp->z_fuids, zfuid)) {
|
||||
*fuidloc++ = zfuid->z_logfuid;
|
||||
}
|
||||
return (fuidloc);
|
||||
}
|
||||
|
||||
|
||||
static void *
|
||||
zfs_log_fuid_domains(zfs_fuid_info_t *fuidp, void *start)
|
||||
{
|
||||
zfs_fuid_domain_t *zdomain;
|
||||
|
||||
/* now copy in the domain info, if any */
|
||||
if (fuidp->z_domain_str_sz != 0) {
|
||||
for (zdomain = list_head(&fuidp->z_domains); zdomain;
|
||||
zdomain = list_next(&fuidp->z_domains, zdomain)) {
|
||||
bcopy((void *)zdomain->z_domain, start,
|
||||
strlen(zdomain->z_domain) + 1);
|
||||
start = (caddr_t)start +
|
||||
strlen(zdomain->z_domain) + 1;
|
||||
}
|
||||
}
|
||||
return (start);
|
||||
}
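zfs_log_fuid_domains() appends each domain name, including its terminating NUL, back to back and returns the position just past the last one. A minimal sketch of the same packing over a plain array of strings follows. `sketch_pack_strings` is a hypothetical helper, and it uses memcpy() where the original uses bcopy() over a linked list.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy each string, including its terminating NUL, back to back into buf.
 * Returns a pointer just past the last byte written. The caller must size
 * buf to hold the sum of strlen(s) + 1 over all strings.
 */
char *
sketch_pack_strings(char *buf, const char *const *strs, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		size_t len = strlen(strs[i]) + 1;

		memcpy(buf, strs[i], len);
		buf += len;
	}
	return (buf);
}
```

Returning the end pointer lets the caller keep appending, which is how zfs_log_create() places the file name directly after the domain strings.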
|
||||
|
||||
/*
|
||||
* zfs_log_create() is used to handle TX_CREATE, TX_CREATE_ATTR, TX_MKDIR,
|
||||
* TX_MKDIR_ATTR and TX_MKXATTR
|
||||
* transactions.
|
||||
*
|
||||
* TX_CREATE and TX_MKDIR are standard creates, but they may have FUID
|
||||
* domain information appended prior to the name. In this case the
|
||||
* uid/gid in the log record will be a log centric FUID.
|
||||
*
|
||||
* TX_CREATE_ACL_ATTR and TX_MKDIR_ACL_ATTR handle special creates that
|
||||
* may contain attributes, ACL and optional fuid information.
|
||||
*
|
||||
* TX_CREATE_ACL and TX_MKDIR_ACL handle special creates that specify
|
||||
* an ACL and normal users/groups in the ACEs.
|
||||
*
|
||||
* There may be optional xvattr attribute information similar
|
||||
* to zfs_log_setattr.
|
||||
*
|
||||
* Also, after the file name "domain" strings may be appended.
|
||||
*/
|
||||
void
|
||||
zfs_log_create(zilog_t *zilog, dmu_tx_t *tx, uint64_t txtype,
|
||||
znode_t *dzp, znode_t *zp, char *name, vsecattr_t *vsecp,
|
||||
zfs_fuid_info_t *fuidp, vattr_t *vap)
|
||||
{
|
||||
itx_t *itx;
|
||||
uint64_t seq;
|
||||
lr_create_t *lr;
|
||||
lr_acl_create_t *lracl;
|
||||
size_t aclsize;
|
||||
size_t xvatsize = 0;
|
||||
size_t txsize;
|
||||
xvattr_t *xvap = (xvattr_t *)vap;
|
||||
void *end;
|
||||
size_t lrsize;
|
||||
|
||||
size_t namesize = strlen(name) + 1;
|
||||
size_t fuidsz = 0;
|
||||
|
||||
if (zilog == NULL)
|
||||
return;
|
||||
|
||||
/*
|
||||
* If we have FUIDs present then add in space for
|
||||
* domains and ACE fuid's if any.
|
||||
*/
|
||||
if (fuidp) {
|
||||
fuidsz += fuidp->z_domain_str_sz;
|
||||
fuidsz += fuidp->z_fuid_cnt * sizeof (uint64_t);
|
||||
}
|
||||
|
||||
if (vap->va_mask & AT_XVATTR)
|
||||
xvatsize = ZIL_XVAT_SIZE(xvap->xva_mapsize);
|
||||
|
||||
if ((int)txtype == TX_CREATE_ATTR || (int)txtype == TX_MKDIR_ATTR ||
|
||||
(int)txtype == TX_CREATE || (int)txtype == TX_MKDIR ||
|
||||
(int)txtype == TX_MKXATTR) {
|
||||
txsize = sizeof (*lr) + namesize + fuidsz + xvatsize;
|
||||
lrsize = sizeof (*lr);
|
||||
} else {
|
||||
aclsize = (vsecp) ? vsecp->vsa_aclentsz : 0;
|
||||
txsize =
|
||||
sizeof (lr_acl_create_t) + namesize + fuidsz +
|
||||
ZIL_ACE_LENGTH(aclsize) + xvatsize;
|
||||
lrsize = sizeof (lr_acl_create_t);
|
||||
}
|
||||
|
||||
itx = zil_itx_create(txtype, txsize);
|
||||
|
||||
lr = (lr_create_t *)&itx->itx_lr;
|
||||
lr->lr_doid = dzp->z_id;
|
||||
lr->lr_foid = zp->z_id;
|
||||
lr->lr_mode = zp->z_phys->zp_mode;
|
||||
if (!IS_EPHEMERAL(zp->z_phys->zp_uid)) {
|
||||
lr->lr_uid = (uint64_t)zp->z_phys->zp_uid;
|
||||
} else {
|
||||
lr->lr_uid = fuidp->z_fuid_owner;
|
||||
}
|
||||
if (!IS_EPHEMERAL(zp->z_phys->zp_gid)) {
|
||||
lr->lr_gid = (uint64_t)zp->z_phys->zp_gid;
|
||||
} else {
|
||||
lr->lr_gid = fuidp->z_fuid_group;
|
||||
}
|
||||
lr->lr_gen = zp->z_phys->zp_gen;
|
||||
lr->lr_crtime[0] = zp->z_phys->zp_crtime[0];
|
||||
lr->lr_crtime[1] = zp->z_phys->zp_crtime[1];
|
||||
lr->lr_rdev = zp->z_phys->zp_rdev;
|
||||
|
||||
/*
|
||||
* Fill in xvattr info if any
|
||||
*/
|
||||
if (vap->va_mask & AT_XVATTR) {
|
||||
zfs_log_xvattr((lr_attr_t *)((caddr_t)lr + lrsize), xvap);
|
||||
end = (caddr_t)lr + lrsize + xvatsize;
|
||||
} else {
|
||||
end = (caddr_t)lr + lrsize;
|
||||
}
|
||||
|
||||
/* Now fill in any ACL info */
|
||||
|
||||
if (vsecp) {
|
||||
lracl = (lr_acl_create_t *)&itx->itx_lr;
|
||||
lracl->lr_aclcnt = vsecp->vsa_aclcnt;
|
||||
lracl->lr_acl_bytes = aclsize;
|
||||
lracl->lr_domcnt = fuidp ? fuidp->z_domain_cnt : 0;
|
||||
lracl->lr_fuidcnt = fuidp ? fuidp->z_fuid_cnt : 0;
|
||||
if (vsecp->vsa_mask & VSA_ACE_ACLFLAGS)
|
||||
lracl->lr_acl_flags = (uint64_t)vsecp->vsa_aclflags;
|
||||
else
|
||||
lracl->lr_acl_flags = 0;
|
||||
|
||||
bcopy(vsecp->vsa_aclentp, end, aclsize);
|
||||
end = (caddr_t)end + ZIL_ACE_LENGTH(aclsize);
|
||||
}
|
||||
|
||||
/* drop in FUID info */
|
||||
if (fuidp) {
|
||||
end = zfs_log_fuid_ids(fuidp, end);
|
||||
end = zfs_log_fuid_domains(fuidp, end);
|
||||
}
|
||||
/*
|
||||
* Now place file name in log record
|
||||
*/
|
||||
bcopy(name, end, namesize);
|
||||
|
||||
seq = zil_itx_assign(zilog, itx, tx);
|
||||
dzp->z_last_itx = seq;
|
||||
zp->z_last_itx = seq;
|
||||
}
|
||||
|
||||
/*
|
||||
* zfs_log_remove() handles both TX_REMOVE and TX_RMDIR transactions.
|
||||
*/
|
||||
void
|
||||
zfs_log_remove(zilog_t *zilog, dmu_tx_t *tx, uint64_t txtype,
|
||||
znode_t *dzp, char *name)
|
||||
{
|
||||
itx_t *itx;
|
||||
uint64_t seq;
|
||||
lr_remove_t *lr;
|
||||
size_t namesize = strlen(name) + 1;
|
||||
|
||||
if (zilog == NULL)
|
||||
return;
|
||||
|
||||
itx = zil_itx_create(txtype, sizeof (*lr) + namesize);
|
||||
lr = (lr_remove_t *)&itx->itx_lr;
|
||||
lr->lr_doid = dzp->z_id;
|
||||
bcopy(name, (char *)(lr + 1), namesize);
|
||||
|
||||
seq = zil_itx_assign(zilog, itx, tx);
|
||||
dzp->z_last_itx = seq;
|
||||
}
|
||||
|
||||
/*
|
||||
* zfs_log_link() handles TX_LINK transactions.
|
||||
*/
|
||||
void
|
||||
zfs_log_link(zilog_t *zilog, dmu_tx_t *tx, uint64_t txtype,
|
||||
znode_t *dzp, znode_t *zp, char *name)
|
||||
{
|
||||
itx_t *itx;
|
||||
uint64_t seq;
|
||||
lr_link_t *lr;
|
||||
size_t namesize = strlen(name) + 1;
|
||||
|
||||
if (zilog == NULL)
|
||||
return;
|
||||
|
||||
itx = zil_itx_create(txtype, sizeof (*lr) + namesize);
|
||||
lr = (lr_link_t *)&itx->itx_lr;
|
||||
lr->lr_doid = dzp->z_id;
|
||||
lr->lr_link_obj = zp->z_id;
|
||||
bcopy(name, (char *)(lr + 1), namesize);
|
||||
|
||||
seq = zil_itx_assign(zilog, itx, tx);
|
||||
dzp->z_last_itx = seq;
|
||||
zp->z_last_itx = seq;
|
||||
}
|
||||
|
||||
/*
|
||||
* zfs_log_symlink() handles TX_SYMLINK transactions.
|
||||
*/
|
||||
void
|
||||
zfs_log_symlink(zilog_t *zilog, dmu_tx_t *tx, uint64_t txtype,
|
||||
znode_t *dzp, znode_t *zp, char *name, char *link)
|
||||
{
|
||||
itx_t *itx;
|
||||
uint64_t seq;
|
||||
lr_create_t *lr;
|
||||
size_t namesize = strlen(name) + 1;
|
||||
size_t linksize = strlen(link) + 1;
|
||||
|
||||
if (zilog == NULL)
|
||||
return;
|
||||
|
||||
itx = zil_itx_create(txtype, sizeof (*lr) + namesize + linksize);
|
||||
lr = (lr_create_t *)&itx->itx_lr;
|
||||
lr->lr_doid = dzp->z_id;
|
||||
lr->lr_foid = zp->z_id;
|
||||
lr->lr_mode = zp->z_phys->zp_mode;
|
||||
lr->lr_uid = zp->z_phys->zp_uid;
|
||||
lr->lr_gid = zp->z_phys->zp_gid;
|
||||
lr->lr_gen = zp->z_phys->zp_gen;
|
||||
lr->lr_crtime[0] = zp->z_phys->zp_crtime[0];
|
||||
lr->lr_crtime[1] = zp->z_phys->zp_crtime[1];
|
||||
bcopy(name, (char *)(lr + 1), namesize);
|
||||
bcopy(link, (char *)(lr + 1) + namesize, linksize);
|
||||
|
||||
seq = zil_itx_assign(zilog, itx, tx);
|
||||
dzp->z_last_itx = seq;
|
||||
zp->z_last_itx = seq;
|
||||
}
|
||||
|
||||
/*
|
||||
* zfs_log_rename() handles TX_RENAME transactions.
|
||||
*/
|
||||
void
|
||||
zfs_log_rename(zilog_t *zilog, dmu_tx_t *tx, uint64_t txtype,
|
||||
znode_t *sdzp, char *sname, znode_t *tdzp, char *dname, znode_t *szp)
|
||||
{
|
||||
itx_t *itx;
|
||||
uint64_t seq;
|
||||
lr_rename_t *lr;
|
||||
size_t snamesize = strlen(sname) + 1;
|
||||
size_t dnamesize = strlen(dname) + 1;
|
||||
|
||||
if (zilog == NULL)
|
||||
return;
|
||||
|
||||
itx = zil_itx_create(txtype, sizeof (*lr) + snamesize + dnamesize);
|
||||
lr = (lr_rename_t *)&itx->itx_lr;
|
||||
lr->lr_sdoid = sdzp->z_id;
|
||||
lr->lr_tdoid = tdzp->z_id;
|
||||
bcopy(sname, (char *)(lr + 1), snamesize);
|
||||
bcopy(dname, (char *)(lr + 1) + snamesize, dnamesize);
|
||||
|
||||
seq = zil_itx_assign(zilog, itx, tx);
|
||||
sdzp->z_last_itx = seq;
|
||||
tdzp->z_last_itx = seq;
|
||||
szp->z_last_itx = seq;
|
||||
}
|
||||
|
||||
/*
|
||||
* zfs_log_write() handles TX_WRITE transactions.
|
||||
*/
|
||||
ssize_t zfs_immediate_write_sz = 32768;
|
||||
|
||||
#define ZIL_MAX_LOG_DATA (SPA_MAXBLOCKSIZE - sizeof (zil_trailer_t) - \
|
||||
sizeof (lr_write_t))
|
||||
|
||||
void
|
||||
zfs_log_write(zilog_t *zilog, dmu_tx_t *tx, int txtype,
|
||||
znode_t *zp, offset_t off, ssize_t resid, int ioflag)
|
||||
{
|
||||
itx_wr_state_t write_state;
|
||||
boolean_t slogging;
|
||||
uintptr_t fsync_cnt;
|
||||
|
||||
if (zilog == NULL || zp->z_unlinked)
|
||||
return;
|
||||
|
||||
/*
|
||||
* Writes are handled in three different ways:
|
||||
*
|
||||
* WR_INDIRECT:
|
||||
* If the write is greater than zfs_immediate_write_sz and there are
|
||||
* no separate logs in this pool then later *if* we need to log the
|
||||
* write then dmu_sync() is used to immediately write the block and
|
||||
* its block pointer is put in the log record.
|
||||
* WR_COPIED:
|
||||
* If we know we'll immediately be committing the
|
||||
* transaction (FSYNC or FDSYNC), then we allocate a larger
|
||||
* log record here for the data and copy the data in.
|
||||
* WR_NEED_COPY:
|
||||
* Otherwise we don't allocate a buffer, and *if* we need to
|
||||
* flush the write later then a buffer is allocated and
|
||||
* we retrieve the data using the dmu.
|
||||
*/
|
||||
slogging = spa_has_slogs(zilog->zl_spa);
|
||||
if (resid > zfs_immediate_write_sz && !slogging)
|
||||
write_state = WR_INDIRECT;
|
||||
else if (ioflag & (FSYNC | FDSYNC))
|
||||
write_state = WR_COPIED;
|
||||
else
|
||||
write_state = WR_NEED_COPY;
|
||||
|
||||
if ((fsync_cnt = (uintptr_t)tsd_get(zfs_fsyncer_key)) != 0) {
|
||||
(void) tsd_set(zfs_fsyncer_key, (void *)(fsync_cnt - 1));
|
||||
}
|
||||
|
||||
while (resid) {
|
||||
itx_t *itx;
|
||||
lr_write_t *lr;
|
||||
ssize_t len;
|
||||
|
||||
/*
|
||||
* If there are slogs and the write would overflow the largest
|
||||
* block, then because we don't want to use the main pool
|
||||
* to dmu_sync, we have to split the write.
|
||||
*/
|
||||
if (slogging && resid > ZIL_MAX_LOG_DATA)
|
||||
len = SPA_MAXBLOCKSIZE >> 1;
|
||||
else
|
||||
len = resid;
|
||||
|
||||
itx = zil_itx_create(txtype, sizeof (*lr) +
|
||||
(write_state == WR_COPIED ? len : 0));
|
||||
lr = (lr_write_t *)&itx->itx_lr;
|
||||
if (write_state == WR_COPIED && dmu_read(zp->z_zfsvfs->z_os,
|
||||
zp->z_id, off, len, lr + 1) != 0) {
|
||||
kmem_free(itx, offsetof(itx_t, itx_lr) +
|
||||
itx->itx_lr.lrc_reclen);
|
||||
itx = zil_itx_create(txtype, sizeof (*lr));
|
||||
lr = (lr_write_t *)&itx->itx_lr;
|
||||
write_state = WR_NEED_COPY;
|
||||
}
|
||||
|
||||
itx->itx_wr_state = write_state;
|
||||
if (write_state == WR_NEED_COPY)
|
||||
itx->itx_sod += len;
|
||||
lr->lr_foid = zp->z_id;
|
||||
lr->lr_offset = off;
|
||||
lr->lr_length = len;
|
||||
lr->lr_blkoff = 0;
|
||||
BP_ZERO(&lr->lr_blkptr);
|
||||
|
||||
itx->itx_private = zp->z_zfsvfs;
|
||||
|
||||
if ((zp->z_sync_cnt != 0) || (fsync_cnt != 0) ||
|
||||
(ioflag & (FSYNC | FDSYNC)))
|
||||
itx->itx_sync = B_TRUE;
|
||||
else
|
||||
itx->itx_sync = B_FALSE;
|
||||
|
||||
zp->z_last_itx = zil_itx_assign(zilog, itx, tx);
|
||||
|
||||
off += len;
|
||||
resid -= len;
|
||||
}
|
||||
}
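The three write states and the chunking rule in zfs_log_write() can be read as two small pure functions. The sketch below restates them with explicit parameters in place of the globals and macros; all `sketch_` names are hypothetical, and the size limits are passed in rather than taken from SPA_MAXBLOCKSIZE or ZIL_MAX_LOG_DATA.

```c
#include <stdbool.h>
#include <stddef.h>

enum sketch_wr_state { SK_WR_INDIRECT, SK_WR_COPIED, SK_WR_NEED_COPY };

/*
 * Large writes on pools without separate log devices are logged by
 * block pointer; synchronous writes copy the data into the record now;
 * everything else defers the copy until the record must be flushed.
 */
enum sketch_wr_state
sketch_write_state(size_t resid, size_t immediate_sz, bool slogging, bool sync)
{
	if (resid > immediate_sz && !slogging)
		return (SK_WR_INDIRECT);
	if (sync)
		return (SK_WR_COPIED);
	return (SK_WR_NEED_COPY);
}

/*
 * Length of the next log record. With separate log devices, a write that
 * would not fit in one log block is split into half-block pieces.
 */
size_t
sketch_next_chunk(size_t resid, size_t max_log_data, size_t max_block,
    bool slogging)
{
	if (slogging && resid > max_log_data)
		return (max_block >> 1);
	return (resid);
}
```

Note that the size test comes first, so a large synchronous write on a pool without separate logs is still treated as WR_INDIRECT.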
|
||||
|
||||
/*
|
||||
* zfs_log_truncate() handles TX_TRUNCATE transactions.
|
||||
*/
|
||||
void
|
||||
zfs_log_truncate(zilog_t *zilog, dmu_tx_t *tx, int txtype,
|
||||
znode_t *zp, uint64_t off, uint64_t len)
|
||||
{
|
||||
itx_t *itx;
|
||||
uint64_t seq;
|
||||
lr_truncate_t *lr;
|
||||
|
||||
if (zilog == NULL || zp->z_unlinked)
|
||||
return;
|
||||
|
||||
itx = zil_itx_create(txtype, sizeof (*lr));
|
||||
lr = (lr_truncate_t *)&itx->itx_lr;
|
||||
lr->lr_foid = zp->z_id;
|
||||
lr->lr_offset = off;
|
||||
lr->lr_length = len;
|
||||
|
||||
itx->itx_sync = (zp->z_sync_cnt != 0);
|
||||
seq = zil_itx_assign(zilog, itx, tx);
|
||||
zp->z_last_itx = seq;
|
||||
}
|
||||
|
||||
/*
|
||||
* zfs_log_setattr() handles TX_SETATTR transactions.
|
||||
*/
|
||||
void
|
||||
zfs_log_setattr(zilog_t *zilog, dmu_tx_t *tx, int txtype,
|
||||
znode_t *zp, vattr_t *vap, uint_t mask_applied, zfs_fuid_info_t *fuidp)
|
||||
{
|
||||
itx_t *itx;
|
||||
uint64_t seq;
|
||||
lr_setattr_t *lr;
|
||||
xvattr_t *xvap = (xvattr_t *)vap;
|
||||
size_t recsize = sizeof (lr_setattr_t);
|
||||
void *start;
|
||||
|
||||
|
||||
if (zilog == NULL || zp->z_unlinked)
|
||||
return;
|
||||
|
||||
/*
|
||||
* If XVATTR set, then log record size needs to allow
|
||||
* for lr_attr_t + xvattr mask, mapsize and create time
|
||||
* plus actual attribute values
|
||||
*/
|
||||
if (vap->va_mask & AT_XVATTR)
|
||||
recsize = sizeof (*lr) + ZIL_XVAT_SIZE(xvap->xva_mapsize);
|
||||
|
||||
if (fuidp)
|
||||
recsize += fuidp->z_domain_str_sz;
|
||||
|
||||
itx = zil_itx_create(txtype, recsize);
|
||||
lr = (lr_setattr_t *)&itx->itx_lr;
|
||||
lr->lr_foid = zp->z_id;
|
||||
lr->lr_mask = (uint64_t)mask_applied;
|
||||
lr->lr_mode = (uint64_t)vap->va_mode;
|
||||
if ((mask_applied & AT_UID) && IS_EPHEMERAL(vap->va_uid))
|
||||
lr->lr_uid = fuidp->z_fuid_owner;
|
||||
else
|
||||
lr->lr_uid = (uint64_t)vap->va_uid;
|
||||
|
||||
if ((mask_applied & AT_GID) && IS_EPHEMERAL(vap->va_gid))
|
||||
lr->lr_gid = fuidp->z_fuid_group;
|
||||
else
|
||||
lr->lr_gid = (uint64_t)vap->va_gid;
|
||||
|
||||
lr->lr_size = (uint64_t)vap->va_size;
|
||||
ZFS_TIME_ENCODE(&vap->va_atime, lr->lr_atime);
|
||||
ZFS_TIME_ENCODE(&vap->va_mtime, lr->lr_mtime);
|
||||
start = (lr_setattr_t *)(lr + 1);
|
||||
if (vap->va_mask & AT_XVATTR) {
|
||||
zfs_log_xvattr((lr_attr_t *)start, xvap);
|
||||
start = (caddr_t)start + ZIL_XVAT_SIZE(xvap->xva_mapsize);
|
||||
}
|
||||
|
||||
/*
|
||||
* Now stick on domain information if any on end
|
||||
*/
|
||||
|
||||
if (fuidp)
|
||||
(void) zfs_log_fuid_domains(fuidp, start);
|
||||
|
||||
itx->itx_sync = (zp->z_sync_cnt != 0);
|
||||
seq = zil_itx_assign(zilog, itx, tx);
|
||||
zp->z_last_itx = seq;
|
||||
}
|
||||
|
||||
/*
|
||||
* zfs_log_acl() handles TX_ACL transactions.
|
||||
*/
|
||||
void
|
||||
zfs_log_acl(zilog_t *zilog, dmu_tx_t *tx, znode_t *zp,
|
||||
vsecattr_t *vsecp, zfs_fuid_info_t *fuidp)
|
||||
{
|
||||
itx_t *itx;
|
||||
uint64_t seq;
|
||||
lr_acl_v0_t *lrv0;
|
||||
lr_acl_t *lr;
|
||||
int txtype;
|
||||
int lrsize;
|
||||
size_t txsize;
|
||||
size_t aclbytes = vsecp->vsa_aclentsz;
|
||||
|
||||
txtype = (zp->z_zfsvfs->z_version == ZPL_VERSION_INITIAL) ?
|
||||
TX_ACL_V0 : TX_ACL;
|
||||
|
||||
if (txtype == TX_ACL)
|
||||
lrsize = sizeof (*lr);
|
||||
else
|
||||
lrsize = sizeof (*lrv0);
|
||||
|
||||
if (zilog == NULL || zp->z_unlinked)
|
||||
return;
|
||||
|
||||
txsize = lrsize +
|
||||
((txtype == TX_ACL) ? ZIL_ACE_LENGTH(aclbytes) : aclbytes) +
|
||||
(fuidp ? fuidp->z_domain_str_sz : 0) +
|
||||
sizeof (uint64_t) * (fuidp ? fuidp->z_fuid_cnt : 0);
|
||||
|
||||
itx = zil_itx_create(txtype, txsize);
|
||||
|
||||
lr = (lr_acl_t *)&itx->itx_lr;
|
||||
lr->lr_foid = zp->z_id;
|
||||
if (txtype == TX_ACL) {
|
||||
lr->lr_acl_bytes = aclbytes;
|
||||
lr->lr_domcnt = fuidp ? fuidp->z_domain_cnt : 0;
|
||||
lr->lr_fuidcnt = fuidp ? fuidp->z_fuid_cnt : 0;
|
||||
if (vsecp->vsa_mask & VSA_ACE_ACLFLAGS)
|
||||
lr->lr_acl_flags = (uint64_t)vsecp->vsa_aclflags;
|
||||
else
|
||||
lr->lr_acl_flags = 0;
|
||||
}
|
||||
lr->lr_aclcnt = (uint64_t)vsecp->vsa_aclcnt;
|
||||
|
||||
if (txtype == TX_ACL_V0) {
|
||||
lrv0 = (lr_acl_v0_t *)lr;
|
||||
bcopy(vsecp->vsa_aclentp, (ace_t *)(lrv0 + 1), aclbytes);
|
||||
} else {
|
||||
void *start = (ace_t *)(lr + 1);
|
||||
|
||||
bcopy(vsecp->vsa_aclentp, start, aclbytes);
|
||||
|
||||
start = (caddr_t)start + ZIL_ACE_LENGTH(aclbytes);
|
||||
|
||||
if (fuidp) {
|
||||
start = zfs_log_fuid_ids(fuidp, start);
|
||||
(void) zfs_log_fuid_domains(fuidp, start);
|
||||
}
|
||||
}
|
||||
|
||||
itx->itx_sync = (zp->z_sync_cnt != 0);
|
||||
seq = zil_itx_assign(zilog, itx, tx);
|
||||
zp->z_last_itx = seq;
|
||||
}
|
|
@ -0,0 +1,876 @@
|
|||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright 2008 Sun Microsystems, Inc. All rights reserved.
|
||||
* Use is subject to license terms.
|
||||
*/
|
||||
|
||||
#pragma ident "@(#)zfs_replay.c 1.7 08/01/14 SMI"
|
||||
|
||||
#include <sys/types.h>
|
||||
#include <sys/param.h>
|
||||
#include <sys/systm.h>
|
||||
#include <sys/sysmacros.h>
|
||||
#include <sys/cmn_err.h>
|
||||
#include <sys/kmem.h>
|
||||
#include <sys/thread.h>
|
||||
#include <sys/file.h>
|
||||
#include <sys/fcntl.h>
|
||||
#include <sys/vfs.h>
|
||||
#include <sys/fs/zfs.h>
|
||||
#include <sys/zfs_znode.h>
|
||||
#include <sys/zfs_dir.h>
|
||||
#include <sys/zfs_acl.h>
|
||||
#include <sys/zfs_fuid.h>
|
||||
#include <sys/spa.h>
|
||||
#include <sys/zil.h>
|
||||
#include <sys/byteorder.h>
|
||||
#include <sys/stat.h>
|
||||
#include <sys/mode.h>
|
||||
#include <sys/acl.h>
|
||||
#include <sys/atomic.h>
|
||||
#include <sys/cred.h>
|
||||
|
||||
/*
|
||||
* Functions to replay ZFS intent log (ZIL) records
|
||||
* The functions are called through a function vector (zfs_replay_vector)
|
||||
* which is indexed by the transaction type.
|
||||
*/
|
||||
|
||||
static void
|
||||
zfs_init_vattr(vattr_t *vap, uint64_t mask, uint64_t mode,
|
||||
uint64_t uid, uint64_t gid, uint64_t rdev, uint64_t nodeid)
|
||||
{
|
||||
bzero(vap, sizeof (*vap));
|
||||
vap->va_mask = (uint_t)mask;
|
||||
vap->va_type = IFTOVT(mode);
|
||||
vap->va_mode = mode & MODEMASK;
|
||||
vap->va_uid = IS_EPHEMERAL(uid) ? (uid_t)-1 : (uid_t)uid;
|
||||
vap->va_gid = IS_EPHEMERAL(gid) ? (gid_t)-1 : (gid_t)gid;
|
||||
vap->va_rdev = zfs_cmpldev(rdev);
|
||||
vap->va_nodeid = nodeid;
|
||||
}
|
||||
|
||||
/* ARGSUSED */
|
||||
static int
|
||||
zfs_replay_error(zfsvfs_t *zfsvfs, lr_t *lr, boolean_t byteswap)
|
||||
{
|
||||
return (ENOTSUP);
|
||||
}
|
||||
|
||||
static void
|
||||
zfs_replay_xvattr(lr_attr_t *lrattr, xvattr_t *xvap)
|
||||
{
|
||||
xoptattr_t *xoap = NULL;
|
||||
uint64_t *attrs;
|
||||
uint64_t *crtime;
|
||||
uint32_t *bitmap;
|
||||
void *scanstamp;
|
||||
int i;
|
||||
|
||||
xvap->xva_vattr.va_mask |= AT_XVATTR;
|
||||
if ((xoap = xva_getxoptattr(xvap)) == NULL) {
|
||||
xvap->xva_vattr.va_mask &= ~AT_XVATTR; /* shouldn't happen */
|
||||
return;
|
||||
}
|
||||
|
||||
ASSERT(lrattr->lr_attr_masksize == xvap->xva_mapsize);
|
||||
|
||||
bitmap = &lrattr->lr_attr_bitmap;
|
||||
for (i = 0; i != lrattr->lr_attr_masksize; i++, bitmap++)
|
||||
xvap->xva_reqattrmap[i] = *bitmap;
|
||||
|
||||
attrs = (uint64_t *)(lrattr + lrattr->lr_attr_masksize - 1);
|
||||
crtime = attrs + 1;
|
||||
scanstamp = (caddr_t)(crtime + 2);
|
||||
|
||||
if (XVA_ISSET_REQ(xvap, XAT_HIDDEN))
|
||||
xoap->xoa_hidden = ((*attrs & XAT0_HIDDEN) != 0);
|
||||
if (XVA_ISSET_REQ(xvap, XAT_SYSTEM))
|
||||
xoap->xoa_system = ((*attrs & XAT0_SYSTEM) != 0);
|
||||
if (XVA_ISSET_REQ(xvap, XAT_ARCHIVE))
|
||||
xoap->xoa_archive = ((*attrs & XAT0_ARCHIVE) != 0);
|
||||
if (XVA_ISSET_REQ(xvap, XAT_READONLY))
|
||||
xoap->xoa_readonly = ((*attrs & XAT0_READONLY) != 0);
|
||||
if (XVA_ISSET_REQ(xvap, XAT_IMMUTABLE))
|
||||
xoap->xoa_immutable = ((*attrs & XAT0_IMMUTABLE) != 0);
|
||||
if (XVA_ISSET_REQ(xvap, XAT_NOUNLINK))
|
||||
xoap->xoa_nounlink = ((*attrs & XAT0_NOUNLINK) != 0);
|
||||
if (XVA_ISSET_REQ(xvap, XAT_APPENDONLY))
|
||||
xoap->xoa_appendonly = ((*attrs & XAT0_APPENDONLY) != 0);
|
||||
if (XVA_ISSET_REQ(xvap, XAT_NODUMP))
|
||||
xoap->xoa_nodump = ((*attrs & XAT0_NODUMP) != 0);
|
||||
if (XVA_ISSET_REQ(xvap, XAT_OPAQUE))
|
||||
xoap->xoa_opaque = ((*attrs & XAT0_OPAQUE) != 0);
|
||||
if (XVA_ISSET_REQ(xvap, XAT_AV_MODIFIED))
|
||||
xoap->xoa_av_modified = ((*attrs & XAT0_AV_MODIFIED) != 0);
|
||||
if (XVA_ISSET_REQ(xvap, XAT_AV_QUARANTINED))
|
||||
xoap->xoa_av_quarantined =
|
||||
((*attrs & XAT0_AV_QUARANTINED) != 0);
|
||||
if (XVA_ISSET_REQ(xvap, XAT_CREATETIME))
|
||||
ZFS_TIME_DECODE(&xoap->xoa_createtime, crtime);
|
||||
if (XVA_ISSET_REQ(xvap, XAT_AV_SCANSTAMP))
|
||||
bcopy(scanstamp, xoap->xoa_av_scanstamp, AV_SCANSTAMP_SZ);
|
||||
}
|
||||
|
||||
static int
|
||||
zfs_replay_domain_cnt(uint64_t uid, uint64_t gid)
|
||||
{
|
||||
uint64_t uid_idx;
|
||||
uint64_t gid_idx;
|
||||
int domcnt = 0;
|
||||
|
||||
uid_idx = FUID_INDEX(uid);
|
||||
gid_idx = FUID_INDEX(gid);
|
||||
if (uid_idx)
|
||||
domcnt++;
|
||||
if (gid_idx > 0 && gid_idx != uid_idx)
|
||||
domcnt++;
|
||||
|
||||
return (domcnt);
|
||||
}
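zfs_replay_domain_cnt() counts how many distinct domain strings a uid/gid pair refers to. The sketch below restates it under the assumption that a FUID keeps its domain index in the upper 32 bits and its RID in the lower 32 bits; the FUID_INDEX and FUID_RID macros are not shown in this file, so the `sketch_` helpers here are hypothetical equivalents.

```c
#include <stdint.h>

/* Assumed FUID split: domain index in the high word, RID in the low word. */
uint32_t
sketch_fuid_index(uint64_t fuid)
{
	return ((uint32_t)(fuid >> 32));
}

uint32_t
sketch_fuid_rid(uint64_t fuid)
{
	return ((uint32_t)(fuid & 0xffffffffu));
}

/*
 * Number of domain strings needed for an owner/group pair. Index 0 means
 * a plain POSIX id with no domain; a shared index is counted once.
 */
int
sketch_domain_cnt(uint64_t uid, uint64_t gid)
{
	uint32_t uid_idx = sketch_fuid_index(uid);
	uint32_t gid_idx = sketch_fuid_index(gid);
	int cnt = 0;

	if (uid_idx != 0)
		cnt++;
	if (gid_idx != 0 && gid_idx != uid_idx)
		cnt++;
	return (cnt);
}
```

The result is what sizes the domain table before the packed strings are walked during replay.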
|
||||
|
||||
static void *
|
||||
zfs_replay_fuid_domain_common(zfs_fuid_info_t *fuid_infop, void *start,
|
||||
int domcnt)
|
||||
{
|
||||
int i;
|
||||
|
||||
for (i = 0; i != domcnt; i++) {
|
||||
fuid_infop->z_domain_table[i] = start;
|
||||
start = (caddr_t)start + strlen(start) + 1;
|
||||
}
|
||||
|
||||
return (start);
|
||||
}
|
||||
|
||||
/*
|
||||
* Set the uid/gid in the fuid_info structure.
|
||||
*/
|
||||
static void
|
||||
zfs_replay_fuid_ugid(zfs_fuid_info_t *fuid_infop, uint64_t uid, uint64_t gid)
|
||||
{
|
||||
/*
|
||||
* If owner or group are log specific FUIDs then slurp up
|
||||
* domain information and build zfs_fuid_info_t
|
||||
*/
|
||||
if (IS_EPHEMERAL(uid))
|
||||
fuid_infop->z_fuid_owner = uid;
|
||||
|
||||
if (IS_EPHEMERAL(gid))
|
||||
fuid_infop->z_fuid_group = gid;
|
||||
}
|
||||
|
||||
/*
|
||||
* Load fuid domains into fuid_info_t
|
||||
*/
|
||||
static zfs_fuid_info_t *
|
||||
zfs_replay_fuid_domain(void *buf, void **end, uint64_t uid, uint64_t gid)
|
||||
{
|
||||
int domcnt;
|
||||
|
||||
zfs_fuid_info_t *fuid_infop;
|
||||
|
||||
fuid_infop = zfs_fuid_info_alloc();
|
||||
|
||||
domcnt = zfs_replay_domain_cnt(uid, gid);
|
||||
|
||||
if (domcnt == 0)
|
||||
return (fuid_infop);
|
||||
|
||||
fuid_infop->z_domain_table =
|
||||
kmem_zalloc(domcnt * sizeof (char **), KM_SLEEP);
|
||||
|
||||
zfs_replay_fuid_ugid(fuid_infop, uid, gid);
|
||||
|
||||
fuid_infop->z_domain_cnt = domcnt;
|
||||
*end = zfs_replay_fuid_domain_common(fuid_infop, buf, domcnt);
|
||||
return (fuid_infop);
|
||||
}
|
||||
|
||||
/*
|
||||
* load zfs_fuid_t's and fuid_domains into fuid_info_t
|
||||
*/
|
||||
static zfs_fuid_info_t *
|
||||
zfs_replay_fuids(void *start, void **end, int idcnt, int domcnt, uint64_t uid,
|
||||
uint64_t gid)
|
||||
{
|
||||
uint64_t *log_fuid = (uint64_t *)start;
|
||||
zfs_fuid_info_t *fuid_infop;
|
||||
int i;
|
||||
|
||||
fuid_infop = zfs_fuid_info_alloc();
|
||||
fuid_infop->z_domain_cnt = domcnt;
|
||||
|
||||
fuid_infop->z_domain_table =
|
||||
kmem_zalloc(domcnt * sizeof (char **), KM_SLEEP);
|
||||
|
||||
for (i = 0; i != idcnt; i++) {
|
||||
zfs_fuid_t *zfuid;
|
||||
|
||||
zfuid = kmem_alloc(sizeof (zfs_fuid_t), KM_SLEEP);
|
||||
zfuid->z_logfuid = *log_fuid;
|
||||
zfuid->z_id = -1;
|
||||
zfuid->z_domidx = 0;
|
||||
list_insert_tail(&fuid_infop->z_fuids, zfuid);
|
||||
log_fuid++;
|
||||
}
|
||||
|
||||
zfs_replay_fuid_ugid(fuid_infop, uid, gid);
|
||||
|
||||
*end = zfs_replay_fuid_domain_common(fuid_infop, log_fuid, domcnt);
|
||||
return (fuid_infop);
|
||||
}
|
||||
|
||||
static void
|
||||
zfs_replay_swap_attrs(lr_attr_t *lrattr)
|
||||
{
|
||||
/* swap the lr_attr structure */
|
||||
byteswap_uint32_array(lrattr, sizeof (*lrattr));
|
||||
/* swap the bitmap */
|
||||
byteswap_uint32_array(lrattr + 1, (lrattr->lr_attr_masksize - 1) *
|
||||
sizeof (uint32_t));
|
||||
/* swap the attributes, create time + 64 bit word for attributes */
|
||||
byteswap_uint64_array((caddr_t)(lrattr + 1) + (sizeof (uint32_t) *
|
||||
(lrattr->lr_attr_masksize - 1)), 3 * sizeof (uint64_t));
|
||||
}
|
||||
|
||||
/*
|
||||
* Replay file create with optional ACL, xvattr information as well
|
||||
* as optional FUID information.
|
||||
*/
|
||||
static int
|
||||
zfs_replay_create_acl(zfsvfs_t *zfsvfs,
|
||||
lr_acl_create_t *lracl, boolean_t byteswap)
|
||||
{
|
||||
char *name = NULL; /* location determined later */
|
||||
lr_create_t *lr = (lr_create_t *)lracl;
|
||||
znode_t *dzp;
|
||||
vnode_t *vp = NULL;
|
||||
xvattr_t xva;
|
||||
int vflg = 0;
|
||||
vsecattr_t vsec = { 0 };
|
||||
lr_attr_t *lrattr;
|
||||
void *aclstart;
|
||||
void *fuidstart;
|
||||
size_t xvatlen = 0;
|
||||
uint64_t txtype;
|
||||
int error;
|
||||
|
||||
if (byteswap) {
|
||||
byteswap_uint64_array(lracl, sizeof (*lracl));
|
||||
txtype = (int)lr->lr_common.lrc_txtype;
|
||||
if (txtype == TX_CREATE_ACL_ATTR ||
|
||||
txtype == TX_MKDIR_ACL_ATTR) {
|
||||
lrattr = (lr_attr_t *)(caddr_t)(lracl + 1);
|
||||
zfs_replay_swap_attrs(lrattr);
|
||||
xvatlen = ZIL_XVAT_SIZE(lrattr->lr_attr_masksize);
|
||||
}
|
||||
|
||||
aclstart = (caddr_t)(lracl + 1) + xvatlen;
|
||||
zfs_ace_byteswap(aclstart, lracl->lr_acl_bytes, B_FALSE);
|
||||
/* swap fuids */
|
||||
if (lracl->lr_fuidcnt) {
|
||||
byteswap_uint64_array((caddr_t)aclstart +
|
||||
ZIL_ACE_LENGTH(lracl->lr_acl_bytes),
|
||||
lracl->lr_fuidcnt * sizeof (uint64_t));
|
||||
}
|
||||
}
|
||||
|
||||
if ((error = zfs_zget(zfsvfs, lr->lr_doid, &dzp)) != 0)
|
||||
return (error);
|
||||
|
||||
xva_init(&xva);
|
||||
zfs_init_vattr(&xva.xva_vattr, AT_TYPE | AT_MODE | AT_UID | AT_GID,
|
||||
lr->lr_mode, lr->lr_uid, lr->lr_gid, lr->lr_rdev, lr->lr_foid);
|
||||
|
||||
/*
|
||||
* All forms of zfs create (create, mkdir, mkxattrdir, symlink)
|
||||
* eventually end up in zfs_mknode(), which assigns the object's
|
||||
* creation time and generation number. The generic VOP_CREATE()
|
||||
* doesn't have either concept, so we smuggle the values inside
|
||||
* the vattr's otherwise unused va_ctime and va_nblocks fields.
|
||||
*/
|
||||
ZFS_TIME_DECODE(&xva.xva_vattr.va_ctime, lr->lr_crtime);
|
||||
xva.xva_vattr.va_nblocks = lr->lr_gen;
|
||||
|
||||
error = dmu_object_info(zfsvfs->z_os, lr->lr_foid, NULL);
|
||||
if (error != ENOENT)
|
||||
goto bail;
|
||||
|
||||
if (lr->lr_common.lrc_txtype & TX_CI)
|
||||
vflg |= FIGNORECASE;
|
||||
switch ((int)lr->lr_common.lrc_txtype) {
|
||||
case TX_CREATE_ACL:
|
||||
aclstart = (caddr_t)(lracl + 1);
|
||||
fuidstart = (caddr_t)aclstart +
|
||||
ZIL_ACE_LENGTH(lracl->lr_acl_bytes);
|
||||
zfsvfs->z_fuid_replay = zfs_replay_fuids(fuidstart,
|
||||
(void *)&name, lracl->lr_fuidcnt, lracl->lr_domcnt,
|
||||
lr->lr_uid, lr->lr_gid);
|
||||
/*FALLTHROUGH*/
|
||||
case TX_CREATE_ACL_ATTR:
|
||||
if (name == NULL) {
|
||||
lrattr = (lr_attr_t *)(caddr_t)(lracl + 1);
|
||||
xvatlen = ZIL_XVAT_SIZE(lrattr->lr_attr_masksize);
|
||||
xva.xva_vattr.va_mask |= AT_XVATTR;
|
||||
zfs_replay_xvattr(lrattr, &xva);
|
||||
}
|
||||
vsec.vsa_mask = VSA_ACE | VSA_ACE_ACLFLAGS;
|
||||
vsec.vsa_aclentp = (caddr_t)(lracl + 1) + xvatlen;
|
||||
vsec.vsa_aclcnt = lracl->lr_aclcnt;
|
||||
vsec.vsa_aclentsz = lracl->lr_acl_bytes;
|
||||
vsec.vsa_aclflags = lracl->lr_acl_flags;
|
||||
if (zfsvfs->z_fuid_replay == NULL) {
|
||||
fuidstart = (caddr_t)(lracl + 1) + xvatlen +
|
||||
ZIL_ACE_LENGTH(lracl->lr_acl_bytes);
|
||||
zfsvfs->z_fuid_replay =
|
||||
zfs_replay_fuids(fuidstart,
|
||||
(void *)&name, lracl->lr_fuidcnt, lracl->lr_domcnt,
|
||||
lr->lr_uid, lr->lr_gid);
|
||||
}
|
||||
|
||||
error = VOP_CREATE(ZTOV(dzp), name, &xva.xva_vattr,
|
||||
0, 0, &vp, kcred, vflg, NULL, &vsec);
|
||||
break;
|
||||
case TX_MKDIR_ACL:
|
||||
aclstart = (caddr_t)(lracl + 1);
|
||||
fuidstart = (caddr_t)aclstart +
|
||||
ZIL_ACE_LENGTH(lracl->lr_acl_bytes);
|
||||
zfsvfs->z_fuid_replay = zfs_replay_fuids(fuidstart,
|
||||
(void *)&name, lracl->lr_fuidcnt, lracl->lr_domcnt,
|
||||
lr->lr_uid, lr->lr_gid);
|
||||
/*FALLTHROUGH*/
|
||||
case TX_MKDIR_ACL_ATTR:
|
||||
if (name == NULL) {
|
||||
lrattr = (lr_attr_t *)(caddr_t)(lracl + 1);
|
||||
xvatlen = ZIL_XVAT_SIZE(lrattr->lr_attr_masksize);
|
||||
zfs_replay_xvattr(lrattr, &xva);
|
||||
}
|
||||
vsec.vsa_mask = VSA_ACE | VSA_ACE_ACLFLAGS;
|
||||
vsec.vsa_aclentp = (caddr_t)(lracl + 1) + xvatlen;
|
||||
vsec.vsa_aclcnt = lracl->lr_aclcnt;
|
||||
vsec.vsa_aclentsz = lracl->lr_acl_bytes;
|
||||
vsec.vsa_aclflags = lracl->lr_acl_flags;
|
||||
if (zfsvfs->z_fuid_replay == NULL) {
|
||||
fuidstart = (caddr_t)(lracl + 1) + xvatlen +
|
||||
ZIL_ACE_LENGTH(lracl->lr_acl_bytes);
|
||||
zfsvfs->z_fuid_replay =
|
||||
zfs_replay_fuids(fuidstart,
|
||||
(void *)&name, lracl->lr_fuidcnt, lracl->lr_domcnt,
|
||||
lr->lr_uid, lr->lr_gid);
|
||||
}
|
||||
error = VOP_MKDIR(ZTOV(dzp), name, &xva.xva_vattr,
|
||||
&vp, kcred, NULL, vflg, &vsec);
|
||||
break;
|
||||
default:
|
||||
error = ENOTSUP;
|
||||
}
|
||||
|
||||
bail:
|
||||
if (error == 0 && vp != NULL)
|
||||
VN_RELE(vp);
|
||||
|
||||
VN_RELE(ZTOV(dzp));
|
||||
|
||||
zfs_fuid_info_free(zfsvfs->z_fuid_replay);
|
||||
zfsvfs->z_fuid_replay = NULL;
|
||||
|
||||
return (error);
|
||||
}
|
||||
|
||||
static int
|
||||
zfs_replay_create(zfsvfs_t *zfsvfs, lr_create_t *lr, boolean_t byteswap)
|
||||
{
|
||||
char *name = NULL; /* location determined later */
|
||||
char *link; /* symlink content follows name */
|
||||
znode_t *dzp;
|
||||
vnode_t *vp = NULL;
|
||||
xvattr_t xva;
|
||||
int vflg = 0;
|
||||
size_t lrsize = sizeof (lr_create_t);
|
||||
lr_attr_t *lrattr;
|
||||
void *start;
|
||||
size_t xvatlen;
|
||||
uint64_t txtype;
|
||||
int error;
|
||||
|
||||
if (byteswap) {
|
||||
byteswap_uint64_array(lr, sizeof (*lr));
|
||||
txtype = (int)lr->lr_common.lrc_txtype;
|
||||
if (txtype == TX_CREATE_ATTR || txtype == TX_MKDIR_ATTR)
|
||||
zfs_replay_swap_attrs((lr_attr_t *)(lr + 1));
|
||||
}
|
||||
|
||||
|
||||
if ((error = zfs_zget(zfsvfs, lr->lr_doid, &dzp)) != 0)
|
||||
return (error);
|
||||
|
||||
xva_init(&xva);
|
||||
zfs_init_vattr(&xva.xva_vattr, AT_TYPE | AT_MODE | AT_UID | AT_GID,
|
||||
lr->lr_mode, lr->lr_uid, lr->lr_gid, lr->lr_rdev, lr->lr_foid);
|
||||
|
||||
/*
|
||||
* All forms of zfs create (create, mkdir, mkxattrdir, symlink)
|
||||
* eventually end up in zfs_mknode(), which assigns the object's
|
||||
* creation time and generation number. The generic VOP_CREATE()
|
||||
* doesn't have either concept, so we smuggle the values inside
|
||||
* the vattr's otherwise unused va_ctime and va_nblocks fields.
|
||||
*/
|
||||
ZFS_TIME_DECODE(&xva.xva_vattr.va_ctime, lr->lr_crtime);
|
||||
xva.xva_vattr.va_nblocks = lr->lr_gen;
|
||||
|
||||
error = dmu_object_info(zfsvfs->z_os, lr->lr_foid, NULL);
|
||||
if (error != ENOENT)
|
||||
goto out;
|
||||
|
||||
if (lr->lr_common.lrc_txtype & TX_CI)
|
||||
vflg |= FIGNORECASE;
|
||||
|
||||
/*
|
||||
* Symlinks don't have fuid info, and CIFS never creates
|
||||
* symlinks.
|
||||
*
|
||||
* The _ATTR versions will grab the fuid info in their subcases.
|
||||
*/
|
||||
if ((int)lr->lr_common.lrc_txtype != TX_SYMLINK &&
|
||||
(int)lr->lr_common.lrc_txtype != TX_MKDIR_ATTR &&
|
||||
(int)lr->lr_common.lrc_txtype != TX_CREATE_ATTR) {
|
||||
start = (lr + 1);
|
||||
zfsvfs->z_fuid_replay =
|
||||
zfs_replay_fuid_domain(start, &start,
|
||||
lr->lr_uid, lr->lr_gid);
|
||||
}
|
||||
|
||||
switch ((int)lr->lr_common.lrc_txtype) {
|
||||
case TX_CREATE_ATTR:
|
||||
lrattr = (lr_attr_t *)(caddr_t)(lr + 1);
|
||||
xvatlen = ZIL_XVAT_SIZE(lrattr->lr_attr_masksize);
|
||||
zfs_replay_xvattr((lr_attr_t *)((caddr_t)lr + lrsize), &xva);
|
||||
start = (caddr_t)(lr + 1) + xvatlen;
|
||||
zfsvfs->z_fuid_replay =
|
||||
zfs_replay_fuid_domain(start, &start,
|
||||
lr->lr_uid, lr->lr_gid);
|
||||
name = (char *)start;
|
||||
|
||||
/*FALLTHROUGH*/
|
||||
case TX_CREATE:
|
||||
if (name == NULL)
|
||||
name = (char *)start;
|
||||
|
||||
error = VOP_CREATE(ZTOV(dzp), name, &xva.xva_vattr,
|
||||
0, 0, &vp, kcred, vflg, NULL, NULL);
|
||||
break;
|
||||
case TX_MKDIR_ATTR:
|
||||
lrattr = (lr_attr_t *)(caddr_t)(lr + 1);
|
||||
xvatlen = ZIL_XVAT_SIZE(lrattr->lr_attr_masksize);
|
||||
zfs_replay_xvattr((lr_attr_t *)((caddr_t)lr + lrsize), &xva);
|
||||
start = (caddr_t)(lr + 1) + xvatlen;
|
||||
zfsvfs->z_fuid_replay =
|
||||
zfs_replay_fuid_domain(start, &start,
|
||||
lr->lr_uid, lr->lr_gid);
|
||||
name = (char *)start;
|
||||
|
||||
/*FALLTHROUGH*/
|
||||
case TX_MKDIR:
|
||||
if (name == NULL)
|
||||
name = (char *)(lr + 1);
|
||||
|
||||
error = VOP_MKDIR(ZTOV(dzp), name, &xva.xva_vattr,
|
||||
&vp, kcred, NULL, vflg, NULL);
|
||||
break;
|
||||
case TX_MKXATTR:
|
||||
name = (char *)(lr + 1);
|
||||
error = zfs_make_xattrdir(dzp, &xva.xva_vattr, &vp, kcred);
|
||||
break;
|
||||
case TX_SYMLINK:
|
||||
name = (char *)(lr + 1);
|
||||
link = name + strlen(name) + 1;
|
||||
error = VOP_SYMLINK(ZTOV(dzp), name, &xva.xva_vattr,
|
||||
link, kcred, NULL, vflg);
|
||||
break;
|
||||
default:
|
||||
error = ENOTSUP;
|
||||
}
|
||||
|
||||
out:
|
||||
if (error == 0 && vp != NULL)
|
||||
VN_RELE(vp);
|
||||
|
||||
VN_RELE(ZTOV(dzp));
|
||||
|
||||
if (zfsvfs->z_fuid_replay)
|
||||
zfs_fuid_info_free(zfsvfs->z_fuid_replay);
|
||||
zfsvfs->z_fuid_replay = NULL;
|
||||
return (error);
|
||||
}
|
||||
|
||||
static int
|
||||
zfs_replay_remove(zfsvfs_t *zfsvfs, lr_remove_t *lr, boolean_t byteswap)
|
||||
{
|
||||
char *name = (char *)(lr + 1); /* name follows lr_remove_t */
|
||||
znode_t *dzp;
|
||||
int error;
|
||||
int vflg = 0;
|
||||
|
||||
if (byteswap)
|
||||
byteswap_uint64_array(lr, sizeof (*lr));
|
||||
|
||||
if ((error = zfs_zget(zfsvfs, lr->lr_doid, &dzp)) != 0)
|
||||
return (error);
|
||||
|
||||
if (lr->lr_common.lrc_txtype & TX_CI)
|
||||
vflg |= FIGNORECASE;
|
||||
|
||||
switch ((int)lr->lr_common.lrc_txtype) {
|
||||
case TX_REMOVE:
|
||||
error = VOP_REMOVE(ZTOV(dzp), name, kcred, NULL, vflg);
|
||||
break;
|
||||
case TX_RMDIR:
|
||||
error = VOP_RMDIR(ZTOV(dzp), name, NULL, kcred, NULL, vflg);
|
||||
break;
|
||||
default:
|
||||
error = ENOTSUP;
|
||||
}
|
||||
|
||||
VN_RELE(ZTOV(dzp));
|
||||
|
||||
return (error);
|
||||
}
|
||||
|
||||
static int
|
||||
zfs_replay_link(zfsvfs_t *zfsvfs, lr_link_t *lr, boolean_t byteswap)
|
||||
{
|
||||
char *name = (char *)(lr + 1); /* name follows lr_link_t */
|
||||
znode_t *dzp, *zp;
|
||||
int error;
|
||||
int vflg = 0;
|
||||
|
||||
if (byteswap)
|
||||
byteswap_uint64_array(lr, sizeof (*lr));
|
||||
|
||||
if ((error = zfs_zget(zfsvfs, lr->lr_doid, &dzp)) != 0)
|
||||
return (error);
|
||||
|
||||
if ((error = zfs_zget(zfsvfs, lr->lr_link_obj, &zp)) != 0) {
|
||||
VN_RELE(ZTOV(dzp));
|
||||
return (error);
|
||||
}
|
||||
|
||||
if (lr->lr_common.lrc_txtype & TX_CI)
|
||||
vflg |= FIGNORECASE;
|
||||
|
||||
error = VOP_LINK(ZTOV(dzp), ZTOV(zp), name, kcred, NULL, vflg);
|
||||
|
||||
VN_RELE(ZTOV(zp));
|
||||
VN_RELE(ZTOV(dzp));
|
||||
|
||||
return (error);
|
||||
}
|
||||
|
||||
static int
|
||||
zfs_replay_rename(zfsvfs_t *zfsvfs, lr_rename_t *lr, boolean_t byteswap)
|
||||
{
|
||||
char *sname = (char *)(lr + 1); /* sname and tname follow lr_rename_t */
|
||||
char *tname = sname + strlen(sname) + 1;
|
||||
znode_t *sdzp, *tdzp;
|
||||
int error;
|
||||
int vflg = 0;
|
||||
|
||||
if (byteswap)
|
||||
byteswap_uint64_array(lr, sizeof (*lr));
|
||||
|
||||
if ((error = zfs_zget(zfsvfs, lr->lr_sdoid, &sdzp)) != 0)
|
||||
return (error);
|
||||
|
||||
if ((error = zfs_zget(zfsvfs, lr->lr_tdoid, &tdzp)) != 0) {
|
||||
VN_RELE(ZTOV(sdzp));
|
||||
return (error);
|
||||
}
|
||||
|
||||
if (lr->lr_common.lrc_txtype & TX_CI)
|
||||
vflg |= FIGNORECASE;
|
||||
|
||||
error = VOP_RENAME(ZTOV(sdzp), sname, ZTOV(tdzp), tname, kcred,
|
||||
NULL, vflg);
|
||||
|
||||
VN_RELE(ZTOV(tdzp));
|
||||
VN_RELE(ZTOV(sdzp));
|
||||
|
||||
return (error);
|
||||
}
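/*
 * Illustrative sketch, not part of the original file. Several replay
 * routines above find their variable-length payload directly behind the
 * fixed-size record: (lr + 1) points at the first byte after the header,
 * and a second NUL-terminated string starts right after the first one.
 * demo_rename_hdr_t and demo_rename_names() are hypothetical names used
 * only here; the header fields do not mirror the real lr_rename_t layout.
 */

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-size header standing in for lr_rename_t. */
typedef struct demo_rename_hdr {
	uint64_t src_dir;
	uint64_t dst_dir;
} demo_rename_hdr_t;

/*
 * Locate the two names packed behind the header:
 *   [ header ][ sname '\0' ][ tname '\0' ]
 */
static void
demo_rename_names(demo_rename_hdr_t *hdr, char **snamep, char **tnamep)
{
	char *sname = (char *)(hdr + 1);	/* first byte past header */

	*snamep = sname;
	*tnamep = sname + strlen(sname) + 1;	/* skip sname and its NUL */
}
```

/*
 * The sketch trusts that both strings are NUL-terminated inside the
 * record, as the code above does; it performs no bounds checking.
 */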
|
||||
|
||||
static int
|
||||
zfs_replay_write(zfsvfs_t *zfsvfs, lr_write_t *lr, boolean_t byteswap)
|
||||
{
|
||||
char *data = (char *)(lr + 1); /* data follows lr_write_t */
|
||||
znode_t *zp;
|
||||
int error;
|
||||
ssize_t resid;
|
||||
|
||||
if (byteswap)
|
||||
byteswap_uint64_array(lr, sizeof (*lr));
|
||||
|
||||
if ((error = zfs_zget(zfsvfs, lr->lr_foid, &zp)) != 0) {
|
||||
/*
|
||||
* As we can log writes out of order, it's possible the
|
||||
* file has been removed. In this case just drop the write
|
||||
* and return success.
|
||||
*/
|
||||
if (error == ENOENT)
|
||||
error = 0;
|
||||
return (error);
|
||||
}
|
||||
|
||||
error = vn_rdwr(UIO_WRITE, ZTOV(zp), data, lr->lr_length,
|
||||
lr->lr_offset, UIO_SYSSPACE, 0, RLIM64_INFINITY, kcred, &resid);
|
||||
|
||||
VN_RELE(ZTOV(zp));
|
||||
|
||||
return (error);
|
||||
}
|
||||
|
||||
static int
|
||||
zfs_replay_truncate(zfsvfs_t *zfsvfs, lr_truncate_t *lr, boolean_t byteswap)
|
||||
{
|
||||
znode_t *zp;
|
||||
flock64_t fl;
|
||||
int error;
|
||||
|
||||
if (byteswap)
|
||||
byteswap_uint64_array(lr, sizeof (*lr));
|
||||
|
||||
if ((error = zfs_zget(zfsvfs, lr->lr_foid, &zp)) != 0) {
|
||||
/*
|
||||
* As we can log truncates out of order, it's possible the
|
||||
* file has been removed. In this case just drop the truncate
|
||||
* and return success.
|
||||
*/
|
||||
if (error == ENOENT)
|
||||
error = 0;
|
||||
return (error);
|
||||
}
|
||||
|
||||
bzero(&fl, sizeof (fl));
|
||||
fl.l_type = F_WRLCK;
|
||||
fl.l_whence = 0;
|
||||
fl.l_start = lr->lr_offset;
|
||||
fl.l_len = lr->lr_length;
|
||||
|
||||
error = VOP_SPACE(ZTOV(zp), F_FREESP, &fl, FWRITE | FOFFMAX,
|
||||
lr->lr_offset, kcred, NULL);
|
||||
|
||||
VN_RELE(ZTOV(zp));
|
||||
|
||||
return (error);
|
||||
}
|
||||
|
||||
static int
|
||||
zfs_replay_setattr(zfsvfs_t *zfsvfs, lr_setattr_t *lr, boolean_t byteswap)
|
||||
{
|
||||
znode_t *zp;
|
||||
xvattr_t xva;
|
||||
vattr_t *vap = &xva.xva_vattr;
|
||||
int error;
|
||||
void *start;
|
||||
|
||||
xva_init(&xva);
|
||||
if (byteswap) {
|
||||
byteswap_uint64_array(lr, sizeof (*lr));
|
||||
|
||||
if ((lr->lr_mask & AT_XVATTR) &&
|
||||
zfsvfs->z_version >= ZPL_VERSION_INITIAL)
|
||||
zfs_replay_swap_attrs((lr_attr_t *)(lr + 1));
|
||||
}
|
||||
|
||||
if ((error = zfs_zget(zfsvfs, lr->lr_foid, &zp)) != 0) {
|
||||
/*
|
||||
* As we can log setattrs out of order, it's possible the
|
||||
* file has been removed. In this case just drop the setattr
|
||||
* and return success.
|
||||
*/
|
||||
if (error == ENOENT)
|
||||
error = 0;
|
||||
return (error);
|
||||
}
|
||||
|
||||
zfs_init_vattr(vap, lr->lr_mask, lr->lr_mode,
|
||||
lr->lr_uid, lr->lr_gid, 0, lr->lr_foid);
|
||||
|
||||
vap->va_size = lr->lr_size;
|
||||
ZFS_TIME_DECODE(&vap->va_atime, lr->lr_atime);
|
||||
ZFS_TIME_DECODE(&vap->va_mtime, lr->lr_mtime);
|
||||
|
||||
/*
|
||||
* Fill in xvattr_t portions if necessary.
|
||||
*/
|
||||
|
||||
start = (lr_setattr_t *)(lr + 1);
|
||||
if (vap->va_mask & AT_XVATTR) {
|
||||
zfs_replay_xvattr((lr_attr_t *)start, &xva);
|
||||
start = (caddr_t)start +
|
||||
ZIL_XVAT_SIZE(((lr_attr_t *)start)->lr_attr_masksize);
|
||||
} else
|
||||
xva.xva_vattr.va_mask &= ~AT_XVATTR;
|
||||
|
||||
zfsvfs->z_fuid_replay = zfs_replay_fuid_domain(start, &start,
|
||||
lr->lr_uid, lr->lr_gid);
|
||||
|
||||
error = VOP_SETATTR(ZTOV(zp), vap, 0, kcred, NULL);
|
||||
|
||||
zfs_fuid_info_free(zfsvfs->z_fuid_replay);
|
||||
zfsvfs->z_fuid_replay = NULL;
|
||||
VN_RELE(ZTOV(zp));
|
||||
|
||||
return (error);
|
||||
}
|
||||
|
||||
static int
|
||||
zfs_replay_acl_v0(zfsvfs_t *zfsvfs, lr_acl_v0_t *lr, boolean_t byteswap)
|
||||
{
|
||||
ace_t *ace = (ace_t *)(lr + 1); /* ace array follows lr_acl_v0_t */
|
||||
vsecattr_t vsa;
|
||||
znode_t *zp;
|
||||
int error;
|
||||
|
||||
if (byteswap) {
|
||||
byteswap_uint64_array(lr, sizeof (*lr));
|
||||
zfs_oldace_byteswap(ace, lr->lr_aclcnt);
|
||||
}
|
||||
|
||||
if ((error = zfs_zget(zfsvfs, lr->lr_foid, &zp)) != 0) {
|
||||
/*
|
||||
* As we can log acls out of order, it's possible the
|
||||
* file has been removed. In this case just drop the acl
|
||||
* and return success.
|
||||
*/
|
||||
if (error == ENOENT)
|
||||
error = 0;
|
||||
return (error);
|
||||
}
|
||||
|
||||
bzero(&vsa, sizeof (vsa));
|
||||
vsa.vsa_mask = VSA_ACE | VSA_ACECNT;
|
||||
vsa.vsa_aclcnt = lr->lr_aclcnt;
|
||||
vsa.vsa_aclentp = ace;
|
||||
|
||||
error = VOP_SETSECATTR(ZTOV(zp), &vsa, 0, kcred, NULL);
|
||||
|
||||
VN_RELE(ZTOV(zp));
|
||||
|
||||
return (error);
|
||||
}
|
||||
|
||||
/*
|
||||
* Replaying ACLs is complicated by FUID support.
|
||||
* The log record may contain some optional data
|
||||
* to be used for replaying FUIDs. These pieces
|
||||
* are the actual FUIDs that were created initially.
|
||||
* The FUID table index may no longer be valid and
|
||||
* during zfs_create() a new index may be assigned.
|
||||
* Because of this the log will contain the original
|
||||
* domain+rid in order to create a new FUID.
|
||||
*
|
||||
* The individual ACEs may contain an ephemeral uid/gid which is no
|
||||
* longer valid and will need to be replaced with an actual FUID.
|
||||
*
|
||||
*/
|
||||
static int
|
||||
zfs_replay_acl(zfsvfs_t *zfsvfs, lr_acl_t *lr, boolean_t byteswap)
|
||||
{
|
||||
ace_t *ace = (ace_t *)(lr + 1);
|
||||
vsecattr_t vsa;
|
||||
znode_t *zp;
|
||||
int error;
|
||||
|
||||
if (byteswap) {
|
||||
byteswap_uint64_array(lr, sizeof (*lr));
|
||||
zfs_ace_byteswap(ace, lr->lr_acl_bytes, B_FALSE);
|
||||
if (lr->lr_fuidcnt) {
|
||||
byteswap_uint64_array((caddr_t)ace +
|
||||
ZIL_ACE_LENGTH(lr->lr_acl_bytes),
|
||||
lr->lr_fuidcnt * sizeof (uint64_t));
|
||||
}
|
||||
}
|
||||
|
||||
if ((error = zfs_zget(zfsvfs, lr->lr_foid, &zp)) != 0) {
|
||||
/*
|
||||
* As we can log acls out of order, it's possible the
|
||||
* file has been removed. In this case just drop the acl
|
||||
* and return success.
|
||||
*/
|
||||
if (error == ENOENT)
|
||||
error = 0;
|
||||
return (error);
|
||||
}
|
||||
|
||||
bzero(&vsa, sizeof (vsa));
|
||||
vsa.vsa_mask = VSA_ACE | VSA_ACECNT | VSA_ACE_ACLFLAGS;
|
||||
vsa.vsa_aclcnt = lr->lr_aclcnt;
|
||||
vsa.vsa_aclentp = ace;
|
||||
vsa.vsa_aclentsz = lr->lr_acl_bytes;
|
||||
vsa.vsa_aclflags = lr->lr_acl_flags;
|
||||
|
||||
if (lr->lr_fuidcnt) {
|
||||
void *fuidstart = (caddr_t)ace +
|
||||
ZIL_ACE_LENGTH(lr->lr_acl_bytes);
|
||||
|
||||
zfsvfs->z_fuid_replay =
|
||||
zfs_replay_fuids(fuidstart, &fuidstart,
|
||||
lr->lr_fuidcnt, lr->lr_domcnt, 0, 0);
|
||||
}
|
||||
|
||||
error = VOP_SETSECATTR(ZTOV(zp), &vsa, 0, kcred, NULL);
|
||||
|
||||
if (zfsvfs->z_fuid_replay)
|
||||
zfs_fuid_info_free(zfsvfs->z_fuid_replay);
|
||||
|
||||
zfsvfs->z_fuid_replay = NULL;
|
||||
VN_RELE(ZTOV(zp));
|
||||
|
||||
return (error);
|
||||
}
|
||||
|
||||
/*
|
||||
* Callback vectors for replaying records
|
||||
*/
|
||||
zil_replay_func_t *zfs_replay_vector[TX_MAX_TYPE] = {
|
||||
zfs_replay_error, /* 0 no such transaction type */
|
||||
zfs_replay_create, /* TX_CREATE */
|
||||
zfs_replay_create, /* TX_MKDIR */
|
||||
zfs_replay_create, /* TX_MKXATTR */
|
||||
zfs_replay_create, /* TX_SYMLINK */
|
||||
zfs_replay_remove, /* TX_REMOVE */
|
||||
zfs_replay_remove, /* TX_RMDIR */
|
||||
zfs_replay_link, /* TX_LINK */
|
||||
zfs_replay_rename, /* TX_RENAME */
|
||||
zfs_replay_write, /* TX_WRITE */
|
||||
zfs_replay_truncate, /* TX_TRUNCATE */
|
||||
zfs_replay_setattr, /* TX_SETATTR */
|
||||
zfs_replay_acl_v0, /* TX_ACL_V0 */
|
||||
zfs_replay_acl, /* TX_ACL */
|
||||
zfs_replay_create_acl, /* TX_CREATE_ACL */
|
||||
zfs_replay_create, /* TX_CREATE_ATTR */
|
||||
zfs_replay_create_acl, /* TX_CREATE_ACL_ATTR */
|
||||
zfs_replay_create_acl, /* TX_MKDIR_ACL */
|
||||
zfs_replay_create, /* TX_MKDIR_ATTR */
|
||||
zfs_replay_create_acl, /* TX_MKDIR_ACL_ATTR */
|
||||
};
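/*
 * Illustrative sketch, not part of the original file. The vector above is
 * a table of function pointers indexed by transaction type, with slot 0
 * reserved for an error handler. The example below shows the same
 * dispatch pattern in miniature. Every demo_* and DEMO_* name is
 * hypothetical, and the handler signature is simplified; it is not the
 * real zil_replay_func_t. The bounds check in demo_dispatch() is this
 * example's choice, not something taken from the original code.
 */

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef int demo_replay_func_t(void *arg);

enum { DEMO_TX_NONE, DEMO_TX_CREATE, DEMO_TX_REMOVE, DEMO_TX_MAX };

static int
demo_replay_error(void *arg)
{
	(void) arg;
	return (ENOTSUP);	/* slot 0: no such transaction type */
}

static int
demo_replay_create(void *arg)
{
	*(int *)arg += 1;	/* record that "create" ran */
	return (0);
}

static int
demo_replay_remove(void *arg)
{
	*(int *)arg -= 1;	/* record that "remove" ran */
	return (0);
}

/* One handler per transaction type, indexed by type number. */
static demo_replay_func_t *demo_replay_vector[DEMO_TX_MAX] = {
	demo_replay_error,	/* DEMO_TX_NONE */
	demo_replay_create,	/* DEMO_TX_CREATE */
	demo_replay_remove,	/* DEMO_TX_REMOVE */
};

/* Look up and call the handler; reject out-of-range types. */
static int
demo_dispatch(unsigned int txtype, void *arg)
{
	if (txtype >= DEMO_TX_MAX)
		return (EINVAL);
	return (demo_replay_vector[txtype](arg));
}
```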
|
|
@ -0,0 +1,602 @@
|
|||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright 2007 Sun Microsystems, Inc. All rights reserved.
|
||||
* Use is subject to license terms.
|
||||
*/
|
||||
|
||||
#pragma ident "@(#)zfs_rlock.c 1.4 07/08/08 SMI"
|
||||
|
||||
/*
|
||||
* This file contains the code to implement file range locking in
|
||||
* ZFS, although there isn't much specific to ZFS (all that comes to mind
|
||||
* is support for growing the blocksize).
|
||||
*
|
||||
* Interface
|
||||
* ---------
|
||||
* Defined in zfs_rlock.h but essentially:
|
||||
* rl = zfs_range_lock(zp, off, len, lock_type);
|
||||
* zfs_range_unlock(rl);
|
||||
* zfs_range_reduce(rl, off, len);
|
||||
*
|
||||
* AVL tree
|
||||
* --------
|
||||
* An AVL tree is used to maintain the state of the existing ranges
|
||||
* that are locked for exclusive (writer) or shared (reader) use.
|
||||
* The starting range offset is used for searching and sorting the tree.
|
||||
*
|
||||
* Common case
|
||||
* -----------
|
||||
* The (hopefully) usual case is of no overlaps or contention for
|
||||
* locks. On entry to zfs_range_lock() an rl_t is allocated; the tree
|
||||
* is searched and no overlap is found, and *this* rl_t is placed in the tree.
|
||||
*
|
||||
* Overlaps/Reference counting/Proxy locks
|
||||
* ---------------------------------------
|
||||
* The AVL code only allows one node at a particular offset. Also it's very
|
||||
* inefficient to search through all previous entries looking for overlaps
|
||||
* (because the very 1st in the ordered list might be at offset 0 but
|
||||
* cover the whole file).
|
||||
* So this implementation uses reference counts and proxy range locks.
|
||||
* Firstly, only reader locks use reference counts and proxy locks,
|
||||
* because writer locks are exclusive.
|
||||
* When a reader lock overlaps with another then a proxy lock is created
|
||||
* for that range and replaces the original lock. If the overlap
|
||||
* is exact then the reference count of the proxy is simply incremented.
|
||||
* Otherwise, the proxy lock is split into smaller lock ranges and
|
||||
* new proxy locks created for non overlapping ranges.
|
||||
* The reference counts are adjusted accordingly.
|
||||
* Meanwhile, the original lock is kept around (this is the caller's handle)
|
||||
* and its offset and length are used when releasing the lock.
|
||||
*
|
||||
* Thread coordination
|
||||
* -------------------
|
||||
* In order to make wakeups efficient and to ensure multiple continuous
|
||||
* readers on a range don't starve a writer for the same range lock,
|
||||
* two condition variables are allocated in each rl_t.
|
||||
* If a writer (or reader) can't get a range it initialises the writer
|
||||
* (or reader) cv; sets a flag saying there's a writer (or reader) waiting;
|
||||
* and waits on that cv. When a thread unlocks that range it wakes up all
|
||||
* writers then all readers before destroying the lock.
|
||||
*
|
||||
* Append mode writes
|
||||
* ------------------
|
||||
* Append mode writes need to lock a range at the end of a file.
|
||||
* The offset of the end of the file is determined under the
|
||||
* range locking mutex, and the lock type converted from RL_APPEND to
|
||||
* RL_WRITER and the range locked.
|
||||
*
|
||||
* Grow block handling
|
||||
* -------------------
|
||||
* ZFS supports multiple block sizes, currently up to 128K. The smallest
|
||||
* block size is used for the file which is grown as needed. During this
|
||||
* growth all other writers and readers must be excluded.
|
||||
* So if the block size needs to be grown then the whole file is
|
||||
* exclusively locked, then later the caller will reduce the lock
|
||||
* range to just the range to be written using zfs_range_reduce().
|
||||
*/
|
||||
|
||||
#include <sys/zfs_rlock.h>
|
||||
|
||||
/*
|
||||
* Check if a write lock can be grabbed, or wait and recheck until available.
|
||||
*/
|
||||
static void
|
||||
zfs_range_lock_writer(znode_t *zp, rl_t *new)
|
||||
{
|
||||
avl_tree_t *tree = &zp->z_range_avl;
|
||||
rl_t *rl;
|
||||
avl_index_t where;
|
||||
uint64_t end_size;
|
||||
uint64_t off = new->r_off;
|
||||
uint64_t len = new->r_len;
|
||||
|
||||
for (;;) {
|
||||
/*
|
||||
* Range locking is also used by zvol and uses a
|
||||
* dummied up znode. However, for zvol, we don't need to
|
||||
* append or grow blocksize, and besides we don't have
|
||||
* a z_phys or z_zfsvfs - so skip that processing.
|
||||
*
|
||||
* Yes, this is ugly, and would be solved by not handling
|
||||
* grow or append in range lock code. If that was done then
|
||||
* we could make the range locking code generically available
|
||||
* to other non-zfs consumers.
|
||||
*/
|
||||
if (zp->z_vnode) { /* caller is ZPL */
|
||||
/*
|
||||
* If in append mode pick up the current end of file.
|
||||
* This is done under z_range_lock to avoid races.
|
||||
*/
|
||||
if (new->r_type == RL_APPEND)
|
||||
new->r_off = zp->z_phys->zp_size;
|
||||
|
||||
/*
|
||||
* If we need to grow the block size then grab the whole
|
||||
* file range. This is also done under z_range_lock to
|
||||
* avoid races.
|
||||
*/
|
||||
end_size = MAX(zp->z_phys->zp_size, new->r_off + len);
|
||||
if (end_size > zp->z_blksz && (!ISP2(zp->z_blksz) ||
|
||||
zp->z_blksz < zp->z_zfsvfs->z_max_blksz)) {
|
||||
new->r_off = 0;
|
||||
new->r_len = UINT64_MAX;
|
||||
}
|
||||
}
|
||||
|
||||
/*
|
||||
* First check for the usual case of no locks
|
||||
*/
|
||||
if (avl_numnodes(tree) == 0) {
|
||||
new->r_type = RL_WRITER; /* convert to writer */
|
||||
avl_add(tree, new);
|
||||
return;
|
||||
}
|
||||
|
||||
/*
|
||||
* Look for any locks in the range.
|
||||
*/
|
||||
rl = avl_find(tree, new, &where);
|
||||
if (rl)
|
||||
goto wait; /* already locked at same offset */
|
||||
|
||||
rl = (rl_t *)avl_nearest(tree, where, AVL_AFTER);
|
||||
if (rl && (rl->r_off < new->r_off + new->r_len))
|
||||
goto wait;
|
||||
|
||||
rl = (rl_t *)avl_nearest(tree, where, AVL_BEFORE);
|
||||
if (rl && rl->r_off + rl->r_len > new->r_off)
|
||||
goto wait;
|
||||
|
||||
new->r_type = RL_WRITER; /* convert possible RL_APPEND */
|
||||
avl_insert(tree, new, where);
|
||||
return;
|
||||
wait:
|
||||
if (!rl->r_write_wanted) {
|
||||
cv_init(&rl->r_wr_cv, NULL, CV_DEFAULT, NULL);
|
||||
rl->r_write_wanted = B_TRUE;
|
||||
}
|
||||
cv_wait(&rl->r_wr_cv, &zp->z_range_lock);
|
||||
|
||||
/* reset to original */
|
||||
new->r_off = off;
|
||||
new->r_len = len;
|
||||
}
|
||||
}
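/*
 * Illustrative sketch, not part of the original file. The writer path
 * above only has to examine the tree neighbours on either side of the
 * insertion point: the next range conflicts if it starts before the new
 * range ends, and the previous range conflicts if it ends after the new
 * range starts. The hypothetical helper below combines both tests for a
 * pair of half-open ranges [off, off + len). It assumes off + len does
 * not overflow a uint64_t.
 */

```c
#include <assert.h>
#include <stdint.h>

/* Return 1 if [off1, off1 + len1) and [off2, off2 + len2) intersect. */
static int
demo_ranges_overlap(uint64_t off1, uint64_t len1, uint64_t off2,
    uint64_t len2)
{
	return (off1 < off2 + len2 && off2 < off1 + len1);
}
```

/*
 * Ranges that merely touch, such as [0, 10) and [10, 15), do not
 * overlap, so two writers may hold adjacent ranges at the same time.
 */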
|
||||
|
||||
/*
|
||||
* If this is an original (non-proxy) lock then replace it by
|
||||
* a proxy and return the proxy.
|
||||
*/
|
||||
static rl_t *
|
||||
zfs_range_proxify(avl_tree_t *tree, rl_t *rl)
|
||||
{
|
||||
rl_t *proxy;
|
||||
|
||||
if (rl->r_proxy)
|
||||
return (rl); /* already a proxy */
|
||||
|
||||
ASSERT3U(rl->r_cnt, ==, 1);
|
||||
ASSERT(rl->r_write_wanted == B_FALSE);
|
||||
ASSERT(rl->r_read_wanted == B_FALSE);
|
||||
avl_remove(tree, rl);
|
||||
rl->r_cnt = 0;
|
||||
|
||||
/* create a proxy range lock */
|
||||
proxy = kmem_alloc(sizeof (rl_t), KM_SLEEP);
|
||||
proxy->r_off = rl->r_off;
|
||||
proxy->r_len = rl->r_len;
|
||||
proxy->r_cnt = 1;
|
||||
proxy->r_type = RL_READER;
|
||||
proxy->r_proxy = B_TRUE;
|
||||
proxy->r_write_wanted = B_FALSE;
|
||||
proxy->r_read_wanted = B_FALSE;
|
||||
avl_add(tree, proxy);
|
||||
|
||||
return (proxy);
|
||||
}
|
||||
|
||||
/*
|
||||
* Split the range lock at the supplied offset
|
||||
* returning the *front* proxy.
|
||||
*/
|
||||
static rl_t *
|
||||
zfs_range_split(avl_tree_t *tree, rl_t *rl, uint64_t off)
|
||||
{
|
||||
rl_t *front, *rear;
|
||||
|
||||
ASSERT3U(rl->r_len, >, 1);
|
||||
ASSERT3U(off, >, rl->r_off);
|
||||
ASSERT3U(off, <, rl->r_off + rl->r_len);
|
||||
ASSERT(rl->r_write_wanted == B_FALSE);
|
||||
ASSERT(rl->r_read_wanted == B_FALSE);
|
||||
|
||||
/* create the rear proxy range lock */
|
||||
rear = kmem_alloc(sizeof (rl_t), KM_SLEEP);
|
||||
rear->r_off = off;
|
||||
rear->r_len = rl->r_off + rl->r_len - off;
|
||||
rear->r_cnt = rl->r_cnt;
|
||||
rear->r_type = RL_READER;
|
||||
rear->r_proxy = B_TRUE;
|
||||
rear->r_write_wanted = B_FALSE;
|
||||
rear->r_read_wanted = B_FALSE;
|
||||
|
||||
front = zfs_range_proxify(tree, rl);
|
||||
front->r_len = off - rl->r_off;
|
||||
|
||||
avl_insert_here(tree, rear, front, AVL_AFTER);
|
||||
return (front);
|
||||
}
|
||||
|
||||
/*
|
||||
* Create and add a new proxy range lock for the supplied range.
|
||||
*/
|
||||
static void
|
||||
zfs_range_new_proxy(avl_tree_t *tree, uint64_t off, uint64_t len)
|
||||
{
|
||||
rl_t *rl;
|
||||
|
||||
ASSERT(len);
|
||||
rl = kmem_alloc(sizeof (rl_t), KM_SLEEP);
|
||||
rl->r_off = off;
|
||||
rl->r_len = len;
|
||||
rl->r_cnt = 1;
|
||||
rl->r_type = RL_READER;
|
||||
rl->r_proxy = B_TRUE;
|
||||
rl->r_write_wanted = B_FALSE;
|
||||
rl->r_read_wanted = B_FALSE;
|
||||
avl_add(tree, rl);
|
||||
}
|
||||
|
||||
static void
|
||||
zfs_range_add_reader(avl_tree_t *tree, rl_t *new, rl_t *prev, avl_index_t where)
|
||||
{
|
||||
rl_t *next;
|
||||
uint64_t off = new->r_off;
|
||||
uint64_t len = new->r_len;
|
||||
|
||||
/*
|
||||
* prev arrives either:
|
||||
* - pointing to an entry at the same offset
|
||||
* - pointing to the entry with the closest previous offset whose
|
||||
* range may overlap with the new range
|
||||
* - null, if there were no ranges starting before the new one
|
||||
*/
|
||||
if (prev) {
|
||||
if (prev->r_off + prev->r_len <= off) {
|
||||
prev = NULL;
|
||||
} else if (prev->r_off != off) {
|
||||
/*
|
||||
* convert to proxy if needed then
|
||||
* split this entry and bump ref count
|
||||
*/
|
||||
prev = zfs_range_split(tree, prev, off);
|
||||
prev = AVL_NEXT(tree, prev); /* move to rear range */
|
||||
}
|
||||
}
|
||||
ASSERT((prev == NULL) || (prev->r_off == off));
|
||||
|
||||
if (prev)
|
||||
next = prev;
|
||||
else
|
||||
next = (rl_t *)avl_nearest(tree, where, AVL_AFTER);
|
||||
|
||||
if (next == NULL || off + len <= next->r_off) {
|
||||
/* no overlaps, use the original new rl_t in the tree */
|
||||
avl_insert(tree, new, where);
|
||||
return;
|
||||
}
|
||||
|
||||
if (off < next->r_off) {
|
||||
/* Add a proxy for initial range before the overlap */
|
||||
zfs_range_new_proxy(tree, off, next->r_off - off);
|
||||
}
|
||||
|
||||
new->r_cnt = 0; /* will use proxies in tree */
|
||||
/*
|
||||
* We now search forward through the ranges, until we go past the end
|
||||
* of the new range. For each entry we make it a proxy if it
|
||||
* isn't already, then bump its reference count. If there's any
|
||||
* gaps between the ranges then we create a new proxy range.
|
||||
*/
|
||||
for (prev = NULL; next; prev = next, next = AVL_NEXT(tree, next)) {
|
||||
if (off + len <= next->r_off)
|
||||
break;
|
||||
if (prev && prev->r_off + prev->r_len < next->r_off) {
|
||||
/* there's a gap */
|
||||
ASSERT3U(next->r_off, >, prev->r_off + prev->r_len);
|
||||
zfs_range_new_proxy(tree, prev->r_off + prev->r_len,
|
||||
next->r_off - (prev->r_off + prev->r_len));
|
||||
}
|
||||
if (off + len == next->r_off + next->r_len) {
|
||||
/* exact overlap with end */
|
||||
next = zfs_range_proxify(tree, next);
|
||||
next->r_cnt++;
|
||||
return;
|
||||
}
|
||||
if (off + len < next->r_off + next->r_len) {
|
||||
/* new range ends in the middle of this block */
|
||||
next = zfs_range_split(tree, next, off + len);
|
||||
next->r_cnt++;
|
||||
return;
|
||||
}
|
||||
ASSERT3U(off + len, >, next->r_off + next->r_len);
|
||||
next = zfs_range_proxify(tree, next);
|
||||
next->r_cnt++;
|
||||
}
|
||||
|
||||
/* Add the remaining end range. */
|
||||
zfs_range_new_proxy(tree, prev->r_off + prev->r_len,
|
||||
(off + len) - (prev->r_off + prev->r_len));
|
||||
}
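/*
 * Illustrative sketch, not part of the original file. When reader ranges
 * overlap, the code above ends up with non-overlapping proxy segments,
 * each carrying a reference count of how many readers cover it. The
 * hypothetical helper below computes that end state for just two reader
 * ranges using a sorted list of boundaries. It does not use an AVL tree
 * and is not the algorithm zfs_range_add_reader() implements; it only
 * shows the resulting segments and counts. Both lengths are assumed to
 * be non-zero and free of overflow.
 */

```c
#include <assert.h>
#include <stdint.h>

typedef struct demo_seg {
	uint64_t off;	/* start of segment */
	uint64_t len;	/* length of segment */
	int cnt;	/* number of readers covering the segment */
} demo_seg_t;

/* Fill out[] with up to three segments; return how many were written. */
static int
demo_overlay_readers(uint64_t a_off, uint64_t a_len, uint64_t b_off,
    uint64_t b_len, demo_seg_t out[3])
{
	uint64_t pts[4];
	uint64_t v, lo, hi;
	int i, j, cnt, n = 0;

	pts[0] = a_off;
	pts[1] = a_off + a_len;
	pts[2] = b_off;
	pts[3] = b_off + b_len;

	/* insertion sort of the four range boundaries */
	for (i = 1; i < 4; i++) {
		v = pts[i];
		for (j = i - 1; j >= 0 && pts[j] > v; j--)
			pts[j + 1] = pts[j];
		pts[j + 1] = v;
	}

	/* each pair of adjacent boundaries is a candidate segment */
	for (i = 0; i < 3; i++) {
		lo = pts[i];
		hi = pts[i + 1];
		if (lo == hi)
			continue;	/* duplicate boundary */
		cnt = (a_off <= lo && hi <= a_off + a_len) +
		    (b_off <= lo && hi <= b_off + b_len);
		if (cnt == 0)
			continue;	/* gap between the two ranges */
		out[n].off = lo;
		out[n].len = hi - lo;
		out[n].cnt = cnt;
		n++;
	}
	return (n);
}
```

/*
 * For readers [0, 100) and [50, 150) this yields [0, 50) with count 1,
 * [50, 100) with count 2 and [100, 150) with count 1. An exact overlap
 * collapses to a single segment with count 2.
 */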
|
||||
|
||||
/*
|
||||
* Check if a reader lock can be grabbed, or wait and recheck until available.
|
||||
*/
|
||||
static void
|
||||
zfs_range_lock_reader(znode_t *zp, rl_t *new)
|
||||
{
|
||||
avl_tree_t *tree = &zp->z_range_avl;
|
||||
rl_t *prev, *next;
|
||||
avl_index_t where;
|
||||
uint64_t off = new->r_off;
|
||||
uint64_t len = new->r_len;
|
||||
|
||||
/*
|
||||
* Look for any writer locks in the range.
|
||||
*/
|
||||
retry:
|
||||
prev = avl_find(tree, new, &where);
|
||||
if (prev == NULL)
|
||||
prev = (rl_t *)avl_nearest(tree, where, AVL_BEFORE);
|
||||
|
||||
/*
|
||||
* Check the previous range for a writer lock overlap.
|
||||
*/
|
||||
if (prev && (off < prev->r_off + prev->r_len)) {
|
||||
if ((prev->r_type == RL_WRITER) || (prev->r_write_wanted)) {
|
||||
if (!prev->r_read_wanted) {
|
||||
cv_init(&prev->r_rd_cv, NULL, CV_DEFAULT, NULL);
|
||||
prev->r_read_wanted = B_TRUE;
|
||||
}
|
||||
cv_wait(&prev->r_rd_cv, &zp->z_range_lock);
|
||||
goto retry;
|
||||
}
|
||||
if (off + len < prev->r_off + prev->r_len)
|
||||
goto got_lock;
|
||||
}
|
||||
|
||||
/*
|
||||
* Search through the following ranges to see if there's
|
||||
* any write lock overlap.
|
||||
*/
|
||||
if (prev)
|
||||
next = AVL_NEXT(tree, prev);
|
||||
else
|
||||
next = (rl_t *)avl_nearest(tree, where, AVL_AFTER);
|
||||
for (; next; next = AVL_NEXT(tree, next)) {
|
||||
if (off + len <= next->r_off)
|
||||
goto got_lock;
|
||||
if ((next->r_type == RL_WRITER) || (next->r_write_wanted)) {
|
||||
if (!next->r_read_wanted) {
|
||||
cv_init(&next->r_rd_cv, NULL, CV_DEFAULT, NULL);
|
||||
next->r_read_wanted = B_TRUE;
|
||||
}
|
||||
cv_wait(&next->r_rd_cv, &zp->z_range_lock);
|
||||
goto retry;
|
||||
}
|
||||
if (off + len <= next->r_off + next->r_len)
|
||||
goto got_lock;
|
||||
}
|
||||
|
||||
got_lock:
|
||||
/*
|
||||
* Add the read lock, which may involve splitting existing
|
||||
* locks and bumping ref counts (r_cnt).
|
||||
*/
|
||||
zfs_range_add_reader(tree, new, prev, where);
|
||||
}
|
||||
|
||||
/*
|
||||
* Lock a range (offset, length) as either shared (RL_READER)
|
||||
* or exclusive (RL_WRITER). Returns the range lock structure
|
||||
* for later unlocking or for reducing the range (if the entire file was
|
||||
* previously locked as RL_WRITER).
|
||||
*/
|
||||
rl_t *
|
||||
zfs_range_lock(znode_t *zp, uint64_t off, uint64_t len, rl_type_t type)
|
||||
{
|
||||
rl_t *new;
|
||||
|
||||
ASSERT(type == RL_READER || type == RL_WRITER || type == RL_APPEND);
|
||||
|
||||
new = kmem_alloc(sizeof (rl_t), KM_SLEEP);
|
||||
new->r_zp = zp;
|
||||
new->r_off = off;
|
||||
new->r_len = len;
|
||||
new->r_cnt = 1; /* assume it's going to be in the tree */
|
||||
new->r_type = type;
|
||||
new->r_proxy = B_FALSE;
|
||||
new->r_write_wanted = B_FALSE;
|
||||
new->r_read_wanted = B_FALSE;
|
||||
|
||||
mutex_enter(&zp->z_range_lock);
|
||||
if (type == RL_READER) {
|
||||
/*
|
||||
* First check for the usual case of no locks
|
||||
*/
|
||||
if (avl_numnodes(&zp->z_range_avl) == 0)
|
||||
avl_add(&zp->z_range_avl, new);
|
||||
else
|
||||
zfs_range_lock_reader(zp, new);
|
||||
} else
|
||||
zfs_range_lock_writer(zp, new); /* RL_WRITER or RL_APPEND */
|
||||
mutex_exit(&zp->z_range_lock);
|
||||
return (new);
|
||||
}
|
||||
|
||||
/*
|
||||
* Unlock a reader lock
|
||||
*/
|
||||
static void
|
||||
zfs_range_unlock_reader(znode_t *zp, rl_t *remove)
|
||||
{
|
||||
avl_tree_t *tree = &zp->z_range_avl;
|
||||
rl_t *rl, *next;
|
||||
uint64_t len;
|
||||
|
||||
/*
|
||||
* The common case is when the remove entry is in the tree
|
||||
* (cnt == 1), meaning no other reader locks have overlapped
|
||||
* with this one. Otherwise the remove entry will have been
|
||||
* removed from the tree and replaced by proxies (one or
|
||||
* more ranges mapping to the entire range).
|
||||
*/
|
||||
if (remove->r_cnt == 1) {
|
||||
avl_remove(tree, remove);
|
||||
if (remove->r_write_wanted) {
|
||||
cv_broadcast(&remove->r_wr_cv);
|
||||
cv_destroy(&remove->r_wr_cv);
|
||||
}
|
||||
if (remove->r_read_wanted) {
|
||||
cv_broadcast(&remove->r_rd_cv);
|
||||
cv_destroy(&remove->r_rd_cv);
|
||||
}
|
||||
} else {
|
||||
ASSERT3U(remove->r_cnt, ==, 0);
|
||||
ASSERT3U(remove->r_write_wanted, ==, 0);
|
||||
ASSERT3U(remove->r_read_wanted, ==, 0);
|
||||
/*
|
||||
* Find start proxy representing this reader lock,
|
||||
* then decrement ref count on all proxies
|
||||
* that make up this range, freeing them as needed.
|
||||
*/
|
||||
rl = avl_find(tree, remove, NULL);
|
||||
ASSERT(rl);
|
||||
ASSERT(rl->r_cnt);
|
||||
ASSERT(rl->r_type == RL_READER);
|
||||
for (len = remove->r_len; len != 0; rl = next) {
|
||||
len -= rl->r_len;
|
||||
if (len) {
|
||||
next = AVL_NEXT(tree, rl);
|
||||
ASSERT(next);
|
||||
ASSERT(rl->r_off + rl->r_len == next->r_off);
|
||||
ASSERT(next->r_cnt);
|
||||
ASSERT(next->r_type == RL_READER);
|
||||
}
|
||||
rl->r_cnt--;
|
||||
if (rl->r_cnt == 0) {
|
||||
avl_remove(tree, rl);
|
||||
if (rl->r_write_wanted) {
|
||||
cv_broadcast(&rl->r_wr_cv);
|
||||
cv_destroy(&rl->r_wr_cv);
|
||||
}
|
||||
if (rl->r_read_wanted) {
|
||||
cv_broadcast(&rl->r_rd_cv);
|
||||
cv_destroy(&rl->r_rd_cv);
|
||||
}
|
||||
kmem_free(rl, sizeof (rl_t));
|
||||
}
|
||||
}
|
||||
}
|
||||
kmem_free(remove, sizeof (rl_t));
|
||||
}
|
||||
|
||||
/*
|
||||
* Unlock range and destroy range lock structure.
|
||||
*/
|
||||
void
|
||||
zfs_range_unlock(rl_t *rl)
|
||||
{
|
||||
znode_t *zp = rl->r_zp;
|
||||
|
||||
ASSERT(rl->r_type == RL_WRITER || rl->r_type == RL_READER);
|
||||
ASSERT(rl->r_cnt == 1 || rl->r_cnt == 0);
|
||||
ASSERT(!rl->r_proxy);
|
||||
|
||||
mutex_enter(&zp->z_range_lock);
|
||||
if (rl->r_type == RL_WRITER) {
|
||||
/* writer locks can't be shared or split */
|
||||
avl_remove(&zp->z_range_avl, rl);
|
||||
mutex_exit(&zp->z_range_lock);
|
||||
if (rl->r_write_wanted) {
|
||||
cv_broadcast(&rl->r_wr_cv);
|
||||
cv_destroy(&rl->r_wr_cv);
|
||||
}
|
||||
if (rl->r_read_wanted) {
|
||||
cv_broadcast(&rl->r_rd_cv);
|
||||
cv_destroy(&rl->r_rd_cv);
|
||||
}
|
||||
kmem_free(rl, sizeof (rl_t));
|
||||
} else {
|
||||
/*
|
||||
* lock may be shared, let zfs_range_unlock_reader()
|
||||
* release the lock and free the rl_t
|
||||
*/
|
||||
zfs_range_unlock_reader(zp, rl);
|
||||
mutex_exit(&zp->z_range_lock);
|
||||
}
|
||||
}
|
||||
|
||||
/*
|
||||
* Reduce range locked as RL_WRITER from whole file to specified range.
|
||||
* Asserts the whole file is exclusively locked and so there's only one
|
||||
* entry in the tree.
|
||||
*/
|
||||
void
|
||||
zfs_range_reduce(rl_t *rl, uint64_t off, uint64_t len)
|
||||
{
|
||||
znode_t *zp = rl->r_zp;
|
||||
|
||||
/* Ensure there are no other locks */
|
||||
ASSERT(avl_numnodes(&zp->z_range_avl) == 1);
|
||||
ASSERT(rl->r_off == 0);
|
||||
ASSERT(rl->r_type == RL_WRITER);
|
||||
ASSERT(!rl->r_proxy);
|
||||
ASSERT3U(rl->r_len, ==, UINT64_MAX);
|
||||
ASSERT3U(rl->r_cnt, ==, 1);
|
||||
|
||||
mutex_enter(&zp->z_range_lock);
|
||||
rl->r_off = off;
|
||||
rl->r_len = len;
|
||||
mutex_exit(&zp->z_range_lock);
|
||||
if (rl->r_write_wanted)
|
||||
cv_broadcast(&rl->r_wr_cv);
|
||||
if (rl->r_read_wanted)
|
||||
cv_broadcast(&rl->r_rd_cv);
|
||||
}
|
||||
|
||||
/*
|
||||
* AVL comparison function used to order range locks.
|
||||
* Locks are ordered on the start offset of the range.
|
||||
*/
|
||||
int
|
||||
zfs_range_compare(const void *arg1, const void *arg2)
|
||||
{
|
||||
const rl_t *rl1 = arg1;
|
||||
const rl_t *rl2 = arg2;
|
||||
|
||||
if (rl1->r_off > rl2->r_off)
|
||||
return (1);
|
||||
if (rl1->r_off < rl2->r_off)
|
||||
return (-1);
|
||||
return (0);
|
||||
}
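The ordering rule used by zfs_range_compare() and the overlap checks in zfs_range_lock_reader() can be illustrated outside the kernel. The following is a minimal user-space sketch, not the ZFS implementation: demo_range_t, demo_range_compare(), demo_range_overlaps(), and demo_range_sort() are hypothetical names, qsort() stands in for the AVL tree's ordering, and overflow of off + len is ignored for brevity.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for rl_t: offset and length only. */
typedef struct demo_range {
	uint64_t r_off;	/* start offset of the range */
	uint64_t r_len;	/* length of the range in bytes */
} demo_range_t;

/* Same three-way ordering as zfs_range_compare(): start offset only. */
static int
demo_range_compare(const void *arg1, const void *arg2)
{
	const demo_range_t *r1 = arg1;
	const demo_range_t *r2 = arg2;

	if (r1->r_off > r2->r_off)
		return (1);
	if (r1->r_off < r2->r_off)
		return (-1);
	return (0);
}

/*
 * Half-open overlap test: does [off, off + len) intersect
 * [r_off, r_off + r_len)?  A range that merely touches the end of
 * another one does not overlap it.
 */
static int
demo_range_overlaps(const demo_range_t *r, uint64_t off, uint64_t len)
{
	return (off < r->r_off + r->r_len && r->r_off < off + len);
}

/* Sort ranges by start offset; an assumption standing in for AVL order. */
static void
demo_range_sort(demo_range_t *ranges, size_t n)
{
	assert(ranges != NULL || n == 0);
	qsort(ranges, n, sizeof (demo_range_t), demo_range_compare);
}
```

Because the comparator looks only at r_off, two ranges with the same start offset compare as equal regardless of their lengths. That is why the lock code above walks neighbouring entries with AVL_NEXT() to find the ranges that actually overlap.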
|
File diff suppressed because it is too large
File diff suppressed because it is too large
File diff suppressed because it is too large
|
@ -0,0 +1,34 @@
|
|||
subdir-m += include
|
||||
DISTFILES = libnvpair.c nvpair.c nvpair_alloc_fixed.c nvpair_alloc_system.c
|
||||
|
||||
MODULE := znvpair
|
||||
LIBRARY := libnvpair
|
||||
|
||||
# Compile as kernel module. Needed symlinks created for all
|
||||
# k* objects created by top level configure script.
|
||||
|
||||
EXTRA_CFLAGS = @KERNELCPPFLAGS@
|
||||
EXTRA_CFLAGS += -I@LIBDIR@/libnvpair/include
|
||||
|
||||
obj-m := ${MODULE}.o
|
||||
|
||||
${MODULE}-objs += knvpair.o # Interfaces name/value pairs
|
||||
${MODULE}-objs += nvpair_alloc_spl.o # Generic alloc/free support
|
||||
|
||||
# Compile as shared library. There's an extra useless host program
|
||||
# here called 'zu' because it was the easiest way I could convince
|
||||
# the kernel build system to construct a user space shared library.
|
||||
|
||||
HOSTCFLAGS += @HOSTCFLAGS@
|
||||
HOSTCFLAGS += -I@LIBDIR@/libsolcompat/include
|
||||
HOSTCFLAGS += -I@LIBDIR@/libport/include
|
||||
HOSTCFLAGS += -I@LIBDIR@/libnvpair/include
|
||||
|
||||
hostprogs-y := zu
|
||||
always := $(hostprogs-y)
|
||||
|
||||
zu-objs := zu.o ${LIBRARY}.so
|
||||
|
||||
${LIBRARY}-objs += unvpair.o
|
||||
${LIBRARY}-objs += nvpair_alloc_system.o
|
||||
${LIBRARY}-objs += libnvpair.o
|
|
@ -0,0 +1,2 @@
|
|||
subdir-m += sys
|
||||
DISTFILES = libnvpair.h
|
|
@ -0,0 +1,46 @@
|
|||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License, Version 1.0 only
|
||||
* (the "License"). You may not use this file except in compliance
|
||||
* with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright 2005 Sun Microsystems, Inc. All rights reserved.
|
||||
* Use is subject to license terms.
|
||||
*/
|
||||
|
||||
#ifndef _LIBNVPAIR_H
|
||||
#define _LIBNVPAIR_H
|
||||
|
||||
|
||||
|
||||
#include <sys/nvpair.h>
|
||||
#include <stdlib.h>
|
||||
#include <stdio.h>
|
||||
|
||||
#ifdef __cplusplus
|
||||
extern "C" {
|
||||
#endif
|
||||
|
||||
void nvlist_print(FILE *, nvlist_t *);
|
||||
|
||||
#ifdef __cplusplus
|
||||
}
|
||||
#endif
|
||||
|
||||
#endif /* _LIBNVPAIR_H */
|
|
@ -0,0 +1 @@
|
|||
DISTFILES = nvpair.h nvpair_impl.h
|
|
@ -0,0 +1,262 @@
|
|||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright 2007 Sun Microsystems, Inc. All rights reserved.
|
||||
* Use is subject to license terms.
|
||||
*/
|
||||
|
||||
#ifndef _SYS_NVPAIR_H
|
||||
#define _SYS_NVPAIR_H
|
||||
|
||||
|
||||
|
||||
#include <sys/types.h>
|
||||
#include <sys/errno.h>
|
||||
#include <sys/va_list.h>
|
||||
|
||||
#if defined(_KERNEL) && !defined(_BOOT)
|
||||
#include <sys/kmem.h>
|
||||
#endif
|
||||
|
||||
#ifdef __cplusplus
|
||||
extern "C" {
|
||||
#endif
|
||||
|
||||
typedef enum {
|
||||
DATA_TYPE_UNKNOWN = 0,
|
||||
DATA_TYPE_BOOLEAN,
|
||||
DATA_TYPE_BYTE,
|
||||
DATA_TYPE_INT16,
|
||||
DATA_TYPE_UINT16,
|
||||
DATA_TYPE_INT32,
|
||||
DATA_TYPE_UINT32,
|
||||
DATA_TYPE_INT64,
|
||||
DATA_TYPE_UINT64,
|
||||
DATA_TYPE_STRING,
|
||||
DATA_TYPE_BYTE_ARRAY,
|
||||
DATA_TYPE_INT16_ARRAY,
|
||||
DATA_TYPE_UINT16_ARRAY,
|
||||
DATA_TYPE_INT32_ARRAY,
|
||||
DATA_TYPE_UINT32_ARRAY,
|
||||
DATA_TYPE_INT64_ARRAY,
|
||||
DATA_TYPE_UINT64_ARRAY,
|
||||
DATA_TYPE_STRING_ARRAY,
|
||||
DATA_TYPE_HRTIME,
|
||||
DATA_TYPE_NVLIST,
|
||||
DATA_TYPE_NVLIST_ARRAY,
|
||||
DATA_TYPE_BOOLEAN_VALUE,
|
||||
DATA_TYPE_INT8,
|
||||
DATA_TYPE_UINT8,
|
||||
DATA_TYPE_BOOLEAN_ARRAY,
|
||||
DATA_TYPE_INT8_ARRAY,
|
||||
DATA_TYPE_UINT8_ARRAY
|
||||
} data_type_t;
|
||||
|
||||
typedef struct nvpair {
|
||||
int32_t nvp_size; /* size of this nvpair */
|
||||
int16_t nvp_name_sz; /* length of name string */
|
||||
int16_t nvp_reserve; /* not used */
|
||||
int32_t nvp_value_elem; /* number of elements for array types */
|
||||
data_type_t nvp_type; /* type of value */
|
||||
/* name string */
|
||||
/* aligned ptr array for string arrays */
|
||||
/* aligned array of data for value */
|
||||
} nvpair_t;
|
||||
|
||||
/* nvlist header */
|
||||
typedef struct nvlist {
|
||||
int32_t nvl_version;
|
||||
uint32_t nvl_nvflag; /* persistent flags */
|
||||
uint64_t nvl_priv; /* ptr to private data if not packed */
|
||||
uint32_t nvl_flag;
|
||||
int32_t nvl_pad; /* currently not used, for alignment */
|
||||
} nvlist_t;
|
||||
|
||||
/* nvp implementation version */
|
||||
#define NV_VERSION 0
|
||||
|
||||
/* nvlist pack encoding */
|
||||
#define NV_ENCODE_NATIVE 0
|
||||
#define NV_ENCODE_XDR 1
|
||||
|
||||
/* nvlist persistent unique name flags, stored in nvl_nvflag */
|
||||
#define NV_UNIQUE_NAME 0x1
|
||||
#define NV_UNIQUE_NAME_TYPE 0x2
|
||||
|
||||
/* nvlist lookup pairs related flags */
|
||||
#define NV_FLAG_NOENTOK 0x1
|
||||
|
||||
/* convenience macros */
|
||||
#define NV_ALIGN(x) (((ulong_t)(x) + 7ul) & ~7ul)
|
||||
#define NV_ALIGN4(x) (((x) + 3) & ~3)
|
||||
|
||||
#define NVP_SIZE(nvp) ((nvp)->nvp_size)
|
||||
#define NVP_NAME(nvp) ((char *)(nvp) + sizeof (nvpair_t))
|
||||
#define NVP_TYPE(nvp) ((nvp)->nvp_type)
|
||||
#define NVP_NELEM(nvp) ((nvp)->nvp_value_elem)
|
||||
#define NVP_VALUE(nvp) ((char *)(nvp) + NV_ALIGN(sizeof (nvpair_t) \
|
||||
+ (nvp)->nvp_name_sz))
|
||||
|
||||
#define NVL_VERSION(nvl) ((nvl)->nvl_version)
|
||||
#define NVL_SIZE(nvl) ((nvl)->nvl_size)
|
||||
#define NVL_FLAG(nvl) ((nvl)->nvl_flag)
|
||||
|
||||
/* NV allocator framework */
|
||||
typedef struct nv_alloc_ops nv_alloc_ops_t;
|
||||
|
||||
typedef struct nv_alloc {
|
||||
const nv_alloc_ops_t *nva_ops;
|
||||
void *nva_arg;
|
||||
} nv_alloc_t;
|
||||
|
||||
struct nv_alloc_ops {
|
||||
int (*nv_ao_init)(nv_alloc_t *, __va_list);
|
||||
void (*nv_ao_fini)(nv_alloc_t *);
|
||||
void *(*nv_ao_alloc)(nv_alloc_t *, size_t);
|
||||
void (*nv_ao_free)(nv_alloc_t *, void *, size_t);
|
||||
void (*nv_ao_reset)(nv_alloc_t *);
|
||||
};
|
||||
|
||||
extern const nv_alloc_ops_t *nv_fixed_ops;
|
||||
extern nv_alloc_t *nv_alloc_nosleep;
|
||||
|
||||
#if defined(_KERNEL) && !defined(_BOOT)
|
||||
extern nv_alloc_t *nv_alloc_sleep;
|
||||
#endif
|
||||
|
||||
int nv_alloc_init(nv_alloc_t *, const nv_alloc_ops_t *, /* args */ ...);
|
||||
void nv_alloc_reset(nv_alloc_t *);
|
||||
void nv_alloc_fini(nv_alloc_t *);
|
||||
|
||||
/* list management */
|
||||
int nvlist_alloc(nvlist_t **, uint_t, int);
|
||||
void nvlist_free(nvlist_t *);
|
||||
int nvlist_size(nvlist_t *, size_t *, int);
|
||||
int nvlist_pack(nvlist_t *, char **, size_t *, int, int);
|
||||
int nvlist_unpack(char *, size_t, nvlist_t **, int);
|
||||
int nvlist_dup(nvlist_t *, nvlist_t **, int);
|
||||
int nvlist_merge(nvlist_t *, nvlist_t *, int);
|
||||
|
||||
int nvlist_xalloc(nvlist_t **, uint_t, nv_alloc_t *);
|
||||
int nvlist_xpack(nvlist_t *, char **, size_t *, int, nv_alloc_t *);
|
||||
int nvlist_xunpack(char *, size_t, nvlist_t **, nv_alloc_t *);
|
||||
int nvlist_xdup(nvlist_t *, nvlist_t **, nv_alloc_t *);
|
||||
nv_alloc_t *nvlist_lookup_nv_alloc(nvlist_t *);
|
||||
|
||||
int nvlist_add_nvpair(nvlist_t *, nvpair_t *);
|
||||
int nvlist_add_boolean(nvlist_t *, const char *);
|
||||
int nvlist_add_boolean_value(nvlist_t *, const char *, boolean_t);
|
||||
int nvlist_add_byte(nvlist_t *, const char *, uchar_t);
|
||||
int nvlist_add_int8(nvlist_t *, const char *, int8_t);
|
||||
int nvlist_add_uint8(nvlist_t *, const char *, uint8_t);
|
||||
int nvlist_add_int16(nvlist_t *, const char *, int16_t);
|
||||
int nvlist_add_uint16(nvlist_t *, const char *, uint16_t);
|
||||
int nvlist_add_int32(nvlist_t *, const char *, int32_t);
|
||||
int nvlist_add_uint32(nvlist_t *, const char *, uint32_t);
|
||||
int nvlist_add_int64(nvlist_t *, const char *, int64_t);
|
||||
int nvlist_add_uint64(nvlist_t *, const char *, uint64_t);
|
||||
int nvlist_add_string(nvlist_t *, const char *, const char *);
|
||||
int nvlist_add_nvlist(nvlist_t *, const char *, nvlist_t *);
|
||||
int nvlist_add_boolean_array(nvlist_t *, const char *, boolean_t *, uint_t);
|
||||
int nvlist_add_byte_array(nvlist_t *, const char *, uchar_t *, uint_t);
|
||||
int nvlist_add_int8_array(nvlist_t *, const char *, int8_t *, uint_t);
|
||||
int nvlist_add_uint8_array(nvlist_t *, const char *, uint8_t *, uint_t);
|
||||
int nvlist_add_int16_array(nvlist_t *, const char *, int16_t *, uint_t);
|
||||
int nvlist_add_uint16_array(nvlist_t *, const char *, uint16_t *, uint_t);
|
||||
int nvlist_add_int32_array(nvlist_t *, const char *, int32_t *, uint_t);
|
||||
int nvlist_add_uint32_array(nvlist_t *, const char *, uint32_t *, uint_t);
|
||||
int nvlist_add_int64_array(nvlist_t *, const char *, int64_t *, uint_t);
|
||||
int nvlist_add_uint64_array(nvlist_t *, const char *, uint64_t *, uint_t);
|
||||
int nvlist_add_string_array(nvlist_t *, const char *, char *const *, uint_t);
|
||||
int nvlist_add_nvlist_array(nvlist_t *, const char *, nvlist_t **, uint_t);
|
||||
int nvlist_add_hrtime(nvlist_t *, const char *, hrtime_t);
|
||||
|
||||
int nvlist_remove(nvlist_t *, const char *, data_type_t);
|
||||
int nvlist_remove_all(nvlist_t *, const char *);
|
||||
|
||||
int nvlist_lookup_boolean(nvlist_t *, const char *);
|
||||
int nvlist_lookup_boolean_value(nvlist_t *, const char *, boolean_t *);
|
||||
int nvlist_lookup_byte(nvlist_t *, const char *, uchar_t *);
|
||||
int nvlist_lookup_int8(nvlist_t *, const char *, int8_t *);
|
||||
int nvlist_lookup_uint8(nvlist_t *, const char *, uint8_t *);
|
||||
int nvlist_lookup_int16(nvlist_t *, const char *, int16_t *);
|
||||
int nvlist_lookup_uint16(nvlist_t *, const char *, uint16_t *);
|
||||
int nvlist_lookup_int32(nvlist_t *, const char *, int32_t *);
|
||||
int nvlist_lookup_uint32(nvlist_t *, const char *, uint32_t *);
|
||||
int nvlist_lookup_int64(nvlist_t *, const char *, int64_t *);
|
||||
int nvlist_lookup_uint64(nvlist_t *, const char *, uint64_t *);
|
||||
int nvlist_lookup_string(nvlist_t *, const char *, char **);
|
||||
int nvlist_lookup_nvlist(nvlist_t *, const char *, nvlist_t **);
|
||||
int nvlist_lookup_boolean_array(nvlist_t *, const char *,
|
||||
boolean_t **, uint_t *);
|
||||
int nvlist_lookup_byte_array(nvlist_t *, const char *, uchar_t **, uint_t *);
|
||||
int nvlist_lookup_int8_array(nvlist_t *, const char *, int8_t **, uint_t *);
|
||||
int nvlist_lookup_uint8_array(nvlist_t *, const char *, uint8_t **, uint_t *);
|
||||
int nvlist_lookup_int16_array(nvlist_t *, const char *, int16_t **, uint_t *);
|
||||
int nvlist_lookup_uint16_array(nvlist_t *, const char *, uint16_t **, uint_t *);
|
||||
int nvlist_lookup_int32_array(nvlist_t *, const char *, int32_t **, uint_t *);
|
||||
int nvlist_lookup_uint32_array(nvlist_t *, const char *, uint32_t **, uint_t *);
|
||||
int nvlist_lookup_int64_array(nvlist_t *, const char *, int64_t **, uint_t *);
|
||||
int nvlist_lookup_uint64_array(nvlist_t *, const char *, uint64_t **, uint_t *);
|
||||
int nvlist_lookup_string_array(nvlist_t *, const char *, char ***, uint_t *);
|
||||
int nvlist_lookup_nvlist_array(nvlist_t *, const char *,
|
||||
nvlist_t ***, uint_t *);
|
||||
int nvlist_lookup_hrtime(nvlist_t *, const char *, hrtime_t *);
|
||||
int nvlist_lookup_pairs(nvlist_t *nvl, int, ...);
|
||||
|
||||
int nvlist_lookup_nvpair(nvlist_t *nvl, const char *, nvpair_t **);
|
||||
boolean_t nvlist_exists(nvlist_t *nvl, const char *);
|
||||
|
||||
/* processing nvpair */
|
||||
nvpair_t *nvlist_next_nvpair(nvlist_t *nvl, nvpair_t *);
|
||||
char *nvpair_name(nvpair_t *);
|
||||
data_type_t nvpair_type(nvpair_t *);
|
||||
int nvpair_value_boolean_value(nvpair_t *, boolean_t *);
|
||||
int nvpair_value_byte(nvpair_t *, uchar_t *);
|
||||
int nvpair_value_int8(nvpair_t *, int8_t *);
|
||||
int nvpair_value_uint8(nvpair_t *, uint8_t *);
|
||||
int nvpair_value_int16(nvpair_t *, int16_t *);
|
||||
int nvpair_value_uint16(nvpair_t *, uint16_t *);
|
||||
int nvpair_value_int32(nvpair_t *, int32_t *);
|
||||
int nvpair_value_uint32(nvpair_t *, uint32_t *);
|
||||
int nvpair_value_int64(nvpair_t *, int64_t *);
|
||||
int nvpair_value_uint64(nvpair_t *, uint64_t *);
|
||||
int nvpair_value_string(nvpair_t *, char **);
|
||||
int nvpair_value_nvlist(nvpair_t *, nvlist_t **);
|
||||
int nvpair_value_boolean_array(nvpair_t *, boolean_t **, uint_t *);
|
||||
int nvpair_value_byte_array(nvpair_t *, uchar_t **, uint_t *);
|
||||
int nvpair_value_int8_array(nvpair_t *, int8_t **, uint_t *);
|
||||
int nvpair_value_uint8_array(nvpair_t *, uint8_t **, uint_t *);
|
||||
int nvpair_value_int16_array(nvpair_t *, int16_t **, uint_t *);
|
||||
int nvpair_value_uint16_array(nvpair_t *, uint16_t **, uint_t *);
|
||||
int nvpair_value_int32_array(nvpair_t *, int32_t **, uint_t *);
|
||||
int nvpair_value_uint32_array(nvpair_t *, uint32_t **, uint_t *);
|
||||
int nvpair_value_int64_array(nvpair_t *, int64_t **, uint_t *);
|
||||
int nvpair_value_uint64_array(nvpair_t *, uint64_t **, uint_t *);
|
||||
int nvpair_value_string_array(nvpair_t *, char ***, uint_t *);
|
||||
int nvpair_value_nvlist_array(nvpair_t *, nvlist_t ***, uint_t *);
|
||||
int nvpair_value_hrtime(nvpair_t *, hrtime_t *);
|
||||
|
||||
#ifdef __cplusplus
|
||||
}
|
||||
#endif
|
||||
|
||||
#endif /* _SYS_NVPAIR_H */
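The convenience macros in this header round sizes up to alignment boundaries, and NVP_VALUE() uses that rounding to locate a pair's value after its name. The sketch below is illustrative only: it copies NV_ALIGN and NV_ALIGN4 from the header, but demo_nvpair_t, demo_value_offset(), and the ulong_t typedef are assumptions added so that the fragment builds on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef unsigned long ulong_t;	/* assumption: Solaris-style typedef */

/* Copied from the header: round up to a multiple of 8 or of 4. */
#define	NV_ALIGN(x)	(((ulong_t)(x) + 7ul) & ~7ul)
#define	NV_ALIGN4(x)	(((x) + 3) & ~3)

/* Hypothetical mirror of the fixed-size nvpair_t header fields. */
typedef struct demo_nvpair {
	int32_t nvp_size;	/* size of this nvpair */
	int16_t nvp_name_sz;	/* length of name string */
	int16_t nvp_reserve;	/* not used */
	int32_t nvp_value_elem;	/* number of elements for array types */
	int32_t nvp_type;	/* stands in for data_type_t */
} demo_nvpair_t;

/*
 * Byte offset of the value from the start of the pair, computed the
 * same way NVP_VALUE() does: header size plus name size, rounded up
 * to the next 8-byte boundary.
 */
static size_t
demo_value_offset(int16_t name_sz)
{
	assert(name_sz >= 0);
	return ((size_t)NV_ALIGN(sizeof (demo_nvpair_t) + (size_t)name_sz));
}
```

With a 16-byte header, a 5-byte name (for example "name" plus its terminating NUL) puts the value at offset 24, so the value always starts on an 8-byte boundary regardless of the name length.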
|
|
@ -0,0 +1,73 @@
|
|||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License, Version 1.0 only
|
||||
* (the "License"). You may not use this file except in compliance
|
||||
* with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright 2004 Sun Microsystems, Inc. All rights reserved.
|
||||
* Use is subject to license terms.
|
||||
*/
|
||||
|
||||
#ifndef _NVPAIR_IMPL_H
|
||||
#define _NVPAIR_IMPL_H
|
||||
|
||||
|
||||
|
||||
#ifdef __cplusplus
|
||||
extern "C" {
|
||||
#endif
|
||||
|
||||
#include <sys/nvpair.h>
|
||||
|
||||
/*
|
||||
* The structures here are provided for information and debugging purposes
|
||||
* only and may be changed in the future.
|
||||
*/
|
||||
|
||||
/*
|
||||
* implementation linked list for pre-packed data
|
||||
*/
|
||||
typedef struct i_nvp i_nvp_t;
|
||||
|
||||
struct i_nvp {
|
||||
union {
|
||||
uint64_t _nvi_align; /* ensure alignment */
|
||||
struct {
|
||||
i_nvp_t *_nvi_next; /* pointer to next nvpair */
|
||||
i_nvp_t *_nvi_prev; /* pointer to prev nvpair */
|
||||
} _nvi;
|
||||
} _nvi_un;
|
||||
nvpair_t nvi_nvp; /* nvpair */
|
||||
};
|
||||
#define nvi_next _nvi_un._nvi._nvi_next
|
||||
#define nvi_prev _nvi_un._nvi._nvi_prev
|
||||
|
||||
typedef struct {
|
||||
i_nvp_t *nvp_list; /* linked list of nvpairs */
|
||||
i_nvp_t *nvp_last; /* last nvpair */
|
||||
i_nvp_t *nvp_curr; /* current walker nvpair */
|
||||
nv_alloc_t *nvp_nva; /* pluggable allocator */
|
||||
uint32_t nvp_stat; /* internal state */
|
||||
} nvpriv_t;
|
||||
|
||||
#ifdef __cplusplus
|
||||
}
|
||||
#endif
|
||||
|
||||
#endif /* _NVPAIR_IMPL_H */
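The i_nvp layout places list linkage in front of the embedded pair, and the union keeps that payload at an 8-byte-aligned offset. A short sketch of how code holding only a pointer to the pair could get back to its list node follows. It assumes a container-of style conversion; the suppressed nvpair.c is not shown here, so this may differ from what it does. All demo_ names are hypothetical, and the pair type is reduced to two fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for nvpair_t. */
typedef struct demo_nvpair {
	int32_t nvp_size;
	int32_t nvp_type;
} demo_nvpair_t;

typedef struct demo_i_nvp demo_i_nvp_t;

/* Same shape as struct i_nvp: aligned linkage, then the pair. */
struct demo_i_nvp {
	union {
		uint64_t _nvi_align;	/* ensure alignment */
		struct {
			demo_i_nvp_t *_nvi_next;	/* next node */
			demo_i_nvp_t *_nvi_prev;	/* previous node */
		} _nvi;
	} _nvi_un;
	demo_nvpair_t nvi_nvp;	/* payload handed out to callers */
};

/* Recover the enclosing list node from a pointer to its pair. */
static demo_i_nvp_t *
demo_pair_to_node(demo_nvpair_t *nvp)
{
	assert(nvp != NULL);
	return ((demo_i_nvp_t *)(void *)
	    ((char *)nvp - offsetof(demo_i_nvp_t, nvi_nvp)));
}
```

The uint64_t member makes the union at least 8 bytes wide even where pointers are 4 bytes, so nvi_nvp lands at an offset that is a multiple of 8 on both 32-bit and 64-bit builds.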
|
|
@ -0,0 +1,266 @@
|
|||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License, Version 1.0 only
|
||||
* (the "License"). You may not use this file except in compliance
|
||||
* with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright 2004 Sun Microsystems, Inc. All rights reserved.
|
||||
* Use is subject to license terms.
|
||||
*/
|
||||
|
||||
|
||||
|
||||
#include <unistd.h>
|
||||
#include <strings.h>
|
||||
#include "libnvpair.h"
|
||||
|
||||
/*
|
||||
* libnvpair - A tools library for manipulating <name, value> pairs.
|
||||
*
|
||||
* This library provides routines for packing and unpacking nv pairs
|
||||
* for transporting data across process boundaries, transporting
|
||||
* between kernel and userland, and possibly saving onto disk files.
|
||||
*/
|
||||
|
||||
static void
|
||||
indent(FILE *fp, int depth)
|
||||
{
|
||||
while (depth-- > 0)
|
||||
(void) fprintf(fp, "\t");
|
||||
}
|
||||
|
||||
/*
|
||||
* nvlist_print - Prints elements in an event buffer
|
||||
*/
|
||||
static
|
||||
void
|
||||
nvlist_print_with_indent(FILE *fp, nvlist_t *nvl, int depth)
|
||||
{
|
||||
int i;
|
||||
char *name;
|
||||
uint_t nelem;
|
||||
nvpair_t *nvp;
|
||||
|
||||
if (nvl == NULL)
|
||||
return;
|
||||
|
||||
indent(fp, depth);
|
||||
(void) fprintf(fp, "nvlist version: %d\n", NVL_VERSION(nvl));
|
||||
|
||||
nvp = nvlist_next_nvpair(nvl, NULL);
|
||||
|
||||
while (nvp) {
|
||||
data_type_t type = nvpair_type(nvp);
|
||||
|
||||
indent(fp, depth);
|
||||
name = nvpair_name(nvp);
|
||||
(void) fprintf(fp, "\t%s =", name);
|
||||
nelem = 0;
|
||||
switch (type) {
|
||||
case DATA_TYPE_BOOLEAN: {
|
||||
(void) fprintf(fp, " 1");
|
||||
break;
|
||||
}
|
||||
case DATA_TYPE_BOOLEAN_VALUE: {
|
||||
boolean_t val;
|
||||
(void) nvpair_value_boolean_value(nvp, &val);
|
||||
(void) fprintf(fp, " %d", val);
|
||||
break;
|
||||
}
|
||||
case DATA_TYPE_BYTE: {
|
||||
uchar_t val;
|
||||
(void) nvpair_value_byte(nvp, &val);
|
||||
(void) fprintf(fp, " 0x%2.2x", val);
|
||||
break;
|
||||
}
|
||||
case DATA_TYPE_INT8: {
|
||||
int8_t val;
|
||||
(void) nvpair_value_int8(nvp, &val);
|
||||
(void) fprintf(fp, " %d", val);
|
||||
break;
|
||||
}
|
||||
case DATA_TYPE_UINT8: {
|
||||
uint8_t val;
|
||||
(void) nvpair_value_uint8(nvp, &val);
|
||||
(void) fprintf(fp, " 0x%x", val);
|
||||
break;
|
||||
}
|
||||
case DATA_TYPE_INT16: {
|
||||
int16_t val;
|
||||
(void) nvpair_value_int16(nvp, &val);
|
||||
(void) fprintf(fp, " %d", val);
|
||||
break;
|
||||
}
|
||||
case DATA_TYPE_UINT16: {
|
||||
uint16_t val;
|
||||
(void) nvpair_value_uint16(nvp, &val);
|
||||
(void) fprintf(fp, " 0x%x", val);
|
||||
break;
|
||||
}
|
||||
case DATA_TYPE_INT32: {
|
||||
int32_t val;
|
||||
(void) nvpair_value_int32(nvp, &val);
|
||||
(void) fprintf(fp, " %d", val);
|
||||
break;
|
||||
}
|
||||
case DATA_TYPE_UINT32: {
|
||||
uint32_t val;
|
||||
(void) nvpair_value_uint32(nvp, &val);
|
||||
(void) fprintf(fp, " 0x%x", val);
|
||||
break;
|
||||
}
|
||||
case DATA_TYPE_INT64: {
|
||||
int64_t val;
|
||||
(void) nvpair_value_int64(nvp, &val);
|
||||
(void) fprintf(fp, " %lld", (longlong_t)val);
|
||||
break;
|
||||
}
|
||||
case DATA_TYPE_UINT64: {
|
||||
uint64_t val;
|
||||
(void) nvpair_value_uint64(nvp, &val);
|
||||
(void) fprintf(fp, " 0x%llx", (u_longlong_t)val);
|
||||
break;
|
||||
}
|
||||
case DATA_TYPE_STRING: {
|
||||
char *val;
|
||||
(void) nvpair_value_string(nvp, &val);
|
||||
(void) fprintf(fp, " %s", val);
|
||||
break;
|
||||
}
|
||||
case DATA_TYPE_BOOLEAN_ARRAY: {
|
||||
boolean_t *val;
|
||||
(void) nvpair_value_boolean_array(nvp, &val, &nelem);
|
||||
for (i = 0; i < nelem; i++)
|
||||
(void) fprintf(fp, " %d", val[i]);
|
||||
break;
|
||||
}
|
||||
case DATA_TYPE_BYTE_ARRAY: {
|
||||
uchar_t *val;
|
||||
(void) nvpair_value_byte_array(nvp, &val, &nelem);
|
||||
for (i = 0; i < nelem; i++)
|
||||
(void) fprintf(fp, " 0x%2.2x", val[i]);
|
||||
break;
|
||||
}
|
||||
case DATA_TYPE_INT8_ARRAY: {
|
||||
int8_t *val;
|
||||
(void) nvpair_value_int8_array(nvp, &val, &nelem);
|
||||
for (i = 0; i < nelem; i++)
|
||||
(void) fprintf(fp, " %d", val[i]);
|
||||
break;
|
||||
}
|
||||
case DATA_TYPE_UINT8_ARRAY: {
|
||||
uint8_t *val;
|
||||
(void) nvpair_value_uint8_array(nvp, &val, &nelem);
|
||||
for (i = 0; i < nelem; i++)
|
||||
(void) fprintf(fp, " 0x%x", val[i]);
|
||||
break;
|
||||
}
|
||||
case DATA_TYPE_INT16_ARRAY: {
|
||||
int16_t *val;
|
||||
(void) nvpair_value_int16_array(nvp, &val, &nelem);
|
||||
for (i = 0; i < nelem; i++)
|
||||
(void) fprintf(fp, " %d", val[i]);
|
||||
break;
|
||||
}
|
||||
case DATA_TYPE_UINT16_ARRAY: {
|
||||
uint16_t *val;
|
||||
(void) nvpair_value_uint16_array(nvp, &val, &nelem);
|
||||
for (i = 0; i < nelem; i++)
|
||||
(void) fprintf(fp, " 0x%x", val[i]);
|
||||
break;
|
||||
}
|
||||
case DATA_TYPE_INT32_ARRAY: {
|
||||
int32_t *val;
|
||||
(void) nvpair_value_int32_array(nvp, &val, &nelem);
|
||||
for (i = 0; i < nelem; i++)
|
||||
(void) fprintf(fp, " %d", val[i]);
|
||||
break;
|
||||
}
|
||||
case DATA_TYPE_UINT32_ARRAY: {
|
||||
uint32_t *val;
|
||||
(void) nvpair_value_uint32_array(nvp, &val, &nelem);
|
||||
for (i = 0; i < nelem; i++)
|
||||
(void) fprintf(fp, " 0x%x", val[i]);
|
||||
break;
|
||||
}
|
||||
case DATA_TYPE_INT64_ARRAY: {
|
||||
int64_t *val;
|
||||
(void) nvpair_value_int64_array(nvp, &val, &nelem);
|
||||
for (i = 0; i < nelem; i++)
|
||||
(void) fprintf(fp, " %lld", (longlong_t)val[i]);
|
||||
break;
|
||||
}
|
||||
case DATA_TYPE_UINT64_ARRAY: {
|
||||
uint64_t *val;
|
||||
(void) nvpair_value_uint64_array(nvp, &val, &nelem);
|
||||
for (i = 0; i < nelem; i++)
|
||||
(void) fprintf(fp, " 0x%llx",
|
||||
(u_longlong_t)val[i]);
|
||||
break;
|
||||
}
|
||||
case DATA_TYPE_STRING_ARRAY: {
|
||||
char **val;
|
||||
(void) nvpair_value_string_array(nvp, &val, &nelem);
|
||||
for (i = 0; i < nelem; i++)
|
||||
(void) fprintf(fp, " %s", val[i]);
|
||||
break;
|
||||
}
|
||||
case DATA_TYPE_HRTIME: {
|
||||
hrtime_t val;
|
||||
(void) nvpair_value_hrtime(nvp, &val);
|
||||
(void) fprintf(fp, " 0x%llx", (u_longlong_t)val);
|
||||
break;
|
||||
}
|
||||
case DATA_TYPE_NVLIST: {
|
||||
nvlist_t *val;
|
||||
(void) nvpair_value_nvlist(nvp, &val);
|
||||
(void) fprintf(fp, " (embedded nvlist)\n");
|
||||
nvlist_print_with_indent(fp, val, depth + 1);
|
||||
indent(fp, depth + 1);
|
||||
(void) fprintf(fp, "(end %s)\n", name);
|
||||
break;
|
||||
}
|
||||
case DATA_TYPE_NVLIST_ARRAY: {
|
||||
nvlist_t **val;
|
||||
(void) nvpair_value_nvlist_array(nvp, &val, &nelem);
|
||||
(void) fprintf(fp, " (array of embedded nvlists)\n");
|
||||
for (i = 0; i < nelem; i++) {
|
||||
indent(fp, depth + 1);
|
||||
(void) fprintf(fp,
|
||||
"(start %s[%d])\n", name, i);
|
||||
nvlist_print_with_indent(fp, val[i], depth + 1);
|
||||
indent(fp, depth + 1);
|
||||
(void) fprintf(fp, "(end %s[%d])\n", name, i);
|
||||
}
|
||||
break;
|
||||
}
|
||||
default:
|
||||
(void) fprintf(fp, " unknown data type (%d)", type);
|
||||
break;
|
||||
}
|
||||
(void) fprintf(fp, "\n");
|
||||
nvp = nvlist_next_nvpair(nvl, nvp);
|
||||
}
|
||||
}
|
||||
|
||||
void
|
||||
nvlist_print(FILE *fp, nvlist_t *nvl)
|
||||
{
|
||||
nvlist_print_with_indent(fp, nvl, 0);
|
||||
}
|
File diff suppressed because it is too large
|
@ -0,0 +1,120 @@
|
|||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
|
||||
/*
|
||||
* Copyright 2006 Sun Microsystems, Inc. All rights reserved.
|
||||
* Use is subject to license terms.
|
||||
*/
|
||||
|
||||
|
||||
|
||||
#include <sys/stropts.h>
|
||||
#include <sys/isa_defs.h>
|
||||
#include <sys/nvpair.h>
|
||||
#include <sys/sysmacros.h>
|
||||
#if defined(_KERNEL) && !defined(_BOOT)
|
||||
#include <sys/varargs.h>
|
||||
#else
|
||||
#include <stdarg.h>
|
||||
#include <strings.h>
|
||||
#endif
|
||||
|
||||
/*
|
||||
* This allocator is very simple.
|
||||
* - it uses a pre-allocated buffer for memory allocations.
|
||||
* - it does _not_ free memory in the pre-allocated buffer.
|
||||
*
|
||||
* The reason for the selected implementation is simplicity.
|
||||
* This allocator is designed for use in interrupt context, when
|
||||
* the caller may not wait for free memory.
|
||||
*/
|
||||
|
||||
/* pre-allocated buffer for memory allocations */
|
||||
typedef struct nvbuf {
|
||||
uintptr_t nvb_buf; /* address of pre-allocated buffer */
|
||||
uintptr_t nvb_lim; /* limit address in the buffer */
|
||||
uintptr_t nvb_cur; /* current address in the buffer */
|
||||
} nvbuf_t;
|
||||
|
||||
/*
|
||||
* Initialize the pre-allocated buffer allocator. The caller needs to supply
|
||||
*
|
||||
* buf address of pre-allocated buffer
|
||||
* bufsz size of pre-allocated buffer
|
||||
*
|
||||
* nv_fixed_init() calculates the remaining members of nvbuf_t.
|
||||
*/
|
||||
static int
|
||||
nv_fixed_init(nv_alloc_t *nva, va_list valist)
|
||||
{
|
||||
uintptr_t base = va_arg(valist, uintptr_t);
|
||||
uintptr_t lim = base + va_arg(valist, size_t);
|
||||
nvbuf_t *nvb = (nvbuf_t *)P2ROUNDUP(base, sizeof (uintptr_t));
|
||||
|
||||
if (base == 0 || (uintptr_t)&nvb[1] > lim)
|
||||
return (EINVAL);
|
||||
|
||||
nvb->nvb_buf = (uintptr_t)&nvb[0];
|
||||
nvb->nvb_cur = (uintptr_t)&nvb[1];
|
||||
nvb->nvb_lim = lim;
|
||||
nva->nva_arg = nvb;
|
||||
|
||||
return (0);
|
||||
}
|
||||
|
||||
static void *
|
||||
nv_fixed_alloc(nv_alloc_t *nva, size_t size)
|
||||
{
|
||||
nvbuf_t *nvb = nva->nva_arg;
|
||||
uintptr_t new = nvb->nvb_cur;
|
||||
|
||||
if (size == 0 || new + size > nvb->nvb_lim)
|
||||
return (NULL);
|
||||
|
||||
nvb->nvb_cur = P2ROUNDUP(new + size, sizeof (uintptr_t));
|
||||
|
||||
return ((void *)new);
|
||||
}
|
||||
|
||||
/*ARGSUSED*/
|
||||
static void
|
||||
nv_fixed_free(nv_alloc_t *nva, void *buf, size_t size)
|
||||
{
|
||||
/* don't free memory in the pre-allocated buffer */
|
||||
}
|
||||
|
||||
static void
|
||||
nv_fixed_reset(nv_alloc_t *nva)
|
||||
{
|
||||
nvbuf_t *nvb = nva->nva_arg;
|
||||
|
||||
nvb->nvb_cur = (uintptr_t)&nvb[1];
|
||||
}
|
||||
|
||||
const nv_alloc_ops_t nv_fixed_ops_def = {
|
||||
nv_fixed_init, /* nv_ao_init() */
|
||||
NULL, /* nv_ao_fini() */
|
||||
nv_fixed_alloc, /* nv_ao_alloc() */
|
||||
nv_fixed_free, /* nv_ao_free() */
|
||||
nv_fixed_reset /* nv_ao_reset() */
|
||||
};
|
||||
|
||||
const nv_alloc_ops_t *nv_fixed_ops = &nv_fixed_ops_def;
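The fixed-buffer allocator above is a bump allocator: it hands out pointer-aligned slices of a caller-supplied buffer, never frees individual slices, and can only be rewound as a whole. The following is a minimal standalone sketch of that idea, not the nvpair code itself. The names `bump_t`, `bump_init`, `bump_alloc`, `bump_reset` and `ROUNDUP` are hypothetical, and unlike `nv_fixed_init()` the sketch keeps its bookkeeping struct outside the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for nvbuf_t: base, limit and cursor addresses. */
typedef struct bump {
	uintptr_t b_base;	/* first usable (aligned) address */
	uintptr_t b_lim;	/* one past the end of the buffer */
	uintptr_t b_cur;	/* next address to hand out */
} bump_t;

/* Round x up to a multiple of align; align must be a power of two. */
#define	ROUNDUP(x, align) \
	(((uintptr_t)(x) + ((align) - 1)) & ~((uintptr_t)(align) - 1))

static int
bump_init(bump_t *b, void *buf, size_t bufsz)
{
	if (buf == NULL || bufsz == 0)
		return (-1);

	b->b_base = ROUNDUP(buf, sizeof (uintptr_t));
	b->b_lim = (uintptr_t)buf + bufsz;
	if (b->b_base > b->b_lim)
		return (-1);
	b->b_cur = b->b_base;
	return (0);
}

static void *
bump_alloc(bump_t *b, size_t size)
{
	uintptr_t p = b->b_cur;

	/* Reject empty requests and anything that would pass the limit. */
	if (size == 0 || p > b->b_lim || size > b->b_lim - p)
		return (NULL);

	/* Keep the cursor pointer-aligned for the next caller. */
	b->b_cur = ROUNDUP(p + size, sizeof (uintptr_t));
	return ((void *)p);
}

/* Individual frees are no-ops in this scheme; only a full rewind exists. */
static void
bump_reset(bump_t *b)
{
	b->b_cur = b->b_base;
}
```

A caller would typically place a `uintptr_t`-aligned array on the stack or in static storage, pass it to `bump_init()`, and call `bump_reset()` once every allocation from that buffer is dead. Because no path blocks or calls into a general-purpose allocator, the pattern suits contexts where waiting for memory is not allowed.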
|
|
@ -0,0 +1,59 @@
|
|||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License, Version 1.0 only
|
||||
* (the "License"). You may not use this file except in compliance
|
||||
* with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright 2004 Sun Microsystems, Inc. All rights reserved.
|
||||
* Use is subject to license terms.
|
||||
*/
|
||||
|
||||
|
||||
|
||||
#include <sys/nvpair.h>
|
||||
#include <stdlib.h>
|
||||
|
||||
/*ARGSUSED*/
|
||||
static void *
|
||||
nv_alloc_sys(nv_alloc_t *nva, size_t size)
|
||||
{
|
||||
return (malloc(size));
|
||||
}
|
||||
|
||||
/*ARGSUSED*/
|
||||
static void
|
||||
nv_free_sys(nv_alloc_t *nva, void *buf, size_t size)
|
||||
{
|
||||
free(buf);
|
||||
}
|
||||
|
||||
const nv_alloc_ops_t system_ops_def = {
|
||||
NULL, /* nv_ao_init() */
|
||||
NULL, /* nv_ao_fini() */
|
||||
nv_alloc_sys, /* nv_ao_alloc() */
|
||||
nv_free_sys, /* nv_ao_free() */
|
||||
NULL /* nv_ao_reset() */
|
||||
};
|
||||
|
||||
nv_alloc_t nv_alloc_nosleep_def = {
|
||||
&system_ops_def,
|
||||
NULL
|
||||
};
|
||||
|
||||
nv_alloc_t *nv_alloc_nosleep = &nv_alloc_nosleep_def;
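The `system_ops_def` table above leaves the init, fini and reset slots `NULL` and fills in only the alloc and free hooks. A simplified sketch of that ops-table pattern follows. The types `alloc_t` and `alloc_ops_t` and the helpers `alloc_init()` and `alloc_reset()` are hypothetical, and treating a `NULL` hook as "nothing to do" is an assumption of this sketch, since the real dispatch code is not part of this hunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct alloc alloc_t;

/* Hypothetical ops table, loosely modelled on nv_alloc_ops_t. */
typedef struct alloc_ops {
	int (*ao_init)(alloc_t *);			/* optional */
	void *(*ao_alloc)(alloc_t *, size_t);
	void (*ao_free)(alloc_t *, void *, size_t);
	void (*ao_reset)(alloc_t *);			/* optional */
} alloc_ops_t;

struct alloc {
	const alloc_ops_t *a_ops;	/* behaviour */
	void *a_arg;			/* per-allocator state, may be NULL */
};

static void *
sys_alloc(alloc_t *a, size_t size)
{
	(void) a;
	return (malloc(size));
}

static void
sys_free(alloc_t *a, void *buf, size_t size)
{
	(void) a;
	(void) size;
	free(buf);
}

/* malloc/free backend: no state, so no init or reset hook is needed. */
static const alloc_ops_t sys_ops = { NULL, sys_alloc, sys_free, NULL };

/* Bind an ops table to a handle; a NULL init hook counts as success. */
static int
alloc_init(alloc_t *a, const alloc_ops_t *ops, void *arg)
{
	a->a_ops = ops;
	a->a_arg = arg;
	return (ops->ao_init != NULL ? ops->ao_init(a) : 0);
}

/* Rewind the allocator if the backend supports it. */
static void
alloc_reset(alloc_t *a)
{
	if (a->a_ops->ao_reset != NULL)
		a->a_ops->ao_reset(a);
}
```

Callers only ever go through the table, so swapping the `malloc`-backed ops for a fixed-buffer backend means changing which table is bound to the handle and nothing else.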
|
|
@ -0,0 +1,37 @@
|
|||
subdir-m += include
|
||||
DISTFILES = port.c strlcat.c strlcpy.c strnlen.c u8_textprep.c
|
||||
|
||||
MODULE := zport
|
||||
LIBRARY := libzport
|
||||
|
||||
# Compile as kernel module. Needed symlinks created for all
|
||||
# k* objects created by top level configure script.
|
||||
|
||||
EXTRA_CFLAGS = @KERNELCPPFLAGS@
|
||||
EXTRA_CFLAGS += -I@LIBDIR@/libzcommon/include
|
||||
EXTRA_CFLAGS += -I@LIBDIR@/libport/include
|
||||
|
||||
obj-m := ${MODULE}.o
|
||||
|
||||
${MODULE}-objs += spl.o
|
||||
${MODULE}-objs += ku8_textprep.o
|
||||
|
||||
# Compile as shared library. There's an extra useless host program
|
||||
# here called 'zu' because it was the easiest way I could convince
|
||||
# the kernel build system to construct a user space shared library.
|
||||
|
||||
HOSTCFLAGS += @HOSTCFLAGS@
|
||||
HOSTCFLAGS += -I@LIBDIR@/libzcommon/include
|
||||
HOSTCFLAGS += -I@LIBDIR@/libport/include
|
||||
|
||||
hostprogs-y := zu
|
||||
always := $(hostprogs-y)
|
||||
|
||||
zu-objs := zu.o ${LIBRARY}.so
|
||||
|
||||
${LIBRARY}-objs += strlcpy.o
|
||||
${LIBRARY}-objs += strlcat.o
|
||||
${LIBRARY}-objs += strnlen.o
|
||||
${LIBRARY}-objs += port.o
|
||||
${LIBRARY}-objs += u8_textprep.o
|
||||
|
|
@ -0,0 +1,4 @@
|
|||
subdir-m += sys
|
||||
|
||||
DISTFILES = fake_ioctl.h libdiskmgt.h libshare.h mntent.h stdlib.h
|
||||
DISTFILES += string.h strings.h stropts.h unistd.h
|
|
@ -0,0 +1,41 @@
|
|||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright 2008 Sun Microsystems, Inc. All rights reserved.
|
||||
* Use is subject to license terms.
|
||||
*/
|
||||
|
||||
#ifndef _PORT_FAKE_IOCTL_H
|
||||
#define _PORT_FAKE_IOCTL_H
|
||||
|
||||
static inline int real_ioctl(int fd, int request, void *arg)
|
||||
{
|
||||
return ioctl(fd, request, arg);
|
||||
}
|
||||
|
||||
#ifdef WANT_FAKE_IOCTL
|
||||
|
||||
#include <sys/dmu_ctl.h>
|
||||
#define ioctl(fd,req,arg) dctlc_ioctl(fd,req,arg)
|
||||
|
||||
#endif
|
||||
|
||||
#endif /* _PORT_FAKE_IOCTL_H */
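The order of declarations in this header matters: `real_ioctl()` is defined before the `ioctl` macro, so the call inside its body still resolves to the real function, while every later textual `ioctl(...)` is rewritten to `dctlc_ioctl(...)`. The sketch below shows the same preprocessor trick in isolation. The names `do_io`, `real_do_io` and `fake_do_io` are hypothetical stand-ins, not part of the ZFS sources.

```c
#include <assert.h>

/* Hypothetical "real" entry point, standing in for ioctl(). */
static int
do_io(int fd, int request)
{
	(void) fd;
	return (request);
}

/* Defined before the macro below, so this call still reaches do_io(). */
static inline int
real_do_io(int fd, int request)
{
	return (do_io(fd, request));
}

/* Hypothetical replacement, standing in for dctlc_ioctl(). */
static int
fake_do_io(int fd, int request)
{
	(void) fd;
	return (-request);
}

/* From this point on, every textual do_io(...) call is rerouted. */
#define	do_io(fd, req)	fake_do_io(fd, req)
```

After the `#define`, code that writes `do_io(fd, req)` reaches the replacement, and code that needs the original behaviour calls `real_do_io()` explicitly. Because the rewrite is purely textual, it affects only translation units that include the header after defining the opt-in symbol (`WANT_FAKE_IOCTL` in the header above).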
|
|
@ -0,0 +1,278 @@
|
|||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
/*
|
||||
* Copyright 2007 Sun Microsystems, Inc. All rights reserved.
|
||||
* Use is subject to license terms.
|
||||
*/
|
||||
|
||||
#include "zfs_config.h"
|
||||
|
||||
#ifdef HAVE_LIBDISKMGT_H
|
||||
#include_next <libdiskmgt.h>
|
||||
#else
|
||||
|
||||
#ifndef _LIBDISKMGT_H
|
||||
#define _LIBDISKMGT_H
|
||||
|
||||
|
||||
|
||||
#ifdef __cplusplus
|
||||
extern "C" {
|
||||
#endif
|
||||
|
||||
#include <libnvpair.h>
|
||||
#include <sys/swap.h>
|
||||
|
||||
|
||||
/*
|
||||
* Holds all the data regarding the device.
|
||||
* Private to libdiskmgt. Must use dm_xxx functions to set/get data.
|
||||
*/
|
||||
typedef uint64_t dm_descriptor_t;
|
||||
|
||||
typedef enum {
|
||||
DM_WHO_MKFS = 0,
|
||||
DM_WHO_ZPOOL,
|
||||
DM_WHO_ZPOOL_FORCE,
|
||||
DM_WHO_FORMAT,
|
||||
DM_WHO_SWAP,
|
||||
DM_WHO_DUMP,
|
||||
DM_WHO_ZPOOL_SPARE
|
||||
} dm_who_type_t;
|
||||
|
||||
typedef enum {
|
||||
DM_DRIVE = 0,
|
||||
DM_CONTROLLER,
|
||||
DM_MEDIA,
|
||||
DM_SLICE,
|
||||
DM_PARTITION,
|
||||
DM_PATH,
|
||||
DM_ALIAS,
|
||||
DM_BUS
|
||||
} dm_desc_type_t;
|
||||
|
||||
|
||||
typedef enum {
|
||||
DM_DT_UNKNOWN = 0,
|
||||
DM_DT_FIXED,
|
||||
DM_DT_ZIP,
|
||||
DM_DT_JAZ,
|
||||
DM_DT_FLOPPY,
|
||||
DM_DT_MO_ERASABLE,
|
||||
DM_DT_MO_WRITEONCE,
|
||||
DM_DT_AS_MO,
|
||||
DM_DT_CDROM,
|
||||
DM_DT_CDR,
|
||||
DM_DT_CDRW,
|
||||
DM_DT_DVDROM,
|
||||
DM_DT_DVDR,
|
||||
DM_DT_DVDRAM,
|
||||
DM_DT_DVDRW,
|
||||
DM_DT_DDCDROM,
|
||||
DM_DT_DDCDR,
|
||||
DM_DT_DDCDRW
|
||||
} dm_drive_type_t;
|
||||
|
||||
typedef enum {
|
||||
DM_MT_UNKNOWN = 0,
|
||||
DM_MT_FIXED,
|
||||
DM_MT_FLOPPY,
|
||||
DM_MT_CDROM,
|
||||
DM_MT_ZIP,
|
||||
DM_MT_JAZ,
|
||||
DM_MT_CDR,
|
||||
DM_MT_CDRW,
|
||||
DM_MT_DVDROM,
|
||||
DM_MT_DVDR,
|
||||
DM_MT_DVDRAM,
|
||||
DM_MT_MO_ERASABLE,
|
||||
DM_MT_MO_WRITEONCE,
|
||||
DM_MT_AS_MO
|
||||
} dm_media_type_t;
|
||||
|
||||
#define DM_FILTER_END -1
|
||||
|
||||
/* drive stat name */
|
||||
typedef enum {
|
||||
DM_DRV_STAT_PERFORMANCE = 0,
|
||||
DM_DRV_STAT_DIAGNOSTIC,
|
||||
DM_DRV_STAT_TEMPERATURE
|
||||
} dm_drive_stat_t;
|
||||
|
||||
/* slice stat name */
|
||||
typedef enum {
|
||||
DM_SLICE_STAT_USE = 0
|
||||
} dm_slice_stat_t;
|
||||
|
||||
/* attribute definitions */
|
||||
|
||||
/* drive */
|
||||
#define DM_DISK_UP 1
|
||||
#define DM_DISK_DOWN 0
|
||||
|
||||
#define DM_CLUSTERED "clustered"
|
||||
#define DM_DRVTYPE "drvtype"
|
||||
#define DM_FAILING "failing"
|
||||
#define DM_LOADED "loaded" /* also in media */
|
||||
#define DM_NDNRERRS "ndevice_not_ready_errors"
|
||||
#define DM_NBYTESREAD "nbytes_read"
|
||||
#define DM_NBYTESWRITTEN "nbytes_written"
|
||||
#define DM_NHARDERRS "nhard_errors"
|
||||
#define DM_NILLREQERRS "nillegal_req_errors"
|
||||
#define DM_NMEDIAERRS "nmedia_errors"
|
||||
#define DM_NNODEVERRS "nno_dev_errors"
|
||||
#define DM_NREADOPS "nread_ops"
|
||||
#define DM_NRECOVERRS "nrecoverable_errors"
|
||||
#define DM_NSOFTERRS "nsoft_errors"
|
||||
#define DM_NTRANSERRS "ntransport_errors"
|
||||
#define DM_NWRITEOPS "nwrite_ops"
|
||||
#define DM_OPATH "opath"
|
||||
#define DM_PRODUCT_ID "product_id"
|
||||
#define DM_REMOVABLE "removable" /* also in media */
|
||||
#define DM_RPM "rpm"
|
||||
#define DM_STATUS "status"
|
||||
#define DM_SYNC_SPEED "sync_speed"
|
||||
#define DM_TEMPERATURE "temperature"
|
||||
#define DM_VENDOR_ID "vendor_id"
|
||||
#define DM_WIDE "wide" /* also on controller */
|
||||
#define DM_WWN "wwn"
|
||||
|
||||
/* bus */
|
||||
#define DM_BTYPE "btype"
|
||||
#define DM_CLOCK "clock" /* also on controller */
|
||||
#define DM_PNAME "pname"
|
||||
|
||||
/* controller */
|
||||
#define DM_FAST "fast"
|
||||
#define DM_FAST20 "fast20"
|
||||
#define DM_FAST40 "fast40"
|
||||
#define DM_FAST80 "fast80"
|
||||
#define DM_MULTIPLEX "multiplex"
|
||||
#define DM_PATH_STATE "path_state"
|
||||
|
||||
#define DM_CTYPE_ATA "ata"
|
||||
#define DM_CTYPE_SCSI "scsi"
|
||||
#define DM_CTYPE_FIBRE "fibre channel"
|
||||
#define DM_CTYPE_USB "usb"
|
||||
#define DM_CTYPE_UNKNOWN "unknown"
|
||||
|
||||
/* media */
|
||||
#define DM_BLOCKSIZE "blocksize"
|
||||
#define DM_FDISK "fdisk"
|
||||
#define DM_MTYPE "mtype"
|
||||
#define DM_NACTUALCYLINDERS "nactual_cylinders"
|
||||
#define DM_NALTCYLINDERS "nalt_cylinders"
|
||||
#define DM_NCYLINDERS "ncylinders"
|
||||
#define DM_NHEADS "nheads"
|
||||
#define DM_NPHYSCYLINDERS "nphys_cylinders"
|
||||
#define DM_NSECTORS "nsectors" /* also in partition */
|
||||
#define DM_SIZE "size" /* also in slice */
|
||||
#define DM_NACCESSIBLE "naccessible"
|
||||
#define DM_LABEL "label"
|
||||
|
||||
/* partition */
|
||||
#define DM_BCYL "bcyl"
|
||||
#define DM_BHEAD "bhead"
|
||||
#define DM_BOOTID "bootid"
|
||||
#define DM_BSECT "bsect"
|
||||
#define DM_ECYL "ecyl"
|
||||
#define DM_EHEAD "ehead"
|
||||
#define DM_ESECT "esect"
|
||||
#define DM_PTYPE "ptype"
|
||||
#define DM_RELSECT "relsect"
|
||||
|
||||
/* slice */
|
||||
#define DM_DEVICEID "deviceid"
|
||||
#define DM_DEVT "devt"
|
||||
#define DM_INDEX "index"
|
||||
#define DM_EFI_NAME "name"
|
||||
#define DM_MOUNTPOINT "mountpoint"
|
||||
#define DM_LOCALNAME "localname"
|
||||
#define DM_START "start"
|
||||
#define DM_TAG "tag"
|
||||
#define DM_FLAG "flag"
|
||||
#define DM_EFI "efi" /* also on media */
|
||||
#define DM_USED_BY "used_by"
|
||||
#define DM_USED_NAME "used_name"
|
||||
#define DM_USE_MOUNT "mount"
|
||||
#define DM_USE_SVM "svm"
|
||||
#define DM_USE_LU "lu"
|
||||
#define DM_USE_DUMP "dump"
|
||||
#define DM_USE_VXVM "vxvm"
|
||||
#define DM_USE_FS "fs"
|
||||
#define DM_USE_VFSTAB "vfstab"
|
||||
#define DM_USE_EXPORTED_ZPOOL "exported_zpool"
|
||||
#define DM_USE_ACTIVE_ZPOOL "active_zpool"
|
||||
#define DM_USE_SPARE_ZPOOL "spare_zpool"
|
||||
#define DM_USE_L2CACHE_ZPOOL "l2cache_zpool"
|
||||
|
||||
/* event */
|
||||
#define DM_EV_NAME "name"
|
||||
#define DM_EV_DTYPE "edtype"
|
||||
#define DM_EV_TYPE "evtype"
|
||||
#define DM_EV_TADD "add"
|
||||
#define DM_EV_TREMOVE "remove"
|
||||
#define DM_EV_TCHANGE "change"
|
||||
|
||||
/* findisks */
|
||||
#define DM_CTYPE "ctype"
|
||||
#define DM_LUN "lun"
|
||||
#define DM_TARGET "target"
|
||||
|
||||
#define NOINUSE_SET (getenv("NOINUSE_CHECK") != NULL)
|
||||
|
||||
void dm_free_descriptors(dm_descriptor_t *desc_list);
|
||||
void dm_free_descriptor(dm_descriptor_t desc);
|
||||
void dm_free_name(char *name);
|
||||
void dm_free_swapentries(swaptbl_t *);
|
||||
|
||||
dm_descriptor_t *dm_get_descriptors(dm_desc_type_t type, int filter[],
|
||||
int *errp);
|
||||
dm_descriptor_t *dm_get_associated_descriptors(dm_descriptor_t desc,
|
||||
dm_desc_type_t type, int *errp);
|
||||
dm_desc_type_t *dm_get_associated_types(dm_desc_type_t type);
|
||||
dm_descriptor_t dm_get_descriptor_by_name(dm_desc_type_t desc_type,
|
||||
char *name, int *errp);
|
||||
char *dm_get_name(dm_descriptor_t desc, int *errp);
|
||||
dm_desc_type_t dm_get_type(dm_descriptor_t desc);
|
||||
nvlist_t *dm_get_attributes(dm_descriptor_t desc, int *errp);
|
||||
nvlist_t *dm_get_stats(dm_descriptor_t desc, int stat_type,
|
||||
int *errp);
|
||||
void dm_init_event_queue(void(*callback)(nvlist_t *, int),
|
||||
int *errp);
|
||||
nvlist_t *dm_get_event(int *errp);
|
||||
void dm_get_slices(char *drive, dm_descriptor_t **slices,
|
||||
int *errp);
|
||||
void dm_get_slice_stats(char *slice, nvlist_t **dev_stats,
|
||||
int *errp);
|
||||
int dm_get_swapentries(swaptbl_t **, int *);
|
||||
void dm_get_usage_string(char *who, char *data, char **msg);
|
||||
int dm_inuse(char *dev_name, char **msg, dm_who_type_t who,
|
||||
int *errp);
|
||||
int dm_inuse_swap(const char *dev_name, int *errp);
|
||||
int dm_isoverlapping(char *dev_name, char **msg, int *errp);
|
||||
|
||||
#ifdef __cplusplus
|
||||
}
|
||||
#endif
|
||||
|
||||
#endif /* _LIBDISKMGT_H */
|
||||
#endif /* HAVE_LIBDISKMGT_H */
|
|
@ -0,0 +1,287 @@
|
|||
/*
|
||||
* CDDL HEADER START
|
||||
*
|
||||
* The contents of this file are subject to the terms of the
|
||||
* Common Development and Distribution License (the "License").
|
||||
* You may not use this file except in compliance with the License.
|
||||
*
|
||||
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
||||
* or http://www.opensolaris.org/os/licensing.
|
||||
* See the License for the specific language governing permissions
|
||||
* and limitations under the License.
|
||||
*
|
||||
* When distributing Covered Code, include this CDDL HEADER in each
|
||||
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
||||
* If applicable, add the following below this CDDL HEADER, with the
|
||||
* fields enclosed by brackets "[]" replaced with your own identifying
|
||||
* information: Portions Copyright [yyyy] [name of copyright owner]
|
||||
*
|
||||
* CDDL HEADER END
|
||||
*/
|
||||
|
||||
/*
|
||||
* Copyright 2008 Sun Microsystems, Inc. All rights reserved.
|
||||
* Use is subject to license terms.
|
||||
*/
|
||||
|
||||
/*
|
||||
* basic API declarations for share management
|
||||
*/
|
||||
|
||||
#include "zfs_config.h"
|
||||
|
||||
#ifdef HAVE_LIBSHARE
|
||||
#include_next <libshare.h>
|
||||
#else
|
||||
|
||||
#ifndef _LIBSHARE_H
|
||||
#define _LIBSHARE_H
|
||||
|
||||
|
||||
|
||||
#ifdef __cplusplus
|
||||
extern "C" {
|
||||
#endif
|
||||
|
||||
#include <sys/types.h>
|
||||
|
||||
/*
|
||||
* Basic datatypes for most functions
|
||||
*/
|
||||
typedef void *sa_group_t;
|
||||
typedef void *sa_share_t;
|
||||
typedef void *sa_property_t;
|
||||
typedef void *sa_optionset_t;
|
||||
typedef void *sa_security_t;
|
||||
typedef void *sa_protocol_properties_t;
|
||||
typedef void *sa_resource_t;
|
||||
|
||||
typedef void *sa_handle_t; /* opaque handle to access core functions */
|
||||
|
||||
/*
|
||||
* defined error values
|
||||
*/
|
||||
|
||||
#define SA_OK 0
|
||||
#define SA_NO_SUCH_PATH 1 /* provided path doesn't exist */
|
||||
#define SA_NO_MEMORY 2 /* no memory for data structures */
|
||||
#define SA_DUPLICATE_NAME 3 /* object name is already in use */
|
||||
#define SA_BAD_PATH 4 /* not a full path */
|
||||
#define SA_NO_SUCH_GROUP 5 /* group is not defined */
|
||||
#define SA_CONFIG_ERR 6 /* system configuration error */
|
||||
#define SA_SYSTEM_ERR 7 /* system error, use errno */
|
||||
#define SA_SYNTAX_ERR 8 /* syntax error on command line */
|
||||
#define SA_NO_PERMISSION 9 /* no permission for operation */
|
||||
#define SA_BUSY 10 /* resource is busy */
|
||||
#define SA_NO_SUCH_PROP 11 /* property doesn't exist */
|
||||
#define SA_INVALID_NAME 12 /* name of object is invalid */
|
||||
#define SA_INVALID_PROTOCOL 13 /* specified protocol not valid */
|
||||
#define SA_NOT_ALLOWED 14 /* operation not allowed */
|
||||
#define SA_BAD_VALUE 15 /* bad value for property */
|
||||
#define SA_INVALID_SECURITY 16 /* invalid security type */
|
||||
#define SA_NO_SUCH_SECURITY 17 /* security set not found */
|
||||
#define SA_VALUE_CONFLICT 18 /* property value conflict */
|
||||
#define SA_NOT_IMPLEMENTED 19 /* plugin interface not implemented */
|
||||
#define SA_INVALID_PATH 20 /* path is sub-dir of existing share */
|
||||
#define SA_NOT_SUPPORTED 21 /* operation not supported for proto */
|
||||
#define SA_PROP_SHARE_ONLY 22 /* property valid on share only */
|
||||
#define SA_NOT_SHARED 23 /* path is not shared */
|
||||
#define SA_NO_SUCH_RESOURCE 24 /* resource not found */
|
||||
#define SA_RESOURCE_REQUIRED 25 /* resource name is required */
|
||||
#define SA_MULTIPLE_ERROR 26 /* multiple protocols reported error */
|
||||
#define SA_PATH_IS_SUBDIR 27 /* check_path found path is subdir */
|
||||
#define SA_PATH_IS_PARENTDIR 28 /* check_path found path is parent */
|
||||
#define SA_NO_SECTION 29 /* protocol requires section info */
|
||||
#define SA_NO_SUCH_SECTION 30 /* no section found */
|
||||
#define SA_NO_PROPERTIES 31 /* no properties found */
|
||||
#define SA_PASSWORD_ENC 32 /* passwords must be encrypted */
|
||||
|
||||
/* API Initialization */
|
||||
#define SA_INIT_SHARE_API 0x0001 /* init share specific interface */
|
||||
#define SA_INIT_CONTROL_API 0x0002 /* init control specific interface */
|
||||
|
||||
/* not part of API returns */
|
||||
#define SA_LEGACY_ERR 32 /* share/unshare error return */
|
||||
|
||||
/*
|
||||
* other defined values
|
||||
*/
|
||||
|
||||
#define SA_MAX_NAME_LEN 100 /* must fit service instance name */
|
||||
#define SA_MAX_RESOURCE_NAME 255 /* Maximum length of resource name */
|
||||
|
||||
/* Used in calls to sa_add_share() and sa_add_resource() */
|
||||
#define SA_SHARE_TRANSIENT 0 /* shared but not across reboot */
|
||||
#define SA_SHARE_LEGACY 1 /* share is in dfstab only */
|
||||
#define SA_SHARE_PERMANENT 2 /* share goes to repository */
|
||||
|
||||
/* sa_check_path() related */
|
||||
#define SA_CHECK_NORMAL 0 /* only check against active shares */
|
||||
#define SA_CHECK_STRICT 1 /* check against all shares */
|
||||
|
||||
/* RBAC related */
|
||||
#define SA_RBAC_MANAGE "solaris.smf.manage.shares"
|
||||
#define SA_RBAC_VALUE "solaris.smf.value.shares"
|
||||
|
||||
/*
|
||||
* Feature set bit definitions
|
||||
*/
|
||||
|
||||
#define SA_FEATURE_NONE 0x0000 /* no feature flags set */
|
||||
#define SA_FEATURE_RESOURCE 0x0001 /* resource names are required */
|
||||
#define SA_FEATURE_DFSTAB 0x0002 /* need to manage in dfstab */
|
||||
#define SA_FEATURE_ALLOWSUBDIRS 0x0004 /* allow subdirs to be shared */
|
||||
#define SA_FEATURE_ALLOWPARDIRS 0x0008 /* allow parent dirs to be shared */
|
||||
#define SA_FEATURE_HAS_SECTIONS 0x0010 /* protocol supports sections */
|
||||
#define SA_FEATURE_ADD_PROPERTIES 0x0020 /* can add properties */
|
||||
#define SA_FEATURE_SERVER 0x0040 /* protocol supports server mode */
|
||||
|
||||
/*
|
||||
* legacy files
|
||||
*/
|
||||
|
||||
#define SA_LEGACY_DFSTAB "/etc/dfs/dfstab"
|
||||
#define SA_LEGACY_SHARETAB "/etc/dfs/sharetab"
|
||||
|
||||
/*
|
||||
* SMF related
|
||||
*/
|
||||
|
||||
#define SA_SVC_FMRI_BASE "svc:/network/shares/group"
|
||||
|
||||
/* initialization */
|
||||
extern sa_handle_t sa_init(int);
|
||||
extern void sa_fini(sa_handle_t);
|
||||
extern int sa_update_config(sa_handle_t);
|
||||
extern char *sa_errorstr(int);
|
||||
|
||||
/* protocol names */
|
||||
extern int sa_get_protocols(char ***);
|
||||
extern int sa_valid_protocol(char *);
|
||||
|
||||
/* group control (create, remove, etc) */
|
||||
extern sa_group_t sa_create_group(sa_handle_t, char *, int *);
|
||||
extern int sa_remove_group(sa_group_t);
|
||||
extern sa_group_t sa_get_group(sa_handle_t, char *);
|
||||
extern sa_group_t sa_get_next_group(sa_group_t);
|
||||
extern char *sa_get_group_attr(sa_group_t, char *);
|
||||
extern int sa_set_group_attr(sa_group_t, char *, char *);
|
||||
extern sa_group_t sa_get_sub_group(sa_group_t);
|
||||
extern int sa_valid_group_name(char *);
|
||||
|
||||
/* share control */
|
||||
extern sa_share_t sa_add_share(sa_group_t, char *, int, int *);
|
||||
extern int sa_check_path(sa_group_t, char *, int);
|
||||
extern int sa_move_share(sa_group_t, sa_share_t);
|
||||
extern int sa_remove_share(sa_share_t);
|
||||
extern sa_share_t sa_get_share(sa_group_t, char *);
|
||||
extern sa_share_t sa_find_share(sa_handle_t, char *);
|
||||
extern sa_share_t sa_get_next_share(sa_share_t);
|
||||
extern char *sa_get_share_attr(sa_share_t, char *);
|
||||
extern char *sa_get_share_description(sa_share_t);
|
||||
extern sa_group_t sa_get_parent_group(sa_share_t);
|
||||
extern int sa_set_share_attr(sa_share_t, char *, char *);
|
||||
extern int sa_set_share_description(sa_share_t, char *);
|
||||
extern int sa_enable_share(sa_group_t, char *);
|
||||
extern int sa_disable_share(sa_share_t, char *);
|
||||
extern int sa_is_share(void *);
|
||||
|
||||
/* resource name related */
|
||||
extern sa_resource_t sa_find_resource(sa_handle_t, char *);
|
||||
extern sa_resource_t sa_get_resource(sa_group_t, char *);
|
||||
extern sa_resource_t sa_get_next_resource(sa_resource_t);
|
||||
extern sa_share_t sa_get_resource_parent(sa_resource_t);
|
||||
extern sa_resource_t sa_get_share_resource(sa_share_t, char *);
|
||||
extern sa_resource_t sa_add_resource(sa_share_t, char *, int, int *);
|
||||
extern int sa_remove_resource(sa_resource_t);
|
||||
extern char *sa_get_resource_attr(sa_resource_t, char *);
|
||||
extern int sa_set_resource_attr(sa_resource_t, char *, char *);
|
||||
extern int sa_set_resource_description(sa_resource_t, char *);
|
||||
extern char *sa_get_resource_description(sa_resource_t);
|
||||
extern int sa_enable_resource(sa_resource_t, char *);
|
||||
extern int sa_disable_resource(sa_resource_t, char *);
|
||||
extern int sa_rename_resource(sa_resource_t, char *);
|
||||
extern void sa_fix_resource_name(char *);
|
||||
|
||||
/* data structure free calls */
|
||||
extern void sa_free_attr_string(char *);
|
||||
extern void sa_free_share_description(char *);
|
||||
|
||||
/* optionset control */
|
||||
extern sa_optionset_t sa_get_optionset(sa_group_t, char *);
|
||||
extern sa_optionset_t sa_get_next_optionset(sa_group_t);
|
||||
extern char *sa_get_optionset_attr(sa_optionset_t, char *);
|
||||
extern void sa_set_optionset_attr(sa_optionset_t, char *, char *);
|
||||
extern sa_optionset_t sa_create_optionset(sa_group_t, char *);
|
||||
extern int sa_destroy_optionset(sa_optionset_t);
|
||||
extern sa_optionset_t sa_get_derived_optionset(void *, char *, int);
|
||||
extern void sa_free_derived_optionset(sa_optionset_t);
|
||||
|
||||
/* property functions */
|
||||
extern sa_property_t sa_get_property(sa_optionset_t, char *);
|
||||
extern sa_property_t sa_get_next_property(sa_group_t);
|
||||
extern char *sa_get_property_attr(sa_property_t, char *);
|
||||
extern sa_property_t sa_create_section(char *, char *);
|
||||
extern void sa_set_section_attr(sa_property_t, char *, char *);
|
||||
extern sa_property_t sa_create_property(char *, char *);
|
||||
extern int sa_add_property(void *, sa_property_t);
|
||||
extern int sa_update_property(sa_property_t, char *);
|
||||
extern int sa_remove_property(sa_property_t);
|
||||
extern int sa_commit_properties(sa_optionset_t, int);
|
||||
extern int sa_valid_property(void *, char *, sa_property_t);
|
||||
extern int sa_is_persistent(void *);
|
||||
|
||||
/* security control */
|
||||
extern sa_security_t sa_get_security(sa_group_t, char *, char *);
|
||||
extern sa_security_t sa_get_next_security(sa_security_t);
|
||||
extern char *sa_get_security_attr(sa_optionset_t, char *);
|
||||
extern sa_security_t sa_create_security(sa_group_t, char *, char *);
|
||||
extern int sa_destroy_security(sa_security_t);
|
||||
extern void sa_set_security_attr(sa_security_t, char *, char *);
|
||||
extern sa_optionset_t sa_get_all_security_types(void *, char *, int);
|
||||
extern sa_security_t sa_get_derived_security(void *, char *, char *, int);
|
||||
extern void sa_free_derived_security(sa_security_t);
|
||||
|
||||
/* protocol specific interfaces */
|
||||
extern int sa_parse_legacy_options(sa_group_t, char *, char *);
|
||||
extern char *sa_proto_legacy_format(char *, sa_group_t, int);
|
||||
extern int sa_is_security(char *, char *);
|
||||
extern sa_protocol_properties_t sa_proto_get_properties(char *);
|
||||
extern uint64_t sa_proto_get_featureset(char *);
|
||||
extern sa_property_t sa_get_protocol_section(sa_protocol_properties_t, char *);
|
||||
extern sa_property_t sa_get_next_protocol_section(sa_property_t, char *);
|
||||
extern sa_property_t sa_get_protocol_property(sa_protocol_properties_t, char *);
|
||||
extern sa_property_t sa_get_next_protocol_property(sa_property_t, char *);
|
||||
extern int sa_set_protocol_property(sa_property_t, char *, char *);
|
||||
extern char *sa_get_protocol_status(char *);
|
||||
extern void sa_format_free(char *);
|
||||
extern sa_protocol_properties_t sa_create_protocol_properties(char *);
|
||||
extern int sa_add_protocol_property(sa_protocol_properties_t, sa_property_t);
|
||||
extern int sa_proto_valid_prop(char *, sa_property_t, sa_optionset_t);
|
||||
extern int sa_proto_valid_space(char *, char *);
|
||||
extern char *sa_proto_space_alias(char *, char *);
|
||||
extern int sa_proto_get_transients(sa_handle_t, char *);
|
||||
extern int sa_proto_notify_resource(sa_resource_t, char *);
|
||||
extern int sa_proto_change_notify(sa_share_t, char *);
|
||||
extern int sa_proto_delete_section(char *, char *);
|
||||
|
||||
/* handle legacy (dfstab/sharetab) files */
|
||||
extern int sa_delete_legacy(sa_share_t, char *);
|
||||
extern int sa_update_legacy(sa_share_t, char *);
|
||||
extern int sa_update_sharetab(sa_share_t, char *);
|
||||
extern int sa_delete_sharetab(sa_handle_t, char *, char *);
|
||||
|
||||
/* ZFS functions */
|
||||
extern int sa_zfs_is_shared(sa_handle_t, char *);
|
||||
extern int sa_group_is_zfs(sa_group_t);
|
||||
extern int sa_path_is_zfs(char *);
|
||||
|
||||
/* SA Handle specific functions */
|
||||
extern sa_handle_t sa_find_group_handle(sa_group_t);
|
||||
|
||||
#ifdef __cplusplus
|
||||
}
|
||||
#endif
|
||||
|
||||
#endif /* _LIBSHARE_H */
|
||||
#endif /* HAVE_LIBSHARE */
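The `SA_FEATURE_*` values declared above are single-bit flags, and `sa_proto_get_featureset()` is declared to return them packed into a `uint64_t`. A small sketch of how a caller might test such a mask follows. The helper `sa_has_features()` is hypothetical and not part of libshare, and the constants are copied from the header so that the block stands alone.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the libshare.h declarations above. */
#define	SA_FEATURE_NONE		0x0000
#define	SA_FEATURE_RESOURCE	0x0001
#define	SA_FEATURE_DFSTAB	0x0002
#define	SA_FEATURE_SERVER	0x0040

/* Hypothetical helper: nonzero when every bit in `wanted` is set. */
static int
sa_has_features(uint64_t featureset, uint64_t wanted)
{
	return ((featureset & wanted) == wanted);
}
```

Comparing against `wanted` instead of testing for a nonzero result means a multi-bit query such as `SA_FEATURE_RESOURCE | SA_FEATURE_SERVER` succeeds only when both features are present.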
|
|
@ -0,0 +1,35 @@
/*
 * CDDL HEADER START
 *
 * The contents of this file are subject to the terms of the
 * Common Development and Distribution License, Version 1.0 only
 * (the "License").  You may not use this file except in compliance
 * with the License.
 *
 * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
 * or http://www.opensolaris.org/os/licensing.
 * See the License for the specific language governing permissions
 * and limitations under the License.
 *
 * When distributing Covered Code, include this CDDL HEADER in each
 * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
 * If applicable, add the following below this CDDL HEADER, with the
 * fields enclosed by brackets "[]" replaced with your own identifying
 * information: Portions Copyright [yyyy] [name of copyright owner]
 *
 * CDDL HEADER END
 */
/*
 * Copyright 2007 Sun Microsystems, Inc.  All rights reserved.
 * Use is subject to license terms.
 */

#include_next <mntent.h>

#ifndef _PORT_MNTENT_H
#define _PORT_MNTENT_H

/* For HAVE_SETMNTENT */
#include "zfs_config.h"

#endif
@ -0,0 +1,38 @@
/*
 * CDDL HEADER START
 *
 * The contents of this file are subject to the terms of the
 * Common Development and Distribution License, Version 1.0 only
 * (the "License").  You may not use this file except in compliance
 * with the License.
 *
 * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
 * or http://www.opensolaris.org/os/licensing.
 * See the License for the specific language governing permissions
 * and limitations under the License.
 *
 * When distributing Covered Code, include this CDDL HEADER in each
 * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
 * If applicable, add the following below this CDDL HEADER, with the
 * fields enclosed by brackets "[]" replaced with your own identifying
 * information: Portions Copyright [yyyy] [name of copyright owner]
 *
 * CDDL HEADER END
 */
/*
 * Copyright 2007 Sun Microsystems, Inc.  All rights reserved.
 * Use is subject to license terms.
 */

#include_next <stdlib.h>

#ifndef _PORT_STDLIB_H
#define _PORT_STDLIB_H

#include "zfs_config.h"

/* Declare getexecname() when the platform's libc does not provide it. */
#ifndef HAVE_GETEXECNAME
extern const char *getexecname(void);
#endif

#endif
@ -0,0 +1,46 @@
/*
 * CDDL HEADER START
 *
 * The contents of this file are subject to the terms of the
 * Common Development and Distribution License, Version 1.0 only
 * (the "License").  You may not use this file except in compliance
 * with the License.
 *
 * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
 * or http://www.opensolaris.org/os/licensing.
 * See the License for the specific language governing permissions
 * and limitations under the License.
 *
 * When distributing Covered Code, include this CDDL HEADER in each
 * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
 * If applicable, add the following below this CDDL HEADER, with the
 * fields enclosed by brackets "[]" replaced with your own identifying
 * information: Portions Copyright [yyyy] [name of copyright owner]
 *
 * CDDL HEADER END
 */
/*
 * Copyright 2007 Sun Microsystems, Inc.  All rights reserved.
 * Use is subject to license terms.
 */

#include_next <string.h>

#ifndef _PORT_STRING_H
#define _PORT_STRING_H

#include "zfs_config.h"

#ifndef HAVE_STRLCPY
extern size_t strlcpy(char *dst, const char *src, size_t len);
#endif

#ifndef HAVE_STRLCAT
extern size_t strlcat(char *, const char *, size_t);
#endif

#ifndef HAVE_STRNLEN
extern size_t strnlen(const char *src, size_t maxlen);
#endif

#endif
@ -0,0 +1,38 @@
/*
 * CDDL HEADER START
 *
 * The contents of this file are subject to the terms of the
 * Common Development and Distribution License, Version 1.0 only
 * (the "License").  You may not use this file except in compliance
 * with the License.
 *
 * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
 * or http://www.opensolaris.org/os/licensing.
 * See the License for the specific language governing permissions
 * and limitations under the License.
 *
 * When distributing Covered Code, include this CDDL HEADER in each
 * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
 * If applicable, add the following below this CDDL HEADER, with the
 * fields enclosed by brackets "[]" replaced with your own identifying
 * information: Portions Copyright [yyyy] [name of copyright owner]
 *
 * CDDL HEADER END
 */
/*
 * Copyright 2007 Sun Microsystems, Inc.  All rights reserved.
 * Use is subject to license terms.
 */

#include_next <strings.h>

#ifndef _PORT_STRINGS_H
#define _PORT_STRINGS_H

#include "zfs_config.h"

#ifndef HAVE_STRCMP_IN_STRINGS_H
#include <string.h>
#endif

#endif
@ -0,0 +1,37 @@
/*
 * CDDL HEADER START
 *
 * The contents of this file are subject to the terms of the
 * Common Development and Distribution License (the "License").
 * You may not use this file except in compliance with the License.
 *
 * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
 * or http://www.opensolaris.org/os/licensing.
 * See the License for the specific language governing permissions
 * and limitations under the License.
 *
 * When distributing Covered Code, include this CDDL HEADER in each
 * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
 * If applicable, add the following below this CDDL HEADER, with the
 * fields enclosed by brackets "[]" replaced with your own identifying
 * information: Portions Copyright [yyyy] [name of copyright owner]
 *
 * CDDL HEADER END
 */
/*
 * Copyright 2008 Sun Microsystems, Inc.  All rights reserved.
 * Use is subject to license terms.
 */

#include_next <stropts.h>

#ifndef _PORT_STROPTS_H
#define _PORT_STROPTS_H

#include "zfs_config.h"

#ifdef HAVE_IOCTL_IN_STROPTS_H
#include <fake_ioctl.h>
#endif

#endif /* _PORT_STROPTS_H */
Some files were not shown because too many files have changed in this diff.