Commit Graph

7469 Commits

Author SHA1 Message Date
Brian Behlendorf 483106eb71 Minimize ztest stack frame size
To ensure ztest behaves as similarly as possible to the kernel
implementation of ZFS we attempt to honor the kernel stack limits.
This includes keeping the individual stack frame sizes under 1K
in size.  We currently use gcc to detect and enforce this limit.

Therefore to get this building cleanly with full debugging enabled
the stack usage in the following functions has been reduced by
moving the buffer to the heap.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2012-10-04 13:19:09 -07:00
Etienne Dechamps 9d81146b01 Use dynamic file descriptor numbers in ztest.
Currently, ztest expects to get 3 and 4 as the file descriptors for
data and random files, respectively. This is quite fragile and breaks
easily if ztest is run with these file descriptors already opened
(e.g. in a complex shell script).

This patch fixes the issue by removing the assumptions on the file
descriptor numbers that open() returns.

For the random file (/dev/urandom), the new code doesn't rely on a
shared file descriptor; instead, it reopens the file in the child.

For the data file, the new code writes the file descriptor number into
a "ZTEST_FD_DATA" environment variable so that it can be recovered
after the execv() call.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2012-10-04 13:19:09 -07:00
Christopher Siden 22257dc0d5 Fix mmap() usage in ztest.
illumos/illumos-gate@ad135b5d64
Illumos changeset: 13700:2889e2596bd6

Note that this is only a partial port of the aforementioned Illumos
changeset.

Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <gwilson@delphix.com>
Reviewed by: Richard Lowe <richlowe@richlowe.net>
Reviewed by: Dan Kruchinin <dan.kruchinin@gmail.com>
Approved by: Eric Schrock <Eric.Schrock@delphix.com>

Ported to zfsonlinux by: Etienne Dechamps <etienne.dechamps@ovh.net>
2012-10-04 13:19:09 -07:00
Chris Siden c242c188fd Illumos #1950: ztest backwards compatibility testing option.
illumos/illumos-gate@420dfc9585
Illumos changeset: 13571:a5771a96228c

1950 ztest backwards compatibility testing option

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Richard Lowe <richlowe@richlowe.net>
Reviewed by: Robert Mustacchi <rm@joyent.com>
Approved by: Eric Schrock <eric.schrock@delphix.com>

Ported-by: Etienne Dechamps <etienne.dechamps@ovh.net>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2012-10-04 13:18:53 -07:00
Chris Dunlop d75d6f294e Switch KM_SLEEP to KM_PUSHPAGE
This warning indicates the incorrect use of KM_SLEEP in a call
path which must use KM_PUSHPAGE to avoid deadlocking in direct
reclaim.  See commit b8d06fc for additional details.

  SPL: Fixing allocation for task txg_sync (6093) which
  used GFP flags 0x297bda7c with PF_NOFS set

Signed-off-by: Chris Dunlop <chris@onthe.net.au>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1002
2012-10-04 10:44:09 -07:00
Matthew Ahrens 04434775b7 Illumos #3100: zvol rename fails with EBUSY when dirty.
illumos/illumos-gate@2e2c135528
Illumos changeset: 13780:6da32a929222

3100 zvol rename fails with EBUSY when dirty

Reviewed by: Christopher Siden <chris.siden@delphix.com>
Reviewed by: Adam H. Leventhal <ahl@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Garrett D'Amore <garrett@damore.org>
Approved by: Eric Schrock <eric.schrock@delphix.com>

Ported-by: Etienne Dechamps <etienne.dechamps@ovh.net>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #995
2012-10-03 13:59:02 -07:00
Richard Lowe 0677cb6f52 Illumos #2399: zfs manual page does not document use of "zfs diff".
illumos/illumos-gate@3b8be6bf4f
Illumos changeset: 13773:00c2a08cf1bb

2399 zfs manual page does not document use of "zfs diff"

Reviewed by: Joshua M. Clulow <josh@sysmgr.org>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Dan McDonald <danmcd@nexenta.com>
Approved by: Robert Mustacchi <rm@joyent.com>

Ported-by: Etienne Dechamps <etienne.dechamps@ovh.net>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #940
2012-10-03 13:59:02 -07:00
George Wilson 65947351e7 Illumos #3129, #3130
illumos/illumos-gate@d6afdce20f
Illumos changeset: 13794:7c5e0e746b2c

3129 'zpool reopen' restarts resilvers
3130 ztest failure: Assertion failed:
     0 == dmu_objset_destroy(name, B_FALSE) (0x0 == 0x10)

Reviewed by: Eric Schrock <eric.schrock@delphix.com>
Reviewed by: Matt Ahrens <matthew.ahrens@delphix.com>
Reviewed by: Christopher Siden <chris.siden@delphix.com>
Reviewed by: Adam Leventhal <ahl@delphix.com>
Approved by: Dan McDonald <danmcd@nexenta.com>

References:
  https://www.illumos.org/issues/3129
  https://www.illumos.org/issues/3130

Ported by: Etienne Dechamps <etienne.dechamps@ovh.net>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #994
2012-10-03 13:59:02 -07:00
Etienne Dechamps d135245791 Temporarily disable the reguid test.
Currently, ztest fails with the following error:

    error: Pool 'ztest' has encountered an uncorrectable I/O failure
    and the failure mode property for this pool is set to panic.

We know how to fix it (see issue #939), but it may take some time
before we get around to merging the fix, which has some heavy
dependencies.

In the mean time, it is not ideal to be unable to use ztest just
because of a small isolated issue, so this patch works around the
problem by disabling the reguid test. This is just a temporary hack to
keep ztest usable.

The reguid test will be enabled again when the proper fix is merged.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #997
2012-10-03 13:59:02 -07:00
Etienne Dechamps 6aec1cd5a6 Fix ztest vdev file paths.
Currently, in several instances (but not all), ztest generates vdev
file paths using a statement similar to this:

    snprintf(path, sizeof (path), ztest_dev_template, ...);

This worked fine until 40b84e7aec, which
changed path to be a pointer to the heap instead of an array allocated
on the stack. Before this change, sizeof(path) would return the size of
the array; now, it returns the size of the pointer instead.

As a result, the aforementioned sprintf statement uses the wrong size
and truncates the vdev file path to the first 4 or 8 bytes (depending
on the architecture). Typically, with default settings, the file path
will become "/tmp/zt" instead of "/test/ztest.XXX".

This issue only exists in ztest_vdev_attach_detach() and
ztest_fault_inject(), which explains why ztest doesn't fail right away.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #989
2012-10-03 13:32:48 -07:00
Etienne Dechamps 274091c074 Fix VOP_CLOSE() in userspace.
Currently, for unknown reasons, VOP_CLOSE() is a no-op in userspace.
This causes file descriptor leaks. This is especially problematic with
long ztest runs, since zpool.cache is opened repeatedly and never
closed, resulting in resource exhaustion (EMFILE errors).

This patch fixes the issue by making VOP_CLOSE() do what it is supposed
to do.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #989
2012-10-03 13:32:48 -07:00
Etienne Dechamps 0aebd4f9e3 Create threads in detached state in userspace.
Currently, thread_create(), when called in userspace, creates a
joinable (i.e. not detached thread). This is the pthread default.

Unfortunately, this does not reproduce kthreads behavior (kthreads
are always detached). In addition, this contradicts the original
Solaris code which creates userspace threads in detached mode.

These joinable threads are never joined, which leads to a leakage of
pthread thread objects ("zombie threads"). This in turn results in
excessive ressource consumption, and possible ressource exhaustion in
extreme cases (e.g. long ztest runs).

This patch fixes the issue by creating userspace threads in detached
mode. The only exception is ztest worker threads which are meant to be
joinable.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #989
2012-10-03 13:32:48 -07:00
Brian Behlendorf 6d1d976b2c Modify vdev_elevator_switch() to use elevator_change()
As of Linux 2.6.36 an elevator_change() interface was added.
This commit updates vdev_elevator_switch() to use this interface
when available, otherwise it falls back to the usermodehelper
method.

Original-patch-by: foobarz <sysop@xeon.(none)>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #906
2012-10-03 13:31:44 -07:00
Etienne Dechamps 2f342404c1 Force 4K blocksize when testing ext2 on zvol.
Currently, mkfs.ext2 on zconfig.sh zvols tries to use a 8K blocksize,
probably because by default zvol exposes an optimal I/O size of 8K.

Unfortunately, a ext2 blocksize of 8K is not supported by the kernel,
so the resulting filesystem is unmountable.

This patch fixes the issue by making sure the blocksize is 4K. We have
to use -F to force it else mkfs.ext2 won't allow us to use a blocksize
smaller than the optimal I/O size.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #979
2012-10-03 10:52:51 -07:00
Cyril Plisko 393b44c711 Implement .commit_metadata hook for NFS export
In order to implement synchronous NFS metadata semantics ZFS
needs to provide the .commit_metadata hook.  All it takes there
is to make sure changes are committed to ZIL.  Fortunately
zfs_fsync() does just that, so simply calling it from
zpl_commit_metadata() does the trick.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #969
2012-10-03 10:49:45 -07:00
Chris Wedgwood 23a61ccc1b zvol_probe should return NULL when the device isn't found.
Previously we returned ERR_PTR(-ENOENT) which the rest of the kernel
doesn't expect and as such we can oops.

Signed-off-by: Chris Wedgwood <cw@f00f.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #949
Closes #931
Closes #789
Closes #743
Closes #730
2012-10-03 10:39:12 -07:00
Bill Pijewski 37abac6d55 Illumos #2703: add mechanism to report ZFS send progress
Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Robert Mustacchi <rm@joyent.com>
Reviewed by: Richard Lowe <richlowe@richlowe.net>
Approved by: Eric Schrock <Eric.Schrock@delphix.com>

References:
  https://www.illumos.org/issues/2703

Ported by: Martin Matuska <martin@matuska.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2012-09-19 13:39:06 -07:00
Chris Siden 1bd201e70d Illumos #1948: zpool list should show more detailed pool info
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Eric Schrock <eric.schrock@delphix.com>
Reviewed by: Richard Lowe <richlowe@richlowe.net>
Reviewed by: Albert Lee <trisk@nexenta.com>
Reviewed by: Dan McDonald <danmcd@nexenta.com>
Reviewed by: Garrett D'Amore <garrett@damore.org>
Approved by: Eric Schrock <eric.schrock@delphix.com>

References:
  https://www.illumos.org/issues/1948

Ported by:	Martin Matuska <martin@matuska.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #685
2012-09-19 13:39:05 -07:00
Brian Behlendorf 95fd8c9a7f Switch KM_SLEEP to KM_PUSHPAGE
This warning indicates the incorrect use of KM_SLEEP in a call
path which must use KM_PUSHPAGE to avoid deadlocking in direct
reclaim.  See commit b8d06fca08
for additional details.

  SPL: Fixing allocation for task txg_sync (6093) which
  used GFP flags 0x297bda7c with PF_NOFS set

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #973
2012-09-19 11:52:36 -07:00
Brian Behlendorf 0a2f7b3662 Seg fault 'zpool import -d /dev/disk/by-id -a'
Introduced by commit 44867b6d6e.
We should of course check to ensure best isn't NULL before
attempting to dereference it.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #974
2012-09-18 12:33:37 -07:00
Brian Behlendorf 211204bed3 zfs-0.6.0-rc11 2012-09-18 11:30:24 -07:00
Brian Behlendorf a6c6839a88 SPL 0.6.0-rc11 2012-09-18 11:28:57 -07:00
Richard Lowe dd4769adc0 Illumos #2088 zdb could use a reasonable manual page
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Garrett D'Amore <garrett@damore.org>
Reviewed by: George Wilson <gwilson@zfsmail.com>
Reviewed by: Steve Gonczi <gonczi@comcast.net>
Reviewed by: Richard Elling <richard.elling@richardelling.com>
Approved by: Garrett D'Amore <garrett@damore.org>

References:
      https://www.illumos.org/issues/2088

Ported by: Cyril Plisko <cyril.plisko@mountall.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #682
2012-09-18 09:09:13 -07:00
Brian Behlendorf 44867b6d6e Improve `zpool import` search behavior
The goal of this change is to make 'zpool import' prefer to use
the peristent /dev/mapper or /dev/disk/by-* paths.  These are far
preferable to the devices in /dev/ whos names are not persistent
and are determined by the order in which a device is detected.

This patch improves things by changing the default search path from
just to the top level /dev/ directory to (in order):

  /dev/disk/by-vdev   - Custom rules, use first if they exist
  /dev/disk/zpool     - Custom rules, use first if they exist
  /dev/mapper         - Use multipath devices before components
  /dev/disk/by-uuid   - Single unique entry and persistent
  /dev/disk/by-id     - May be multiple entries and persistent
  /dev/disk/by-path   - Encodes physical location and persistent
  /dev/disk/by-label  - Custom persistent labels
  /dev                - UNSAFE device names will change

The default search path can be overriden by setting the
ZPOOL_IMPORT_PATH environment variable.  This must be a colon
delimited list of paths which are searched for vdevs.  If the
'zpool import -d' option is specified only those listed paths
will be searched.

Finally, when multiple paths to the same device are found.  If one
of the paths is an exact match for the path used last time to import
the pool it will be used.  When there are no exact matches the
prefered path will be determined by the provided search order.

This means you can still import a pool and force specific names by
providing the -d <path> option.  And the prefered names will persist
as long as those paths exist on your system.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #965
2012-09-17 13:49:07 -07:00
Brian Behlendorf ba367276d8 Switch KM_SLEEP to KM_PUSHPAGE
This warning indicates the incorrect use of KM_SLEEP in a call
path which must use KM_PUSHPAGE to avoid deadlocking in direct
reclaim.  See commit b8d06fca08
for additional details.

  SPL: Fixing allocation for task txg_sync (6093) which
  used GFP flags 0x297bda7c with PF_NOFS set

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #917
2012-09-17 11:22:23 -07:00
Cyril Plisko 49d39798f2 ZFS replay transaction error 5
When zfs_replay_write() replays TX_WRITE records from ZIL
it calls zpl_write_common() to perform the actual write.
zpl_write_common() returns the number of bytes written
(similar to write() system call) or an (negative) error.
However, the code expects the positive return value to be
a residual counter. Thus when zpl_write_common() successfully
completes it is mistakenly considered to be a partial write and
the error code delivered further. At this point the ZIL processing
is aborted with famous "ZFS replay transaction error 5" error
message given to the message buffer.

The fix is to compare the zpl_write_commmon() return value with
the buffer size and flag error only when they disagree.

Signed-off-by: Cyril Plisko <cyril.plisko@mountall.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #933
2012-09-17 11:06:58 -07:00
Brian Behlendorf 8312c6df55 Clear PG_writeback for sync I/O error case
Commit 2b2861362f accidentally
introduced this issue by only conditionally registering the
commit callback in the async case.

The error handing code for the dmu_tx_assign() failure case
relied on there always being a registered commit callback to
clear the PG_writeback bit.  Since that is no longer strictly
true for the synchronous case we must explicitly invoke the
callback.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #961
2012-09-14 15:53:47 -07:00
Cyril Plisko 8e8e7f35b7 Fix zdb printf format string for ZIL data blocks
Without this fix the zdb printouts of ZIL data blocks look full of FF
due to printf() handling its arguments as int by default.

Here is the output before the fix

                TX_WRITE            len   4136, txg 1093817, seq 149231
                        foid 4242, offset 0, length f68
                        G FFFFFF8EFFFFFF87FFFFFF91FFFFFFCC 1c
FFFFFFAFFFFFFFC9FFFFFFBAZ FFFFFFC3

And the same after the fix

                TX_WRITE            len   4136, txg 1093817, seq 149231
                        foid 4242, offset 0, length f68
                        G 8E8791CC 1cAFC9BAZ C3

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #962
2012-09-13 09:02:12 -07:00
Brian Behlendorf 5915791096 Move iput() after zfs_inode_update()
When replaying an unlink/remove operation via zfs_rmdir() the object
being removed will be instantiated by a call to zfs_dirent_lock().
This means that there is a single reference protecting the object.
Right before the call to zfs_inode_update() this reference is dropped
which may cause the object to be destroyed.  This will result in a
NULL dereference as shown by the stack trace is issue #782.

This likely isn't an issue during normal operation because there is
always an additional reference held on the object by the VFS.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #782
2012-09-12 14:22:52 -07:00
Brian Behlendorf 3050c9314f Switch KM_SLEEP to KM_PUSHPAGE
Under certain circumstances the following functions may be called
in a context where KM_SLEEP is unsafe and can result in a deadlocked
system.  To avoid this problem the unconditional KM_SLEEPs are
converted to KM_PUSHPAGEs.  This will prevent them from attempting
to initiate any I/O during direct reclaim.

This change was originally part of cd5ca4b but was reverted by
330fe01.  It always should have had its own commit for exactly
this reason.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2012-09-12 12:27:09 -07:00
Brian Behlendorf 9b51f21841 Remove TQ_SLEEP -> KM_SLEEP mapping
When the taskq code was originally written it seemed like a good
idea to simply map TQ_SLEEP to KM_SLEEP.  Unfortunately, this
assumed that the TQ_* flags would never confict with any of the
Linux GFP_* flags.  When adding the TQ_PUSHPAGE support in commit
cd5ca4b this invariant was accidentally broken.

Therefore to support TQ_PUSHPAGE, which is needed for Linux, and
prevent any further confusion I have removed this direct mapping.
The TQ_SLEEP, TQ_NOSLEEP, and TQ_PUSHPAGE are no longer defined
in terms of their KM_* counterparts.  Instead a simple mapping
function is introduce to convert TQ_* -> KM_* where needed.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #171
2012-09-12 11:41:42 -07:00
Brian Behlendorf 330fe010e4 Revert "Switch KM_SLEEP to KM_PUSHPAGE"
This reverts commit cd5ca4b2f8
due to conflicts in the higher TQ_ bits which caused incorrect
behavior.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2012-09-12 10:07:48 -07:00
Brian Behlendorf cda4db408c Revert "Improve AF hard disk detection"
This reverts commit 395350c85d which
accidentally introduced issue #955.

Pools using AF drives which were originally created with a sector
size of 512 bytes will now be correctly detected to have physical
sector size of 4096.  This is desirable for a new pool, however for
an existing pool abruptly changing the sector size causes problems.

For this reason, this change is being reverted until the additional
logic can be added to detect the existing pool case.  Existing
pools must use the ashift size stored in the label regardless of
what the disk reports.  This is critical for compatibility.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #955
2012-09-11 16:33:49 -07:00
Cyril Plisko 27ccd4147b Avoid running exportfs on each zfs/zpool command invocation
Delay executing exportfs command until its results are actually
required.

Signed-off-by: Cyril Plisko <cyril.plisko@mountall.com>
Signed-off-by: Gunnar Beutner <gunnar@beutner.name>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2012-09-11 10:21:49 -07:00
Cyril Plisko af909a1089 Illumos #3064: usr/src/cmd/zpool/zpool_main.c misspells "successful"
Reviewed by: Andrew Stormont <Andrew.Stormont@nexenta.com>
Reviewed by: Kartik Mistry <kartik.mistry@gmail.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>

References:
      https://www.illumos.org/issues/3064

Signed-off-by: Cyril Plisko <cyril.plisko@mountall.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2012-09-11 10:19:50 -07:00
Chris Dunlop fff276419e Remove autotools products
spl_config.h.in is a generated file: remove and .gitignore it

Signed-off-by: Chris Dunlop <chris@onthe.net.au>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2012-09-11 10:15:13 -07:00
Chris Dunlop dd87332f47 Remove autotools products
spl_config.h.in is a generated file: remove and .gitignore it

Signed-off-by: Chris Dunlop <chris@onthe.net.au>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2012-09-11 10:12:47 -07:00
Etienne Dechamps b815ff9a8f Silence "setting dataset to sync always" message in ztest.
ztest outputs a message when testing sync=always no matter what the
verbosity level is. There is no point outputting this message for low
verbosity levels.

With this patch the message is only displayed at verbosity level 5 or
above. The result is less output pollution.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #951
2012-09-10 10:55:44 -07:00
Brian Behlendorf 4ca9a43644 Remove zvol device node
The 'zfs destroy' changes in 330d06f disrupted how zvol devices
get removed on ZoL.  However, it basically boils down to the
fact that we are no longer reliably calling zvol_remove_minor()
via zfs_ioc_destroy_snaps().

Therefore we add the missing call and handle things similarly
to the existing zfs_unmount_snap() case.  Ideally we would check
if this is of type DMU_OST_ZFS or DMU_OST_ZVOL and just do the
right thing as in zfs_ioc_destroy().  However, it looks like
it would be fairly expensive to get the type, and it's harmless
to simply attempt the umount and minor removal.

This is also an issue in the latest FreeBSD and Illumos code.
It was being tracked under the following issue, and we may want
to refresh our code when they settle on what they want to do
about it upstream.

  https://www.illumos.org/issues/3170

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #903
2012-09-10 10:25:08 -07:00
Brian Behlendorf 3c60f5054c Debug cv_destroy() with mutex held
There still appears to be a race in the condition variables where
->cv_mutex is set after we are woken from the cv_destroy wait queue.
This might be possible when cv_destroy() is called immediately after
cv_broadcast().  We had some troubles with this previously but
there may still be a small race, see commit d599e4f.

The following patch closes one small race and improves the ASSERTs
such that they log the offending value.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
zfsonlinux/zfs#943
2012-09-10 10:23:26 -07:00
Brian Behlendorf 95331f4437 Set KMC_NOEMERGENCY for zlib workspaces
The workspace required by zlib to perform compression is roughly
512MB (order-7).  These allocations are so large that we should
never attempt to directly kmalloc an emergency object for them.

It is far preferable to asynchronously vmalloc an additional slab
in case it's needed.  Then simply block waiting for an existing
object to be released or for the new slab to be allocated.

This can be accomplished by disabling emergency slab objects by
passing the KMC_NOEMERGENCY flag at slab creation time.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
zfsonlinux/zfs#917
2012-09-07 14:36:26 -07:00
Brian Behlendorf cb5c2acebb Add KMC_NOEMERGENCY slab flag
Provide a flag to disable the use of emergency objects for a
specific kmem cache.  There may be instances where under no
circumstances should you kmalloc() an emergency object.  For
example, when you cache contains very large objects (>128k).

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2012-09-07 14:27:03 -07:00
Brian Behlendorf 1ecc6d1265 Add zstreamdump .gitignore
When zstreamdump was merged in commit b79fc3f we failed to add
the needed .gitignore file.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2012-09-06 14:23:11 -07:00
Cyril Plisko 04f9432d3b Make ZFS filesystem id persistent across different machines
Use ZFS dataset fsid guid as a unique file system id, similar to what is
done on Illumos/OpenSolaris.

Signed-off-by: Cyril Plisko <cyril.plisko@mountall.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #888
2012-09-06 12:47:11 -07:00
Etienne Dechamps 4b2f65b253 Increase the stack space in userspace.
In 1e33ac1e26, the maximum stack size for
userspace tools was set to 8k to mimic the available kernel stack size.

Unfortunately, due to differences in how the stack is used in userspace
vs kernel space, spurious stack overflows could occur in userspace
tools due to the limited stack size. This is especially true in ztest
when debugging is enabled.

This patch multiplies the userspace stack size by 4, which fixes the
stack overflow issues. This comes at the price of not being able to
catch stack size issues in userspace, but the previous solution proved
unreliable anyway.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Fixes #934.
2012-09-06 11:59:59 -07:00
Brian Behlendorf ebcfc8a534 Disable page allocation warnings for ARC buffers
Buffers for the ARC are normally backed by the SPL virtual slab.
However, if memory is low, AND no slab objects are available,
AND a new slab cannot be quickly constructed a new emergency
object will be directly allocated.

These objects can be as large as order 5 on a system with 4k
pages.  And because they are allocated with KM_PUSHPAGE, to
avoid a potential deadlock, they are not allowed to initiate I/O
to satisfy the allocation.  This can result in the occasional
allocation failure.

However, since these allocations are allowed to block and
perform operations such as memory compaction they will eventually
succeed.  Since this is not unexpected (just unlikely) behavior
this patch disables the warning for the allocation failure.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #465
2012-09-06 11:53:08 -07:00
Michael Martin fc24f7c887 Fix missing vdev names in zpool status output
Commit 858219c makes more sense down below in the 'if (verbose)'
section of the code.  Initially, buf and path will never point
to the same location.  Once 'path = buf' is set on a raidz vdev,
the code may drop into the verbose section depending on the
verbose flag.  In here, using a tmpbuf makes sense since now
'buf == path'.

This issue does not occur in the upstream Solaris code because
their implementations of snprintf() allow for buf and path to
be the same address.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #57
2012-09-05 22:09:12 -07:00
Brian Behlendorf cafa9709f3 Switch KM_SLEEP to KM_PUSHPAGE
This warning indicates the incorrect use of KM_SLEEP in a call
path which must use KM_PUSHPAGE to avoid deadlocking in direct
reclaim.  See commit b8d06fca08
for additional details.

  SPL: Fixing allocation for task txg_sync (6093) which
  used GFP flags 0x297bda7c with PF_NOFS set

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #917
2012-09-05 08:44:58 -07:00
Brian Behlendorf 0ef0ff546e Switch KM_SLEEP to KM_PUSHPAGE
This warning indicates the incorrect use of KM_SLEEP in a call
path which must use KM_PUSHPAGE to avoid deadlocking in direct
reclaim.  See commit b8d06fca08
for additional details.

  SPL: Fixing allocation for task txg_sync (6093) which
  used GFP flags 0x297bda7c with PF_NOFS set

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #917
2012-09-04 16:00:06 -07:00
Brian Behlendorf 395350c85d Improve AF hard disk detection
Use the bdev_physical_block_size() interface to determine the
minimize write size which can be issued without incurring a
read-modify-write operation.  This is used to set the ashift
correctly to prevent a performance penalty when using AF hard
disks.

Unfortunately, this interface isn't entirely reliable because
it's not uncommon for disks to misreport this value.  For this
reason you may still need to manually set your ashift with:

  zpool create -o ashift=12 ...

The solution to this in the upstream Illumos source was to add
a while list of known offending drives.  Maintaining such a list
will be a burden, but it still may be worth doing if we can
detect a large number of these drives.  This should be considered
as future work.

Reported-by: Richard Yao <ryao@cs.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #916
2012-09-04 15:35:32 -07:00