Commit Graph

236 Commits

Author SHA1 Message Date
Richard Yao 677c6f8457
btree: Implement faster binary search algorithm
This implements a binary search algorithm for B-Trees that reduces
branching to the absolute minimum necessary for a binary search
algorithm. It also enables the compiler to inline the comparator to
ensure that the only slowdown when doing binary search is from waiting
for memory accesses. Additionally, it instructs the compiler to unroll
the loop, which gives an additional 40% improve with Clang and 8%
improvement with GCC.

Consumers must opt into using the faster algorithm. At present, only
B-Trees used inside kernel code have been modified to use the faster
algorithm.

Micro-benchmarks suggest that this can improve binary search performance
by up to 3.5 times when compiling with Clang 16 and up to 1.9 times when
compiling with GCC 12.2.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14866
2023-05-26 10:03:12 -07:00
Matthew Ahrens 3095ca91c2
Verify block pointers before writing them out
If a block pointer is corrupted (but the block containing it checksums
correctly, e.g. due to a bug that overwrites random memory), we can
often detect it before the block is read, with the `zfs_blkptr_verify()`
function, which is used in `arc_read()`, `zio_free()`, etc.

However, such corruption is not typically recoverable.  To recover from
it we would need to detect the memory error before the block pointer is
written to disk.

This PR verifies BP's that are contained in indirect blocks and dnodes
before they are written to disk, in `dbuf_write_ready()`. This way,
we'll get a panic before the on-disk data is corrupted. This will help
us to diagnose what's causing the corruption, as well as being much
easier to recover from.

To minimize performance impact, only checks that can be done without
holding the spa_config_lock are performed.

Additionally, when corruption is detected, the raw words of the block
pointer are logged.  (Note that `dprintf_bp()` is a no-op by default,
but if enabled it is not safe to use with invalid block pointers.)

Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Zuchowski <pzuchowski@datto.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #14817
2023-05-08 11:20:23 -07:00
Brian Behlendorf dd19821149
zdb: consistent xattr output
When using zdb to output the value of an xattr only interpret it
as printable characters if the entire byte array is printable.
Additionally, if the --parseable option is set always output the
buffer contents as octal for easy parsing.

Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14830
2023-05-08 11:17:41 -07:00
Brian Behlendorf d960beca61
zdb: Fix minor memory leak
Commit 6b6aaf6dc2 introduced a small
memory leak in zdb.  This was detected by the LeakSanitizer and was
causing all ztest runs to fail.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14796
2023-04-26 08:43:39 -07:00
Rich Ercolani 6b6aaf6dc2
Taught zdb -bb to print metadata totals
People often want estimates of how much of their pool is occupied
by metadata, but they end up using lots of text processing on zdb's
output to get it.

So let's just...provide it for them.

Now, zdb -bbbs will output something like:

Blocks  LSIZE   PSIZE   ASIZE     avg    comp   %Total  Type
[...]
    68  1.06M    272K    544K      8K    4.00     0.00      L6 Total
 1.71K   212M   6.85M   13.7M      8K   30.91     0.00      L5 Total
 1.71K   212M   6.85M   13.7M      8K   30.91     0.00      L4 Total
 1.73K   214M   6.92M   13.8M      8K   30.89     0.00      L3 Total
 18.7K  2.29G    111M    221M   11.8K   21.19     0.00      L2 Total
 3.56M   454G   28.4G   56.9G   16.0K   15.97     0.19      L1 Total
  308M  36.8T   28.2T   28.6T   95.1K    1.30    99.80      L0 Total
  311M  37.3T   28.3T   28.6T   94.2K    1.32   100.00  Total
 50.4M   774G    113G    291G   5.77K    6.85     0.99  Metadata Total

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #14746
2023-04-24 16:55:07 -07:00
rob-wing 3e4ed4213d
Create zap for root vdev
And add it to the AVZ, this is not backwards compatible with older pools
due to an assertion in spa_sync() that verifies the number of ZAPs of
all vdevs matches the number of ZAPs in the AVZ.

Granted, the assertion only applies to #DEBUG builds - still, a feature
flag is introduced to avoid the assertion, com.klarasystems:vdev_zaps_v2

Notably, this allows to get/set properties on the root vdev:

    % zpool set user:prop=value <pool> root-0

Before this commit, it was already possible to get/set properties on
top-level vdevs with the syntax <type>-<vdev_id> (e.g. mirror-0):

    % zpool set user:prop=value <pool> mirror-0

This syntax also applies to the root vdev as it is is of type 'root'
with a vdev_id of 0, root-0. The keyword 'root' as an alias for
'root-0'.

The following tests have been added:

    - zpool get all properties from root vdev
    - zpool set a property on root vdev
    - verify root vdev ZAP is created

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Wing <rob.wing@klarasystems.com>
Sponsored-by: Seagate Technology
Submitted-by: Klara, Inc.
Closes #14405
2023-04-20 10:07:56 -07:00
Richard Yao d1807f168e nvpair: Constify string functions
After addressing coverity complaints involving `nvpair_name()`, the
compiler started complaining about dropping const. This lead to a rabbit
hole where not only `nvpair_name()` needed to be constified, but also
`nvpair_value_string()`, `fnvpair_value_string()` and a few other static
functions, plus variable pointers throughout the code. The result became
a fairly big change, so it has been split out into its own patch.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14612
2023-03-14 15:25:50 -07:00
Richard Yao 37edc7ea98 Refactor loop in dump_histogram()
The current loop triggers a complaint that we are using an array offset
prior to a range check from cpp/offset-use-before-range-check when we
are actually calculating maximum and minimum values. I was about to file
a false positive report with CodeQL, but after looking at how the code
is structured, I really cannot blame CodeQL for mistaking this for a
range check.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14575
2023-03-08 13:52:20 -08:00
Alexander Motin a8d83e2a24
More adaptive ARC eviction
Traditionally ARC adaptation was limited to MRU/MFU distribution.  But
for years people with metadata-centric workload demanded mechanisms to
also manage data/metadata distribution, that in original ZFS was just
a FIFO.  As result ZFS effectively got separate states for data and
metadata, minimum and maximum metadata limits etc, but it all required
manual tuning, was not adaptive and in its heart remained a bad FIFO.

This change removes most of existing eviction logic, rewriting it from
scratch.  This makes MRU/MFU adaptation individual for data and meta-
data, same as the distribution between data and metadata themselves.
Since most of required states separation was already done, it only
required to make arcs_size state field specific per data/metadata.

The adaptation logic is still based on previous concept of ghost hits,
just now it balances ARC capacity between 4 states: MRU data, MRU
metadata, MFU data and MFU metadata.  To simplify arc_c changes instead
of arc_p measured in bytes, this code uses 3 variable arc_meta, arc_pd
and arc_pm, representing ARC balance between metadata and data, MRU and
MFU for data, and MRU and MFU for metadata respectively as 32-bit fixed
point fractions.  Since we care about the math result only when need to
evict, this moves all the logic from arc_adapt() to arc_evict(), that
reduces per-block overhead, since per-block operations are limited to
stats collection, now moved from arc_adapt() to arc_access() and using
cheaper wmsums.  This also allows to remove ugly ARC_HDR_DO_ADAPT flag
from many places.

This change also removes number of metadata specific tunables, part of
which were actually not functioning correctly, since not all metadata
are equal and some (like L2ARC headers) are not really evictable.
Instead it introduced single opaque knob zfs_arc_meta_balance, tuning
ARC's reaction on ghost hits, allowing administrator give more or less
preference to metadata without setting strict limits.

Some of old code parts like arc_evict_meta() are just removed, because
since introduction of ABD ARC they really make no sense: only headers
referenced by small number of buffers are not evictable, and they are
really not evictable no matter what this code do.  Instead just call
arc_prune_async() if too much metadata appear not evictable.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #14359
2023-03-08 11:17:23 -08:00
Rob N b988f32c70
Better handling for future crypto parameters
The intent is that this is like ENOTSUP, but specifically for when
something can't be done because we have no support for the requested
crypto parameters; eg unlocking a dataset or receiving a stream
encrypted with a suite we don't support.

Its not intended to be recoverable without upgrading ZFS itself.
If the request could be made to work by enabling a feature or modifying
some other configuration item, then some other code should be used.

load-key: In the future we might have more crypto suites (ie new values
for the `encryption` property. Right now trying to load a key on such
a future crypto suite will look up suite parameters off the end of the
crypto table, resulting in misbehaviour and/or crashes (or, with debug
enabled, trip the assertion in `zio_crypt_key_unwrap`).

Instead, lets check the value we got from the dataset, and if we can't
handle it, abort early.

recv: When receiving a raw stream encrypted with an unknown crypto
suite, `zfs recv` would report a generic `invalid backup stream`
(EINVAL). While technically correct, its not super helpful, so lets
ship a more specific error code and message.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #14577
2023-03-07 14:05:14 -08:00
George Amanakis 12a240ac0b
Fix a typo in ac2038a
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: George Amanakis <gamanakis@gmail.com> 
Closes #14585
Closes #14592
2023-03-07 13:50:44 -08:00
Rob N 163f3d3a1f
zdb: add decryption support
The approach is straightforward: for dataset ops, if a key was offered,
find the encryption root and the various encryption parameters, derive a
wrapping key if necessary, and then unlock the encryption root. After
that all the regular dataset ops will return unencrypted data, and
that's kinda the whole thing.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #11551
Closes #12707
Closes #14503
2023-03-02 13:39:09 -08:00
Rob N ★ ac7648179c
zdb: zero-pad checksum output
The leading zeroes are part of the checksum so we should show them.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #14464
2023-02-07 13:48:22 -08:00
George Amanakis ac2038a19c
Teach zdb about DMU_OT_ERROR_LOG objects
With the persistent error log feature we need to account for
spa_errlog_{scrub, last} containing mappings to other error log objects,
which need to be marked as in-use as well.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #14442 
Closes #14434
2023-02-02 15:17:37 -08:00
Brian Behlendorf b4cd4fe1aa
Revert "zdb: zdb_ddt_leak_init() reads uninitialized memory..."
This reverts commit d30db519af.  With
this change applied zloop.sh fails reliably with the following ASSERT.

  zio_wait(zio_claim(NULL, zcb->zcb_spa, refcnt ? 0 : spa_min_claim_txg(
    zcb->zcb_spa), bp, NULL, NULL, ZIO_FLAG_CANFAIL)) == 0 (0x2 == 0x0)
  ASSERT at cmd/zdb/zdb.c:5452:zdb_count_block()

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14306
2022-12-21 09:17:00 -08:00
Richard Yao f954ea26a6 zdb: Handle theoretical buffer overflow when printing float
CodeQL pointed out that for extreme floating point values, `sprintf()`
will overwrite a 32 character buffer. It cited 1e304 as an example,
which causes `sprintf()` to print 308 characters.

In practice, the numbers should never exceed 100, so this should not
happen. To silence the warning and also handle unexpected situations, we
change the code to use `snprintf()`.

This was missed during my audit of our use of `sprintf()`, since I did
not think to consider extreme floating point representations. It also
really should not happen, so this change is purely defensive
programming.

This was found by CodeQL's cpp/overrunning-write-with-float check.

Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14264
2022-12-08 14:15:15 -08:00
Richard Yao d30db519af zdb: zdb_ddt_leak_init() reads uninitialized memory when birth == 0
This was written by Jeff Bonick and was committed to OpenSolaris on
November 1, 2009. It appears that Jeff meant to continue the outer loop
iteration when `ddp->ddp_phys_birth == 0`, but put his check inside the
inner loop. This causes a pointer to uninitialized memory to be passed
to ddt_lookup() inside a VERIFY() statement whenever that condition is
true.

Reported-by: Coverity (CID 1524462)
Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14264
2022-12-08 14:15:10 -08:00
Richard Yao ecccaede68 zdb: Fix big parameter passed by value
This is not in performance critical code, but static analyzers will
complain about it, so lets switch to pass by pointer here.

Reported-by: Coverity (CID-1524384)
Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14263
2022-12-08 13:52:53 -08:00
Richard Yao 887fb37843 zdb: Silence Coverity complaint about verify_livelist_allocs()
svb is declared on the stack. We then set parts of svb.svb_dva with
DVA_SET_VDEV(), DVA_SET_OFFSET() and DVA_SET_ASIZE(). However, the DVA
contains other fields for pad, GRID and G. When setting the fields we
use, we technically read uninitialized bits  from the fields we do not
use. This makes Coverity and Clang's Static Analyzer complain.
Presumably, other static analyzers might complain too.

There is no real bug here, but we are still technically reading
undefined data and unless we stop doing that, static analyzers will
complain about it in perpetuum and this could obscure real issues. We
silence the static analyzer complaints by using a 0 struct initializer.

Reported by: Coverity (CID 1524627)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14210
2022-11-29 10:00:45 -08:00
Richard Yao 2e08df84d8 Cleanup dump_bookmarks()
Assertions are meant to check assumptions, but the way that this
assertion is written does not check an assumption, since it is provably
always true. Removing the assertion will cause a compiler warning (made
into an error by -Werror) about printing up to 512 bytes to a 256-byte
buffer, so instead, we change the assertion to verify the assumption
that we never do a snprintf() that is truncated to avoid overrunning the
256-byte buffer.

This was caught by an audit of the codebase to look for misuse of
`snprintf()` after CodeQL reported that we had misused `snprintf()`. An
explanation of how snprintf() can be misused is here:

https://www.redhat.com/en/blog/trouble-snprintf

This particular instance did not misuse `snprintf()`, but it was caught
by the audit anyway.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14098
2022-10-29 13:05:02 -07:00
Richard Yao aa822e4d9c Fix NULL pointer dereference in zdb
Clang's static analyzer complained that we dereference a NULL pointer in
dump_path() if we return 0 when there is an error.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14044
2022-10-18 15:34:24 -07:00
Tino Reichardt 27218a32fc
Fix declarations of non-global variables
This patch inserts the `static` keyword to non-global variables,
which where found by the analysis tool smatch.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #13970
2022-10-18 11:05:32 -07:00
Umer Saleem d9ac17a57f Expose libzutil error info in libpc_handle_t
In libzutil, for zpool_search_import and zpool_find_config, we use
libpc_handle_t internally, which does not maintain error code and it is
not exposed in the interface. Due to this, the error information is not
propagated to the caller. Instead, an error message is printed on
stderr.

This commit adds lpc_error field in libpc_handle_t and exposes it in
the interface, which can be used by the users of libzutil to get the
appropriate error information and handle it accordingly.

Users of the API can also control if they want to print the error
message on stderr.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #13969
2022-10-04 09:54:35 -07:00
Richard Yao a51288aabb
Fix unsafe string operations
Coverity caught unsafe use of `strcpy()` in `ztest_dmu_objset_own()`,
`nfs_init_tmpfile()` and `dump_snapshot()`. It also caught an unsafe use
of `strlcat()` in `nfs_init_tmpfile()`.

Inspired by this, I did an audit of every single usage of `strcpy()` and
`strcat()` in the code. If I could not prove that the usage was safe, I
changed the code to use either `strlcpy()` or `strlcat()`, depending on
which function was originally used. In some cases, `snprintf()` was used
to replace multiple uses of `strcat` because it was cleaner.

Whenever I changed a function, I preferred to use `sizeof(dst)` when the
compiler is able to provide the string size via that. When it could not
because the string was passed by a caller, I checked the entire call
tree of the function to find out how big the buffer was and hard coded
it. Hardcoding is less than ideal, but it is safe unless someone shrinks
the buffer sizes being passed.

Additionally, Coverity reported three more string related issues:

 * It caught a case where we do an overlapping memory copy in a call to
   `snprintf()`. We fix that via `kmem_strdup()` and `kmem_strfree()`.

 * It caught `sizeof (buf)` being used instead of `buflen` in
   `zdb_nicenum()`'s call to `zfs_nicenum()`, which is passed to
   `snprintf()`. We change that to pass `buflen`.

 * It caught a theoretical unterminated string passed to `strcmp()`.
   This one is likely a false positive, but we have the information
   needed to do this more safely, so we change this to silence the false
   positive not just in coverity, but potentially other static analysis
   tools too. We switch to `strncmp()`.

 * There was a false positive in tests/zfs-tests/cmd/dir_rd_update.c. We
   suppress it by switching to `snprintf()` since other static analysis
   tools might complain about it too. Interestingly, there is a possible
   real bug there too, since it assumes that the passed directory path
   ends with '/'. We add a '/' to fix that potential bug.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13913
2022-09-27 16:47:24 -07:00
Richard Yao fdc2d30371
Cleanup: Specify unsignedness on things that should not be signed
In #13871, zfs_vdev_aggregation_limit_non_rotating and
zfs_vdev_aggregation_limit being signed was pointed out as a possible
reason not to eliminate an unnecessary MAX(unsigned, 0) since the
unsigned value was assigned from them.

There is no reason for these module parameters to be signed and upon
inspection, it was found that there are a number of other module
parameters that are signed, but should not be, so we make them unsigned.
Making them unsigned made it clear that some other variables in the code
should also be unsigned, so we also make those unsigned. This prevents
users from setting negative values that could potentially cause bad
behaviors. It also makes the code slightly easier to understand.

Mostly module parameters that deal with timeouts, limits, bitshifts and
percentages are made unsigned by this. Any that are boolean are left
signed, since whether booleans should be considered signed or unsigned
does not matter.

Making zfs_arc_lotsfree_percent unsigned caused a
`zfs_arc_lotsfree_percent >= 0` check to become redundant, so it was
removed. Removing the check was also necessary to prevent a compiler
error from -Werror=type-limits.

Several end of line comments had to be moved to their own lines because
replacing int with uint_t caused us to exceed the 80 character limit
enforced by cstyle.pl.

The following were kept signed because they are passed to
taskq_create(), which expects signed values and modifying the
OpenSolaris/Illumos DDI is out of scope of this patch:

	* metaslab_load_pct
	* zfs_sync_taskq_batch_pct
	* zfs_zil_clean_taskq_nthr_pct
	* zfs_zil_clean_taskq_minalloc
	* zfs_zil_clean_taskq_maxalloc
	* zfs_arc_prune_task_threads

Also, negative values in those parameters was found to be harmless.

The following were left signed because either negative values make
sense, or more analysis was needed to determine whether negative values
should be disallowed:

	* zfs_metaslab_switch_threshold
	* zfs_pd_bytes_max
	* zfs_livelist_min_percent_shared

zfs_multihost_history was made static to be consistent with other
parameters.

A number of module parameters were marked as signed, but in reality
referenced unsigned variables. upgrade_errlog_limit is one of the
numerous examples. In the case of zfs_vdev_async_read_max_active, it was
already uint32_t, but zdb had an extern int declaration for it.

Interestingly, the documentation in zfs.4 was right for
upgrade_errlog_limit despite the module parameter being wrongly marked,
while the documentation for zfs_vdev_async_read_max_active (and friends)
was wrong. It was also wrong for zstd_abort_size, which was unsigned,
but was documented as signed.

Also, the documentation in zfs.4 incorrectly described the following
parameters as ulong when they were int:

	* zfs_arc_meta_adjust_restarts
	* zfs_override_estimate_recordsize

They are now uint_t as of this patch and thus the man page has been
updated to describe them as uint.

dbuf_state_index was left alone since it does nothing and perhaps should
be removed in another patch.

If any module parameters were missed, they were not found by `grep -r
'ZFS_MODULE_PARAM' | grep ', INT'`. I did find a few that grep missed,
but only because they were in files that had hits.

This patch intentionally did not attempt to address whether some of
these module parameters should be elevated to 64-bit parameters, because
the length of a long on 32-bit is 32-bit.

Lastly, it was pointed out during review that uint_t is a better match
for these variables than uint32_t because FreeBSD kernel parameter
definitions are designed for uint_t, whose bit width can change in
future memory models.  As a result, we change the existing parameters
that are uint32_t to use uint_t.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Neal Gompa <ngompa@datto.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13875
2022-09-27 16:42:41 -07:00
Richard Yao ebe1d03616
Fix userland resource leaks
Coverity caught these. With the exception of the file descriptor leak in
tests/zfs-tests/cmd/draid.c, they are all memory leaks.

Also, there is a piece of dead code in zfs_get_enclosure_sysfs_path().
We delete it as cleanup.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13921
2022-09-23 16:55:26 -07:00
Richard Yao 2a493a4c71
Fix unchecked return values and unused return values
Coverity complained about unchecked return values and unused values that
turned out to be unused return values.

Different approaches were used to handle the different cases of
unchecked return values:

* cmd/zdb/zdb.c: VERIFY0 was used in one place since the existing code
  had no error handling. An error message was printed in another to
  match the rest of the code.

* cmd/zed/agents/zfs_retire.c: We dismiss the return value with `(void)`
  because the value is expected to be potentially unset.

* cmd/zpool_influxdb/zpool_influxdb.c: We dismiss the return value with
  `(void)` because the values are expected to be potentially unset.

* cmd/ztest.c: VERIFY0 was used since we want failures if something goes
  wrong in ztest.

* module/zfs/dsl_dir.c: We dismiss the return value with `(void)`
  because there is no guarantee that the zap entry will always be there.
  For example, old pools imported readonly would not have it and we do
  not want to fail here because of that.

* module/zfs/zfs_fm.c: `fnvlist_add_*()` was used since the
  allocations sleep and thus can never fail.

* module/zfs/zvol.c: We dismiss the return value with `(void)` because
  we do not need it. This matches what is already done in the analogous
  `zfs_replay_write2()`.

* tests/zfs-tests/cmd/draid.c: We suppress one return value with
  `(void)` since the code handles errors already. The other return value
  is handled by switching to `fnvlist_lookup_uint8_array()`.

* tests/zfs-tests/cmd/file/file_fadvise.c: We add error handling.

* tests/zfs-tests/cmd/mmap_sync.c: We add error handling for munmap, but
  ignore failures on remove() with (void) since it is expected to be
  able to fail.

* tests/zfs-tests/cmd/mmapwrite.c: We add error handling.

As for unused return values, they were all in places where there was
error handling, so logic was added to handle the return values.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13920
2022-09-23 16:52:03 -07:00
Richard Yao e506a0ce40
Cleanup: Change 1 used in bitshifts to 1ULL
Coverity complains about this. It is not a bug as long as we never shift
by more than 31, but it is not terrible to change the constants from 1
to 1ULL as clean up.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13914
2022-09-22 11:28:33 -07:00
Richard Yao b24d1c77f7
Add zfs_btree_verify_intensity kernel module parameter
I see a few issues in the issue tracker that might be aided by being
able to turn this on. We have no module parameter for it, so I would
like to add one.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13874
2022-09-15 16:22:33 -07:00
Richard Yao 7195c04d98
Fix file descriptor handling in zdb_copy_object()
Coverity found a file descriptor leak. Eyeballing it showed that we had
no handling for the `open()` call failing either. We can address both of
these at once.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Neal Gompa <ngompa@datto.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13862
2022-09-12 12:34:10 -07:00
Andrew Innes 58e8054bce
Alloc zdb_cd_t to fix stack issue
Alloc zdb_cd_t since it is too large for the stack on windows
which results in `zdb` crashing immediately.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrew Innes <andrew.c12@gmail.com>
Co-authored-by: Jorgen Lundman <lundman@lundman.net>
Closes #13807
2022-09-02 13:15:18 -07:00
Christian Schwarz bf61a507a2
zdb: dump spill block pointer if present
Output will look like so:

  $ sudo zdb -dddd -vv testpool/fs 2
  Dataset testpool/fs [ZPL], ID 260, cr_txg 8, 25K, 7 objects, rootbp DVA[0]=<0:1800be00:200> DVA[1]=<0:1c00be00:200> [L0 DMU objset] fletcher4 lz4 unencrypted LE contiguous unique double size=1000L/200P birth=16L/16P fill=7 cksum=d03b396cd:489ca835517:d4b04a4d0a62:1b413aac454d53

      Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
           2    1   128K    512     1K     512    512    0.00  ZFS plain file (K=inherit) (Z=inherit=lz4)
                                                 192   bonus  System attributes
      dnode flags: USED_BYTES USERUSED_ACCOUNTED USEROBJUSED_ACCOUNTED SPILL_BLKPTR
      dnode maxblkid: 0
      path    /testfile
      uid     0
      gid     0
      atime   Fri Jul 15 12:36:35 2022
      mtime   Fri Jul 15 12:36:35 2022
      ctime   Fri Jul 15 12:36:51 2022
      crtime  Fri Jul 15 12:36:35 2022
      gen 10
      mode    100600
      size    0
      parent  34
      links   1
      pflags  840800000004
      SA xattrs: 248 bytes, 2 entries

          security.selinux = nutanix_u:object_r:unlabeled_t:s0\000
          user.foo = xbLQJjyVvEVPGGuRHV/gjkFFO1MdehKnLjjd36ZaoMVaUqtqFoMMYT5Ya9yywHApJNoK/1hNJfO3\012XCJWv9/QUTKamoWW9xVDE7yi8zn166RNw5QUhf84cZ3JNLnw6oN

Spill block: 0:10005c00:200 0:14005c00:200 200L/200P F=1 B=16/16 cksum=1cdfac47a4:910c5caa557:195d0493dfe5a:332b6fde6ad547
Indirect blocks:

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Christian Schwarz <christian.schwarz@nutanix.com>
Closes #13640
2022-07-20 17:16:29 -07:00
Tino Reichardt 1d3ba0bf01
Replace dead opensolaris.org license link
The commit replaces all findings of the link:
http://www.opensolaris.org/os/licensing with this one:
https://opensource.org/licenses/CDDL-1.0

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #13619
2022-07-11 14:16:13 -07:00
наб dd66857d92 Remaining {=> const} char|void *tag
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13348
2022-06-29 14:08:59 -07:00
наб a926aab902 Enable -Wwrite-strings
Also, fix leak from ztest_global_vars_to_zdb_args()

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13348
2022-06-29 14:08:54 -07:00
Mark Johnston 03df6bad94
zdb: Fix handling of nul termination in symlink targets
The SA attribute containing the symlink target does not include a nul
terminator, so when printing the target zdb would sometimes include
garbage at the end of the string.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #13482
2022-05-20 10:32:49 -07:00
наб 510ee280c0 Remove enable_extended_FILE_stdio()
Even on Illumos it's only available in the 32-bit programming
environment, and, quoth enable_extended_FILE_stdio(3C):
> Historically, 32-bit Solaris applications have been limited to using
> only the file descriptors 0 through 255 with the standard I/O
> functions (see stdio(3C)) in the C library. The extended FILE
> facility allows well-behaved 32-bit applications to use any
> valid file descriptor with the standard I/O functions.
where "well-behaved" means that it
> does not directly access any fields in the FILE structure pointed
> to by the FILE pointer associated with any standard I/O stream,

And the stdio/flush.c implementation reads:
  /*
   * if this is not an internal extended FILE then check
   * if _file is being changed from underneath us.
   * It should not be because if
   * it is then then we lose our ability to guard against
   * silent data corruption.
   */
  if (!iop->__xf_nocheck && bad_fd > -1 && iop->_magic != bad_fd) {
      (void) fprintf(stderr,
          "Application violated extended FILE safety mechanism.\n"
          "Please read the man page for extendedFILE.\nAborting\n");
      abort();
  }

This appears to be an insane workaround for broken implementation with
exposed FILE internals and _file being an u8, both only on non-LP64;
it's shimmed out on all LP64 targets in Illumos,
and we shim it out as well: just get rid of it

This appears to've been originally fixed in illumos-gate
a5f69788de7ac07553de47f7fec8c05a9a94c105 ("PSARC 2006/162 Extended FILE
space for 32-bit Solaris processes", "1085341 32-bit stdio routines
should support file descriptors >255"), which also bears extendedFILE
and enable_extended_FILE_stdio(3C):
  -       unsigned char   _file;  /* UNIX System file descriptor */
  +       unsigned char   _magic; /* Old home of the file descriptor */
  +                               /* Only fileno(3C) can retrieve the
  				value now */
and
  +/*
  + * Macros to aid the extended fd FILE work.
  + * This helps isolate the changes to only the 32-bit code
  + * since 64-bit Solaris is not affected by this.
  + */
  +#ifdef  _LP64
  +#define        GET_FD(iop)             ((iop)->_file)
  +#define        SET_FILE(iop, fd)       ((iop)->_file = (fd))
  +#else
  +#define        GET_FD(iop)             \
  +               (((iop)->__extendedfd) ? _file_get(iop) : (iop)->_magic)
  +#define        SET_FILE(iop, fd)       (iop)->_magic = (fd); (iop)->__extendedfd = 0
  +#endif

Also remove the 1k setrlimit(NOFILE) calls: that's the default on Linux,
with 64k on Illumos and 171k on FreeBSD

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13411
2022-05-11 10:33:12 -07:00
наб d5036491e6 zdb: standardise on ctime()
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13284
2022-04-05 09:45:05 -07:00
наб 861166b027 Remove bcopy(), bzero(), bcmp()
bcopy() has a confusing argument order and is actually a move, not a
copy; they're all deprecated since POSIX.1-2001 and removed in -2008,
and we shim them out to mem*() on Linux anyway

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12996
2022-03-15 15:13:42 -07:00
Jorgen Lundman 9a70e97fe1
Rename fallthrough to zfs_fallthrough
Unfortunately macOS has obj-C keyword "fallthrough" in the OS headers.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #13097
2022-02-15 08:58:59 -08:00
Damian Szuberski 63652e1546
Add `--enable-asan` and `--enable-ubsan` switches
`configure` now accepts `--enable-asan` and `--enable-ubsan` switches
which results in passing `-fsanitize=address`
and `-fsanitize=undefined`, respectively, to the compiler. Those
flags are enabled in GitHub workflows for ZTS and zloop. Errors
reported by both instrumentations are corrected, except for:

- Memory leak reporting is (temporarily) suppressed. The cost of
  fixing them is relatively high compared to the gains.

- Checksum computing functions in `module/zcommon/zfs_fletcher*`
  have UBSan errors suppressed. It is completely impractical
  to enforce 64-byte payload alignment there due to performance
  impact.

- There's no ASan heap poisoning in `module/zstd/lib/zstd.c`. A custom
  memory allocator is used there rendering that measure
  unfeasible.

- Memory leaks detection has to be suppressed for `cmd/zvol_id`.
  `zvol_id` is run by udev with the help of `ptrace(2)`. Tracing is
  incompatible with memory leaks detection.

Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #12928
2022-02-03 14:35:38 -08:00
наб c70bb2f610 Replace *CTASSERT() with _Static_assert()
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12993
2022-01-26 11:38:52 -08:00
Paul Zuchowski 5a4d282f55
Fix problem with zdb -d
zdb -d <pool>/<objset ID> does not work when
other command line arguments are included i.e.
zdb -U <cachefile> -d <pool>/<objset ID>
This change fixes the command line parsing
to handle this situation.  Also fix issue
where zdb -r <dataset> <file> does not handle
the root <dataset> of the pool. Introduce -N
option to force <objset ID> to be interpreted
as a numeric objsetID.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Paul Zuchowski <pzuchowski@datto.com>
Closes #12845
Closes #12944
2022-01-20 10:28:55 -07:00
Manoj Joseph 6b2e32019e
Long opts for zdb
This change introduces long options for zdb. It updates the usage
message as well to include the long options.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Manoj Joseph <manoj.joseph@delphix.com>
Closes #12818
2022-01-06 10:54:32 -08:00
наб 6fa118b857 zdb: main: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12835
2021-12-21 12:05:12 -08:00
Rich Ercolani c5ccbbdcf6
Allow printing special vdev metaslab groups
Sometimes, we'd like to know info about the metaslab groups
on special vdevs too. So let's make -MM do something useful.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12750
2021-11-30 10:26:45 -08:00
Fedor Uporov 2a9c572059
zdb: Report bad label checksum
In case if all label checksums will be invalid on any vdev, the pool
will become unimportable. From other side zdb with -l option will not
provide any useful information why it happened. Add notifications
about corrupted label checksums.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Fedor Uporov <fuporov.vstack@gmail.com>
Closes #2509
Closes #12685
2021-11-10 12:22:00 -07:00
Fedor Uporov e39fe05b69
Skip spacemaps reading in case of pool readonly import
The only zdb utility require to read metaslab-related data during
read-only pool import because of spacemaps validation. Add global
variable which will allow zdb read spacemaps in case of readonly
import mode.

Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Fedor Uporov <fuporov.vstack@gmail.com>
Closes #9095
Closes #12687
2021-11-09 12:50:39 -08:00
Damian Szuberski 6d680e61ef
Update `checkstyle` workflow env to ubuntu-20.04
- `checkstyle` workflow uses ubuntu-20.04 environment
- improved `mancheck.sh` readability

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #12713
2021-11-02 14:02:57 -06:00
Teodor Spæren 01b572bc62
zdb: fix overflow of time estimation
The calculation of estimated time remaining in zdb -cc could overflow,
as reported in #10666. This patch fixes this, by using uint64_t instead
of ints in the calculations.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Teodor Spæren <teodor@sparen.no>
Closes #10666 
Closes #12610
2021-10-15 15:55:34 -07:00