This commit changes the frontend zio_compress_data and
zio_decompress_data APIs to take ABD points instead of buffer pointers.
All callers are updated to match. Any that already have an appropriate
ABD nearby now use it directly, while at the rest we create an one.
Internally, the ABDs are passed through to the provider directly.
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
This commit changes the provider compress and decompress API to take ABD
pointers instead of buffer pointers for both data source and
destination. It then updates all providers to match.
This doesn't actually change the providers to do chunked compression,
just changes the API to allow such an update in the future. Helper
macros are added to easily adapt the ABD functions to their buffer-based
implementations.
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
This is updating zstream to use the zio_compress calls rather than using
its own dispatch. Since that was fairly entangled, some refactoring
included.
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
When building a static build (--disable-shared), zstream fails to link
because of the duplicate highbit64() in libzpool/kernel.c. Since they're
identical, and the libzpool one is visible to zstream, we remove
zstream's copy and just use the common one.
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes#16426
C99 6.7.8.17 says that when an undesignated initialiser is used, only
the first element of a union is initialised. If the first element is not
the largest within the union, how the remaining space is initialised is
up to the compiler.
GCC extends the initialiser to the entire union, while Clang treats the
remainder as padding, and so initialises according to whatever
automatic/implicit initialisation rules are currently active.
When Linux is compiled with CONFIG_INIT_STACK_ALL_PATTERN,
-ftrivial-auto-var-init=pattern is added to the kernel CFLAGS. This flag
sets the policy for automatic/implicit initialisation of variables on
the stack.
Taken together, this means that when compiling under
CONFIG_INIT_STACK_ALL_PATTERN on Clang, the "zero" initialiser will only
zero the first element in a union, and the rest will be filled with a
pattern. This is significant for aes_ctx_t, which in
aes_encrypt_atomic() and aes_decrypt_atomic() is initialised to zero,
but then used as a gcm_ctx_t, which is the fifth element in the union,
and thus gets pattern initialisation. Later, it's assumed to be zero,
resulting in a hang.
As confusing and undiscoverable as it is, by the spec, we are at fault
when we initialise a structure containing a union with the zero
initializer. As such, this commit replaces these uses with an explicit
memset(0).
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes#16135Closes#16206
sscanf returns number of items parsed on success and EOF on failure.
Reviewed-by: Adam Moss <c@yotes.com>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes#16198
In the zstream code, Coverity reported:
"The argument could be controlled by an attacker, who could invoke the
function with arbitrary values (for example, a very high or negative
buffer size)."
It did not report this in the kernel. This is likely because the
userspace code stored this in an int before passing it into the
allocator, while the kernel code stored it in a uint32_t.
However, this did reveal a potentially real problem. On 32-bit systems
and systems with only 4GB of physical memory or less in general, it is
possible to pass a large enough value that the system will hang. Even
worse, on Linux systems, the kernel memory allocator is not able to
support allocations up to the maximum 4GB allocation size that this
allows.
This had already been limited in userspace to 64MB by
`ZFS_SENDRECV_MAX_NVLIST`, but we need a hard limit in the kernel to
protect systems. After some discussion, we settle on 256MB as a hard
upper limit. Attempting to receive a stream that requires more memory
than that will result in E2BIG being returned to user space.
Reported-by: Coverity (CID-1529836)
Reported-by: Coverity (CID-1529837)
Reported-by: Coverity (CID-1529838)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes#14285
Coverity has long complained about the checksum being uninitialized if
an END record is processed before its BEGIN record. This should not
happen, but there was no code to check for it. I had left this unfixed
since it was a low priority issue, but then
9f4ede63d2 added another instance of this.
I am making an effort to "hold the line" to keep new coverity defect
reports from going unaddressed, so I find myself forced to fix this much
earlier than I had originally planned to address it.
The solution is to maintain a counter and a flag. Then use VERIFY
statements to verify the following runtime constraints:
* Every record either has a corresponding BEGIN record, is a BEGIN
record or is the end of stream END record for replication streams.
* BEGIN records cannot be nested. i.e. There must be an END record
before another BEGIN record may be seen.
Failure to meet these constraints will cause the program to exit.
This is sufficient to ensure that the checksum is never accessed when
uninitialized.
Reported-by: Coverity (CID 1524578)
Reported-by: Coverity (CID 1524633)
Reported-by: Coverity (CID 1527295)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes#14176
As new compression algorithms are added to ZFS, it could be useful for
people to recompress data with new algorithms. There is currently no
mechanism to do this aside from copying the data manually into a new
filesystem with the new algorithm enabled. This tool allows the
transformation to happen through zfs send, allowing it to be done
efficiently to remote systems and in an incremental fashion.
A new zstream command is added that decompresses WRITE records and
then recompresses them with a provided algorithm, and then re-emits
the modified send stream. It may also be possible to re-compress
embedded block pointers, but that was not attempted for the initial
version.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes#14106
Cast the integer type to (u)intptr_t before casting to "void *". In
CHERI C/C++ we warn on bare casts from integers to pointers to catch
attempts to create pointers our of thin air. We allow the warning to be
supressed with a suitable cast through (u)intptr_t.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes#14131
This patch inserts the `static` keyword to non-global variables,
which where found by the analysis tool smatch.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes#13970
If a record is uncompressed on-disk but the block pointer insists
otherwise, reading it will return EIO. This commit adds an "off" type
to the "zstream decompress" command. Using it will set the compression
field in a zfs stream to "off" without changing the record's data.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Alan Somers <asomers@FreeBSD.org>
Sponsored by: Axcient
Closes#13997
* `zstream_do_token()` does not handle failures from `libzfs_init()`
* `ztest_global_vars_to_zdb_args()` does not handle failures from
`calloc()`.
* `zfs_snapshot_nvl()` will pass an offset to a NULL pointer as a
source to `strlcpy()` if the provided nvlist is `NULL`.
We handle these by doing what the existing error handling does for other
errors involving these functions.
Coverity complained about these. It had complained about several more,
but one was fixed by 570ca4441e and
another was a false positive. The remaining complaints labelled
"dereferece null return vaue" involve fetching things stored in
in-kernel data structures via `list_head()/list_next()`,
`AVL_PREV()/AVL_NEXT()` and `zfs_btree_find()`. Most of them occur in
void functions that have no error handling. They are much harder to
analyze than the two fixed in this patch, so they are left for a
follow-up patch.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes#13971
82226e4f44 was intended to prevent a
warning from being printed in situations where it was inappropriate, but
accidentally disabled it entirely by setting featureflags in the wrong
case statement.
Coverity reported this as dead code.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes#13946
It can be used to repair a ZFS file system corrupted by ZFS bug #12762.
Use it like this:
zfs send -c <DS> | \
zstream decompress <OBJECT>,<OFFSET>[,<COMPRESSION_ALGO>] ... | \
zfs recv <DST_DS>
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Alan Somers <asomers@gmail.com>
Sponsored-by: Axcient
Workaround for #12762Closes#13256
No installation diff, dist lost
-zfs-2.1.99/cmd/fsck_zfs/fsck.zfs
which was distributed erroneously, since it's generated
Also clean gitrev on clean
Also add -e 'any possible bashisms' to default checkbashisms flags,
and fully parallelise it and shellcheck, and it works out-of-tree, too
Also align the Release in the dist META file correctly
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes#13316
bcopy() has a confusing argument order and is actually a move, not a
copy; they're all deprecated since POSIX.1-2001 and removed in -2008,
and we shim them out to mem*() on Linux anyway
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes#12996
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes#13154
In files created/modified before 4254acb there may be a corruption of
xattrs which is not reported during scrub and normal send/receive. It
manifests only as an error when raw sending/receiving. This happens
because currently only the raw receive path checks for discrepancies
between the dnode bonus length and the spill pointer flag.
In case we encounter a dnode whose bonus length is greater than the
predicted one, we should report an error. Modify in this regard
dnode_sync() with an assertion at the end, dump_dnode() to error out,
dsl_scan_recurse() to report errors during a scrub, and zstream to
report a warning when dumping. Also added a test to verify spill blocks
are sent correctly in a raw send.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes#12720Closes#13014
Accidentally introduced by commit dd00925e8d.
Force-install the zstreamdump link, this is a supported configuration
and the install should not fail if it needs to overwrite an existing
file.
Also cd to work around some funny platforms as noted in AC_PROG_LN_S doc
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes#12143
zstreamdump(8) was in quite a bad state,
and the wrapper didn't work if invoked without /sbin in $PATH
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes#12015
For some reason cppcheck 1.90 is generating an invalidSyntax warning
when the BF64_SET macro is used in the zstream source. The same
warning is not reported by cppcheck 2.3, nor is their any evident
problem with the expanded macro. This appears to be an issue with
this version of cppcheck. This commit annotates the source to suppress
the warning.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#11700
In order for cppcheck to perform a proper analysis it needs to be
aware of how the sources are compiled (source files, include
paths/files, extra defines, etc). All the needed information is
available from the Makefiles and can be leveraged with a generic
cppcheck Makefile target. So let's add one.
Additional minor changes:
* Removing the cppcheck-suppressions.txt file. With cppcheck 2.3
and these changes it appears to no longer be needed. Some inline
suppressions were also removed since they appear not to be
needed. We can add them back if it turns out they're needed
for older versions of cppcheck.
* Added the ax_count_cpus m4 macro to detect at configure time how
many processors are available in order to run multiple cppcheck
jobs. This value is also now used as a replacement for nproc
when executing the kernel interface checks.
* "PHONY =" line moved in to the Rules.am file which is included
at the top of all Makefile.am's. This is just convenient becase
it allows us to use the += syntax to add phony targets.
* One upside of this integration worth mentioning is it now allows
`make cppcheck` to be run in any directory to check that subtree.
* For the moment, cppcheck is not run against the FreeBSD specific
kernel sources. The cppcheck-FreeBSD target will need to be
implemented and testing on FreeBSD to support this.
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#11508
Use the correct return type for getopt otherwise clang complains
about tautological-constant-out-of-range-compare.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sterling Jensen <sterlingjensen@users.noreply.github.com>
Closes#11359
Adding a new subcommand to zstream called token. This
now allows users to decode a resume token to retrieve the toname
field. This can be useful for tools that need this information.
The syntax works as follows zstream token <resume_token>.
Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Zuchowski <pzuchowski@datto.com>
Signed-off-by: Tony Perkins <tperkins@datto.com>
Closes#10558
libtool stores absolute paths in the dependency_libs component of the
.la files. If the Makefile for a dependent library refers to the
libraries by relative path, some libraries end up duplicated on the link
command line.
As an example, libzfs specifies libzfs_core, libnvpair and libuutil as
dependencies to be linked in. The .la file for libzfs_core also
specifies libnvpair, but using an absolute path, with the result that
libnvpair is present twice in the linker command line for producing
libzfs.
While the only thing this causes is to slightly slow down the linking,
we can avoid it by using absolute paths everywhere, including for
convenience libraries just for consistency.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes#10538
libzutil is currently statically linked into libzfs, libzfs_core and
libzpool. Avoid the unnecessary duplication by removing it from libzfs
and libzpool, and adding libzfs_core to libzpool.
Remove a few unnecessary dependencies:
- libuutil from libzfs_core
- libtirpc from libspl
- keep only libcrypto in libzfs, as we don't use any functions from
libssl
- librt is only used for clock_gettime, however on modern systems that's
in libc rather than librt. Add a configure check to see if we actually
need librt
- libdl from raidz_test
Add a few missing dependencies:
- zlib to libefi and libzfs
- libuuid to zpool, and libuuid and libudev to zed
- libnvpair uses assertions, so add assert.c to provide aok and
libspl_assertf
Sort the LDADD for programs so that libraries that satisfy dependencies
come at the end rather than the beginning of the linker command line.
Revamp the configure tests for libaries to use FIND_SYSTEM_LIBRARY
instead. This can take advantage of pkg-config, and it also avoids
polluting LIBS.
List all the required dependencies in the pkgconfig files, and move the
one for libzfs_core into the latter's directory. Install pkgconfig files
in $(libdir)/pkgconfig on linux and $(prefix)/libdata/pkgconfig on
FreeBSD, instead of /usr/share/pkgconfig, as the more correct location
for library .pc files.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes#10538
This is helpful for determining the size of the nvlist of snapshots
and properties
Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes#10505
When building on native dev system, there are no issues but when
cross-compiling for target system, some linker errors are observed.
The only way to avoid these errors is by adjusting the Makefile.am
of those various components to add the library dependencies.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Petros Koutoupis <petros@petroskoutoupis.com>
Closes#10304
Fix uninitialized variable in `zstream redup` command. The compiler
may determine the 'stream_offset' variable can be uninitialized
because not all rdt_lookup() exit paths set it. This should never
happen in practice as documented by the assert, but initialize it
regardless to resolve the warning.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#10241Closes#10244
* Fix uninitialized variable in `zstream redup` command. The
'rdt.ddt_count' variable is uninitialized because it was
allocated from the stack and not globally. Initialize it.
This was reported by gcc when compiling with debugging enabled.
zstream_redup.c:157:16: error: 'rdt.ddt_count' may be used
uninitialized in this function [-Werror=maybe-uninitialized]
* Remove the cmd/zstreamdump/.gitignore file. It's no longer
needed now that the zstreamdump command is a script.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#10192
Deduplicated send and receive is deprecated. To ease migration to the
new dedup-send-less world, the commit adds a `zstream redup` utility to
convert deduplicated send streams to normal streams, so that they can
continue to be received indefinitely.
The new `zstream` command also replaces the functionality of
`zstreamdump`, by way of the `zstream dump` subcommand. The
`zstreamdump` command is replaced by a shell script which invokes
`zstream dump`.
The way that `zstream redup` works under the hood is that as we read the
send stream, we build up a hash table which maps from `<GUID, object,
offset> -> <file_offset>`.
Whenever we see a WRITE record, we add a new entry to the hash table,
which indicates where in the stream file to find the WRITE record for
this block. (The key is `drr_toguid, drr_object, drr_offset`.)
For entries other than WRITE_BYREF, we pass them through unchanged
(except for the running checksum, which is recalculated).
For WRITE_BYREF records, we change them to WRITE records. We find the
referenced WRITE record by looking in the hash table (for the record
with key `drr_refguid, drr_refobject, drr_refoffset`), and then reading
the record header and payload from the specified offset in the stream
file. This is why the stream can not be a pipe. The found WRITE record
replaces the WRITE_BYREF record, with its `drr_toguid`, `drr_object`,
and `drr_offset` fields changed to be the same as the WRITE_BYREF's
(i.e. we are writing the same logical block, but with the data supplied
by the previous WRITE record).
This algorithm requires memory proportional to the number of WRITE
records (same as `zfs send -D`), but the size per WRITE record is
relatively low (40 bytes, vs. 72 for `zfs send -D`). A 1TB send stream
with 8KB blocks (`recordsize=8k`) would use around 5GB of RAM to
"redup".
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes#10124Closes#10156