Archive-Team/zfs - zfs - Gitea: Git with a cup of tea

Commit Graph

Author	SHA1	Message	Date
Brian Behlendorf	b40a77aefc	Add the release component to headers When the original build system code was added the release component was accidentally omited from the development header install path. This patch adds the missing path component so it's always clear exactly what release your compiling against. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-01-18 12:19:47 -08:00
Brian Behlendorf	166dd49de0	Linux 3.2 compat, security_inode_init_security() The security_inode_init_security() API has been changed to include a filesystem specific callback to write security extended attributes. This was done to support the initialization of multiple LSM xattrs and the EVM xattr. This change updates the code to use the new API when it's available. Otherwise it falls back to the previous implementation. In addition, the ZFS_AC_KERNEL_6ARGS_SECURITY_INODE_INIT_SECURITY autoconf test has been made more rigerous by passing the expected types. This is done to ensure we always properly the detect the correct form for the security_inode_init_security() API. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #516	2012-01-12 15:06:39 -08:00
Brian Behlendorf	ab26409db7	Linux 3.1 compat, super_block->s_shrink The Linux 3.1 kernel has introduced the concept of per-filesystem shrinkers which are directly assoicated with a super block. Prior to this change there was one shared global shrinker. The zfs code relied on being able to call the global shrinker when the arc_meta_limit was exceeded. This would cause the VFS to drop references on a fraction of the dentries in the dcache. The ARC could then safely reclaim the memory used by these entries and honor the arc_meta_limit. Unfortunately, when per-filesystem shrinkers were added the old interfaces were made unavailable. This change adds support to use the new per-filesystem shrinker interface so we can continue to honor the arc_meta_limit. The major benefit of the new interface is that we can now target only the zfs filesystem for dentry and inode pruning. Thus we can minimize any impact on the caching of other filesystems. In the context of making this change several other important issues related to managing the ARC were addressed, they include: * The dnlc_reduce_cache() function which was called by the ARC to drop dentries for the Posix layer was replaced with a generic zfs_prune_t callback. The ZPL layer now registers a callback to drop these dentries removing a layering violation which dates back to the Solaris code. This callback can also be used by other ARC consumers such as Lustre. arc_add_prune_callback() arc_remove_prune_callback() * The arc_reduce_dnlc_percent module option has been changed to arc_meta_prune for clarity. The dnlc functions are specific to Solaris's VFS and have already been largely eliminated already. The replacement tunable now represents the number of bytes the prune callback will request when invoked. * Less aggressively invoke the prune callback. We used to call this whenever we exceeded the arc_meta_limit however that's not strictly correct since it results in over zeleous reclaim of dentries and inodes. It is now only called once the arc_meta_limit is exceeded and every effort has been made to evict other data from the ARC cache. * More promptly manage exceeding the arc_meta_limit. When reading meta data in to the cache if a buffer was unable to be recycled notify the arc_reclaim thread to invoke the required prune. * Added arcstat_prune kstat which is incremented when the ARC is forced to request that a consumer prune its cache. Remember this will only occur when the ARC has no other choice. If it can evict buffers safely without invoking the prune callback it will. * This change is also expected to resolve the unexpect collapses of the ARC cache. This would occur because when exceeded just the arc_meta_limit reclaim presure would be excerted on the arc_c value via arc_shrink(). This effectively shrunk the entire cache when really we just needed to reclaim meta data. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #466 Closes #292	2012-01-11 11:46:02 -08:00
Darik Horn	28eb9213d8	Linux 3.2 compat: set_nlink() Directly changing inode->i_nlink is deprecated in Linux 3.2 by commit SHA: bfe8684869601dacfcb2cd69ef8cfd9045f62170 Use the new set_nlink() kernel function instead. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes: #462	2011-12-16 20:02:52 -08:00
Brian Behlendorf	adcd70bd1a	Linux 3.1 compat, fops->fsync() The Linux 3.1 kernel updated the fops->fsync() callback yet again. They now pass the requested range and delegate the responsibility for calling filemap_write_and_wait_range() to the callback. In addition imutex is no longer held by the caller and the callback is responsible for taking the lock if required. This commit updates the code to provide a zpl_fsync() function for the updated API. Implementations for the previous two APIs are also maintained for compatibility. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #445	2011-11-10 10:03:08 -08:00
Brian Behlendorf	5547c2f1bf	Simplify BDI integration Update the code to use the bdi_setup_and_register() helper to simplify the bdi integration code. The updated code now just registers the bdi during mount and destroys it during unmount. The only complication is that for 2.6.32 - 2.6.33 kernels the helper wasn't available so in these cases the zfs code must provide it. Luckily the bdi_setup_and_register() function is trivial. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #367	2011-11-08 10:19:03 -08:00
Prakash Surya	8366cd6a83	Convert 'if' statements to AS_IF in kernel.m4 The 'if' statements found in kernel.m4 were converted to use the portable alternative provided by autoconf, the AS_IF macro. Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2011-09-06 13:20:48 -07:00
Prakash Surya	2984e0bb0c	Fix minor autoconf error message inconsistencies A few of the autoconf error messages were inconsistent with the rest of the build system. To be specific, the inconsistencies addressed by this commit are the following: * The second line of the error message for the CONFIG_PREEMPT check was missing it's third asterisk. * A few of the error messages were prefixed by two tabs, whereas the majority of error messages are only prefixed by a single tab. Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2011-09-06 13:20:04 -07:00
Brian Behlendorf	76659dc110	Add backing_device_info per-filesystem For a long time now the kernel has been moving away from using the pdflush daemon to write 'old' dirty pages to disk. The primary reason for this is because the pdflush daemon is single threaded and can be a limiting factor for performance. Since pdflush sequentially walks the dirty inode list for each super block any delay in processing can slow down dirty page writeback for all filesystems. The replacement for pdflush is called bdi (backing device info). The bdi system involves creating a per-filesystem control structure each with its own private sets of queues to manage writeback. The advantage is greater parallelism which improves performance and prevents a single filesystem from slowing writeback to the others. For a long time both systems co-existed in the kernel so it wasn't strictly required to implement the bdi scheme. However, as of Linux 2.6.36 kernels the pdflush functionality has been retired. Since ZFS already bypasses the page cache for most I/O this is only an issue for mmap(2) writes which must go through the page cache. Even then adding this missing support for newer kernels was overlooked because there are other mechanisms which can trigger writeback. However, there is one critical case where not implementing the bdi functionality can cause problems. If an application handles a page fault it can enter the balance_dirty_pages() callpath. This will result in the application hanging until the number of dirty pages in the system drops below the dirty ratio. Without a registered backing_device_info for the filesystem the dirty pages will not get written out. Thus the application will hang. As mentioned above this was less of an issue with older kernels because pdflush would eventually write out the dirty pages. This change adds a backing_device_info structure to the zfs_sb_t which is already allocated per-super block. It is then registered when the filesystem mounted and unregistered on unmount. It will not be registered for mounted snapshots which are read-only. This change will result in flush-<pool> thread being dynamically created and destroyed per-mounted filesystem for writeback. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #174	2011-08-04 13:37:38 -07:00
Brian Behlendorf	0da7869690	Fix the configure CONFIG_* option detection The latest kernels no longer define AUTOCONF_INCLUDED which was being used to detect the new style autoconf.h kernel configure options. This results in the CONFIG_* checks always failing incorrectly for newer kernels. The fix for this is a simplification of the testing method. Rather than attempting to explicitly include to renamed config header. It is simpler to unconditionally include <linux/module.h> which must pick up the correctly named header. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #320	2011-07-22 15:07:16 -07:00
Brian Behlendorf	2cf7f52bc4	Linux compat 2.6.39: mount_nodev() The .get_sb callback has been replaced by a .mount callback in the file_system_type structure. When using the new interface the caller must now use the mount_nodev() helper. Unfortunately, the new interface no longer passes the vfsmount down to the zfs layers. This poses a problem for the existing implementation because we currently save this pointer in the super block for latter use. It provides our only entry point in to the namespace layer for manipulating certain mount options. This needed to be done originally to allow commands like 'zfs set atime=off tank' to work properly. It also allowed me to keep more of the original Solaris code unmodified. Under Solaris there is a 1-to-1 mapping between a mount point and a file system so this is a fairly natural thing to do. However, under Linux they many be multiple entries in the namespace which reference the same filesystem. Thus keeping a back reference from the filesystem to the namespace is complicated. Rather than introduce some ugly hack to get the vfsmount and continue as before. I'm leveraging this API change to update the ZFS code to do things in a more natural way for Linux. This has the upside that is resolves the compatibility issue for the long term and fixes several other minor bugs which have been reported. This commit updates the code to remove this vfsmount back reference entirely. All modifications to filesystem mount options are now passed in to the kernel via a '-o remount'. This is the expected Linux mechanism and allows the namespace to properly handle any options which apply to it before passing them on to the file system itself. Aside from fixing the compatibility issue, removing the vfsmount has had the benefit of simplifying the code. This change which fairly involved has turned out nicely. Closes #246 Closes #217 Closes #187 Closes #248 Closes #231	2011-07-01 13:36:39 -07:00
Brian Behlendorf	5c03efc379	Linux compat 2.6.39: security_inode_init_security() The security_inode_init_security() function now takes an additional qstr argument which must be passed in from the dentry if available. Passing a NULL is safe when no qstr is available the relevant security checks will just be skipped. Closes #246 Closes #217 Closes #187	2011-07-01 12:40:08 -07:00
Prasad Joshi	b312979252	Tear down and flush the mmap region The inode eviction should unmap the pages associated with the inode. These pages should also be flushed to disk to avoid the data loss. Therefore, use truncate_setsize() in evict_inode() to release the pagecache. The API truncate_setsize() was added in 2.6.35 kernel. To ensure compatibility with the old kernel, the patch defines its own truncate_setsize function. Signed-off-by: Prasad Joshi <pjoshi@stec-inc.com> Closes #255	2011-06-27 09:59:19 -07:00
Brian Behlendorf	8a7e1ceefa	Check for -Wno-unused-but-set-variable gcc support Gcc versions 4.3.2 and earlier do not support the compiler flag -Wno-unused-but-set-variable. This can lead to build failures on older Linux platforms such as Debian Lenny. Since this is an optional build argument this changes add a new autoconf check for the option. If it is supported by the installed version of gcc then it is used otherwise it is omited. See commit's `12c1acde76` and `79713039a2` for the reason the -Wno-unused-but-set-variable options was originally added.	2011-06-14 14:43:22 -07:00
Brian Behlendorf	df554c148e	Fix 'zfs set volsize=N pool/dataset' This change fixes a kernel panic which would occur when resizing a dataset which was not open. The objset_t stored in the zvol_state_t will be set to NULL when the block device is closed. To avoid this issue we pass the correct objset_t as the third arg. The code has also been updated to correctly notify the kernel when the block device capacity changes. For 2.6.28 and newer kernels the capacity change will be immediately detected. For earlier kernels the capacity change will be detected when the device is next opened. This is a known limitation of older kernels. Online ext3 resize test case passes on 2.6.28+ kernels: $ dd if=/dev/zero of=/tmp/zvol bs=1M count=1 seek=1023 $ zpool create tank /tmp/zvol $ zfs create -V 500M tank/zd0 $ mkfs.ext3 /dev/zd0 $ mkdir /mnt/zd0 $ mount /dev/zd0 /mnt/zd0 $ df -h /mnt/zd0 $ zfs set volsize=800M tank/zd0 $ resize2fs /dev/zd0 $ df -h /mnt/zd0 Original-patch-by: Fajar A. Nugraha <github@fajar.net> Closes #68 Closes #84	2011-05-02 08:54:40 -07:00
Gunnar Beutner	055656d4f4	Implemented NFS export_operations. Implemented the required NFS operations for exporting ZFS datasets using the in-kernel NFS daemon.	2011-04-29 12:36:13 -07:00
Brian Behlendorf	12c1acde76	Set -Wno-unused-but-set-variable globally As of gcc-4.6 the option -Wunused-but-set-variable is enabled by default. While this is a useful warning there are numerous places in the ZFS code when a variable is set and then only checked in an ASSERT(). To avoid having to update every instance of this in the code we now set -Wno-unused-but-set-variable to suppress the warning. Additionally, when building with --enable-debug and -Werror set these warning also become fatal. We can reevaluate the suppression of these error at a later time if it becomes an issue. For now we are basically just reverting to the previous gcc behavior.	2011-04-19 10:44:10 -07:00
Brian Behlendorf	bdf4328b04	Linux 2.6.28 compat, insert_inode_locked() Added insert_inode_locked() helper function, prior to this most callers used insert_inode_hash(). The older method doesn't check for collisions in the inode_hashtable but it still acceptible for use. Fallback to using insert_inode_hash() when insert_inode_locked() is unavailable.	2011-03-22 12:15:54 -07:00
Brian Behlendorf	a60b1c0a8e	Make Missing Modules.symvers Fatal Detect early on in configure if the Modules.symvers file is missing. Without this file there will be build failures later and it's best to catch this early and provide a useful error. In this case the most likely problem is the kernel-devel packages are not installed. It may also be possible that they are using an unbuilt custom kernel in which case they must build the kernel first. Closes #127	2011-03-07 13:09:20 -08:00
Brian Behlendorf	15805c7711	Make CONFIG_PREEMPT Fatal Until support is added for preemptible kernels detect this at configure time and make it fatal. Otherwise, it is possible to have a successful build and kernel modules with flakey behavior.	2011-03-07 12:09:02 -08:00
Brian Behlendorf	45066d1f20	Linux 2.6.38 compat, blkdev_get_by_path() The open_bdev_exclusive() function has been replaced (again) by the more generic blkdev_get_by_path() function. Additionally, the counterpart function close_bdev_exclusive() has been replaced by blkdev_put(). Because these functions are more generic versions of the functions they replaced the compatibility macro must add the FMODE_EXCL mask to ensure they are exclusive. Closes #114	2011-02-23 12:29:38 -08:00
Brian Behlendorf	2c395def27	Linux 2.6.36 compat, sops->evict_inode() The new prefered inteface for evicting an inode from the inode cache is the ->evict_inode() callback. It replaces both the ->delete_inode() and ->clear_inode() callbacks which were previously used for this.	2011-02-11 13:47:51 -08:00
Brian Behlendorf	f9637c6c8b	Linux 2.6.33 compat, get/set xattr callbacks The xattr handler prototypes were sanitized with the idea being that the same handlers could be used for multiple methods. The result of this was the inode type was changes to a dentry, and both the get() and set() hooks had a handler_flags argument added. The list() callback was similiarly effected but no autoconf check was added because we do not use the list() callback.	2011-02-11 10:41:00 -08:00
Brian Behlendorf	7268e1bec8	Linux 2.6.35 compat, fops->fsync() The fsync() callback in the file_operations structure used to take 3 arguments. The callback now only takes 2 arguments because the dentry argument was determined to be unused by all consumers. To handle this a compatibility prototype was added to ensure the right prototype is used. Our implementation never used the dentry argument either so it's just a matter of using the right prototype.	2011-02-11 09:05:51 -08:00
Brian Behlendorf	777d4af891	Linux 2.6.35 compat, const struct xattr_handler The const keyword was added to the 'struct xattr_handler' in the generic Linux super_block structure. To handle this we define an appropriate xattr_handler_t typedef which can be used. This was the preferred solution because it keeps the code clean and readable.	2011-02-10 16:29:00 -08:00
Brian Behlendorf	1b94c25ceb	Prefer /lib/modules/$(uname -r)/ links Preferentially use the /lib/modules/$(uname -r)/source and /lib/modules/$(uname -r)/build links. Only if neither of these links exist fallback to alternate methods for deducing which kernel to build with. This resolves the need to manually specify --with-linux= and --with-linux-obj= on Debian systems.	2011-02-10 14:54:33 -08:00
Brian Behlendorf	675de5aa37	Linux 2.6.36 compat, synchronous bio flag The name of the flag used to mark a bio as synchronous has changed again in the 2.6.36 kernel due to the unification of the BIO_RW_* and REQ_* flags. The new flag is called REQ_SYNC. To simplify checking this flag I have introduced the vdev_disk_dio_is_sync() helper function. Based on the results of several new autoconf tests it uses the correct mask to check for a synchronous bio. Preferred interface for flagging a synchronous bio: 2.6.12-2.6.29: BIO_RW_SYNC 2.6.30-2.6.35: BIO_RW_SYNCIO 2.6.36-2.6.xx: REQ_SYNC	2010-11-10 17:00:33 -08:00
Brian Behlendorf	f4af6bb783	Linux 2.6.36 compat, use REQ_FAILFAST_MASK As of linux-2.6.36 the BIO_RW_FAILFAST and REQ_FAILFAST flags have been unified under the REQ_* names. These flags always had to be kept in-sync so this is a nice step forward, unfortunately it means we need to be careful to only use the new unified flags when the BIO_RW_* flags are not defined. Additional autoconf checks were added for this and if it is ever unclear which method to use no flags are set. This is safe but may result in longer delays before a disk is failed. Perferred interface for setting FAILFAST on a bio: 2.6.12-2.6.27: BIO_RW_FAILFAST 2.6.28-2.6.35: BIO_RW_FAILFAST_{DEV\|TRANSPORT\|DRIVER} 2.6.36-2.6.xx: REQ_FAILFAST_{DEV\|TRANSPORT\|DRIVER}	2010-11-10 16:59:49 -08:00
Brian Behlendorf	2959d94a0a	Add FAILFAST support ZFS works best when it is notified as soon as possible when a device failure occurs. This allows it to immediately start any recovery actions which may be needed. In theory Linux supports a flag which can be set on bio's called FAILFAST which provides this quick notification by disabling the retry logic in the lower scsi layers. That's the theory at least. In practice is turns out that while the flag exists you oddly have to set it with the BIO_RW_AHEAD flag. And even when it's set it you may get retries in the low level drivers decides that's the right behavior, or if you don't get the right error codes reported to the scsi midlayer. Unfortunately, without additional kernels patchs there's not much which can be done to improve this. Basically, this just means that it may take 2-3 minutes before a ZFS is notified properly that a device has failed. This can be improved and I suspect I'll be submitting patches upstream to handle this.	2010-10-12 14:55:02 -07:00
Brian Behlendorf	6283f55ea1	Support custom build directories and move includes One of the neat tricks an autoconf style project is capable of is allow configurion/building in a directory other than the source directory. The major advantage to this is that you can build the project various different ways while making changes in a single source tree. For example, this project is designed to work on various different Linux distributions each of which work slightly differently. This means that changes need to verified on each of those supported distributions perferably before the change is committed to the public git repo. Using nfs and custom build directories makes this much easier. I now have a single source tree in nfs mounted on several different systems each running a supported distribution. When I make a change to the source base I suspect may break things I can concurrently build from the same source on all the systems each in their own subdirectory. wget -c http://github.com/downloads/behlendorf/zfs/zfs-x.y.z.tar.gz tar -xzf zfs-x.y.z.tar.gz cd zfs-x-y-z ------------------------- run concurrently ---------------------- <ubuntu system> <fedora system> <debian system> <rhel6 system> mkdir ubuntu mkdir fedora mkdir debian mkdir rhel6 cd ubuntu cd fedora cd debian cd rhel6 ../configure ../configure ../configure ../configure make make make make make check make check make check make check This change also moves many of the include headers from individual incude/sys directories under the modules directory in to a single top level include directory. This has the advantage of making the build rules cleaner and logically it makes a bit more sense.	2010-09-08 12:38:56 -07:00
Brian Behlendorf	5e6121455c	Fix spl version check The spl_config.h file is checked to determine the spl version. However, the zfs code was looking for it in the source directory and not the build directory.	2010-09-02 20:44:41 -07:00
Brian Behlendorf	c9c0d073da	Add build system Add autoconf style build infrastructure to the ZFS tree. This includes autogen.sh, configure.ac, m4 macros, some scripts/*, and makefiles for all the core ZFS components.	2010-08-31 13:41:27 -07:00
Brian Behlendorf	42baae9615	Removed build system from master branch, will relocate to linux-zfs-branch	2008-12-01 15:38:41 -08:00
Brian Behlendorf	62b749c8c8	Working version of M4 macro config	2008-11-26 15:32:39 -08:00
Brian Behlendorf	f0e648ca02	Make everything a M4 macro, it's just cleaner that way	2008-11-26 14:29:45 -08:00

1 2 3 4 5

235 Commits