aboutsummaryrefslogtreecommitdiffstats
path: root/fs/dlm/lockspace.c
AgeCommit message (Collapse)AuthorFilesLines
2024-04-23dlm: return -ENOMEM if ls_recover_buf failsAlexander Aring1-1/+3
This patch fixes to return -ENOMEM in case of an allocation failure that was forgotten to change in commit 6c648035cbe7 ("dlm: switch to use rhashtable for rsbs"). Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/r/202404200536.jGi6052v-lkp@intel.com/ Fixes: 6c648035cbe7 ("dlm: switch to use rhashtable for rsbs") Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2024-04-17dlm: fix sleep in atomic contextAlexander Aring1-1/+1
This patch changes the orphans mutex to a spinlock since commit c288745f1d4a ("dlm: avoid blocking receive at the end of recovery") is using a rwlock_t to lock the DLM message receive path and do_purge() can be called while this lock is held that forbids to sleep. We need to use spin_lock_bh() because also a user context that calls dlm_user_purge() can call do_purge() and since commit 92d59adfaf71 ("dlm: do message processing in softirq context") the DLM message receive path is done under softirq context. Fixes: c288745f1d4a ("dlm: avoid blocking receive at the end of recovery") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/gfs2/9ad928eb-2ece-4ad9-a79c-d2bce228e4bc@moroto.mountain/ Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2024-04-16dlm: use rwlock for lkbidrAlexander Aring1-3/+3
Convert the lock for lkbidr to an rwlock. Most idr lookups will use the read lock. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2024-04-16dlm: use rwlock for rsb hash tableAlexander Aring1-1/+1
The conversion to rhashtable introduced a hash table lock per lockspace, in place of per bucket locks. To make this more scalable, switch to using a rwlock for hash table access. The common case fast path uses it as a read lock. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2024-04-16dlm: drop dlm_scand kthread and use timersAlexander Aring1-91/+14
Currently the scand kthread acts like a garbage collection for expired rsbs on toss list, to clean them up after a certain timeout. It triggers every couple of seconds and iterates over the toss list while holding ls_rsbtbl_lock for the whole hash bucket iteration. To reduce the amount of time holding ls_rsbtbl_lock, we now handle the disposal of expired rsbs using a per-lockspace timer that expires for the earliest tossed rsb on the lockspace toss queue. This toss queue is ordered according to the rsb res_toss_time with the earliest tossed rsb as the first entry. The toss timer will only trylock() necessary locks, since it is low priority garbage collection, and will rearm the timer if trylock() fails. If the timer function does not find any expired rsb's, it rearms the timer with the next earliest expired rsb. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2024-04-16dlm: switch to use rhashtable for rsbsAlexander Aring1-22/+13
Replace our own hash table with the more advanced rhashtable for keeping rsb structs. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2024-04-16dlm: add rsb lists for iterationAlexander Aring1-0/+3
To prepare for using rhashtable, add two rsb lists for iterating through rsb's in two uncommon cases where this is necesssary: - when dumping rsb state from debugfs, now using seq_list. - when looking at all rsb's during recovery. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2024-04-16dlm: merge toss and keep hash table lists into one listAlexander Aring1-10/+3
There are several places where lock processing can perform two hash table lookups, first in the "keep" list, and if not found, in the "toss" list. This patch introduces a new rsb state flag "RSB_TOSS" to represent the difference between the state of being on keep vs toss list, so that the two lists can be combined. This avoids cases of two lookups. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2024-04-16dlm: change to single hashtable lockAlexander Aring1-1/+1
Prepare to replace our own hash table with rhashtable by replacing the per-bucket locks in our own hash table with a single lock. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2024-04-16dlm: increment ls_count for dlm_scandAlexander Aring1-0/+3
Increment the ls_count value while dlm_scand is processing a lockspace so that release_lockspace()/remove_lockspace() will wait for dlm_scand to finish. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2024-04-09dlm: use spin_lock_bh for message processingAlexander Aring1-26/+25
Use spin_lock_bh for all spinlocks involved in message processing, in preparation for softirq message processing. DLM lock requests from user space involve dlm processing in user context, in addition to the standard kernel context, necessitating bh variants. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2024-04-09dlm: convert ls_recv_active from rw_semaphore to rwlockAlexander Aring1-1/+1
Convert ls_recv_active rw_semaphore to an rwlock to avoid sleeping, in preparation for softirq message processing. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2024-04-09dlm: avoid blocking receive at the end of recoveryAlexander Aring1-3/+1
The end of the recovery process transitioned to normal message processing by temporarily blocking the receiving context, processing saved messages, then unblocking the receiving context. To avoid blocking the receiving context, the old wait_queue and mutex are replaced by a new rwlock and the new RECV_MSG_BLOCKED flag. Received messages are added to the list of saved messages, protected by the rwlock, until the flag is cleared, which happens when all saved messages have been processed. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2024-04-09dlm: convert ls_waiters_mutex to spinlockAlexander Aring1-1/+1
Convert the waiters mutex to a spinlock in prepration for processing messages in softirq context. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2024-04-09dlm: add new struct to save position in dlm_copy_master_namesAlexander Aring1-0/+2
Add a new struct to save the current position in the rsb masters_list while sending the rsb names to other nodes. The rsb names are sent in multiple chunks, and for each new chunk, the new "dlm_dir_dump" struct saves the last position in the masters_list. The new struct is also used to save more information to sanity check the recovery process. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2024-04-09dlm: move rsb root_list to ls_recover() stackAlexander Aring1-2/+0
Move the rsb root_list from the lockspace to a stack variable since it is now only used by the ls_recover() function. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2024-04-09dlm: use a new list for recovery of master rsb namesAlexander Aring1-0/+2
Add a new "masters_list" for master rsb structs, with a new rwlock. The new list is created and used during the recovery process to send the master rsb names to new nodes. With this change, the current "root_list" can be used without locking. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2023-06-14fs: dlm: revert check required context while closeAlexander Aring1-12/+0
This patch reverts commit 2c3fa6ae4d52 ("dlm: check required context while close"). The function dlm_midcomms_close(), which will call later dlm_lowcomms_close(), is called when the cluster manager tells the node got fenced which means on midcomms/lowcomms layer to disconnect the node from the cluster communication. The node can rejoin the cluster later. This patch was ensuring no new message were able to be triggered when we are in the close() function context. This was done by checking if the lockspace has been stopped. However there is a missing check that we only need to check specific lockspaces where the fenced node is member of. This is currently complicated because there is no way to easily check if a node is part of a specific lockspace without stopping the recovery. For now we just revert this commit as it is just a check to finding possible leaks of stopping lockspaces before close() is called. Cc: stable@vger.kernel.org Fixes: 2c3fa6ae4d52 ("dlm: check required context while close") Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2023-03-06fs: dlm: move internal flags to atomic opsAlexander Aring1-1/+1
This patch will move the lkb_flags value to the recently introduced lkb_iflags value. For lkb_iflags we use atomic bit operations because some flags like DLM_IFL_CB_PENDING are used while non rsb lock is held to avoid issues with other flag manipulations which might run at the same time we switch to atomic bit operations. Snapshot the bit values to an uint32_t value is only used for debugging/logging use cases and don't need to be 100% correct. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2023-03-06fs: dlm: rename stub to local message flagAlexander Aring1-2/+2
This patch renames DLM_IFL_STUB_MS to DLM_IFL_LOCAL_MS flag. The DLM_IFL_STUB_MS flag is somewhat misnamed, it means the dlm message is used for local message transfer only. It is used by recovery to resolve lock states if a node got fenced. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2023-03-06fs: dlm: remove deprecated code partsAlexander Aring1-23/+0
This patch removes code parts which was declared deprecated by commit 6b0afc0cc3e9 ("fs: dlm: don't use deprecated timeout features by default"). This contains the following dlm functionality: - start a cancel of a dlm request did not complete after certain timeout: The current way how dlm cancellation works and interfering with other dlm requests triggered by the user can end in an overlapping and returning in -EBUSY. The most user don't handle this case and are unaware that DLM can return such errno in such situation. Due the timeout the user are mostly unaware when this happens. - start a netlink warning messages for user space if dlm requests did not complete after certain timeout: This feature was never being built in the only known dlm user space side. As we are to remove the timeout cancellation feature we can directly remove this feature as well. There might be the possibility to bring the timeout cancellation feature back. However the current way of handling the -EBUSY case which is only a software limitation and not a hardware limitation should be changed. We minimize the current code base in DLM cancellation feature to not have to deal with those existing features while solving the DLM cancellation feature in general. UAPI define DLM_LSFL_TIMEWARN is commented as deprecated and reserved value. We should avoid at first to give it a new meaning but let possible users still compile by keeping this define. In far future we can give this flag a new meaning. The same for the DLM_LKF_TIMEOUT lock request flag. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2023-02-24Merge tag 'driver-core-6.3-rc1' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the large set of driver core changes for 6.3-rc1. There's a lot of changes this development cycle, most of the work falls into two different categories: - fw_devlink fixes and updates. This has gone through numerous review cycles and lots of review and testing by lots of different devices. Hopefully all should be good now, and Saravana will be keeping a watch for any potential regression on odd embedded systems. - driver core changes to work to make struct bus_type able to be moved into read-only memory (i.e. const) The recent work with Rust has pointed out a number of areas in the driver core where we are passing around and working with structures that really do not have to be dynamic at all, and they should be able to be read-only making things safer overall. This is the contuation of that work (started last release with kobject changes) in moving struct bus_type to be constant. We didn't quite make it for this release, but the remaining patches will be finished up for the release after this one, but the groundwork has been laid for this effort. Other than that we have in here: - debugfs memory leak fixes in some subsystems - error path cleanups and fixes for some never-able-to-be-hit codepaths. - cacheinfo rework and fixes - Other tiny fixes, full details are in the shortlog All of these have been in linux-next for a while with no reported problems" [ Geert Uytterhoeven points out that that last sentence isn't true, and that there's a pending report that has a fix that is queued up - Linus ] * tag 'driver-core-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (124 commits) debugfs: drop inline constant formatting for ERR_PTR(-ERROR) OPP: fix error checking in opp_migrate_dentry() debugfs: update comment of debugfs_rename() i3c: fix device.h kernel-doc warnings dma-mapping: no need to pass a bus_type into get_arch_dma_ops() driver core: class: move EXPORT_SYMBOL_GPL() lines to the correct place Revert "driver core: add error handling for devtmpfs_create_node()" Revert "devtmpfs: add debug info to handle()" Revert "devtmpfs: remove return value of devtmpfs_delete_node()" driver core: cpu: don't hand-override the uevent bus_type callback. devtmpfs: remove return value of devtmpfs_delete_node() devtmpfs: add debug info to handle() driver core: add error handling for devtmpfs_create_node() driver core: bus: update my copyright notice driver core: bus: add bus_get_dev_root() function driver core: bus: constify bus_unregister() driver core: bus: constify some internal functions driver core: bus: constify bus_get_kset() driver core: bus: constify bus_register/unregister_notifier() driver core: remove private pointer from struct bus_type ...
2023-01-27kobject: kset_uevent_ops: make uevent() callback take a const *Greg Kroah-Hartman1-2/+2
The uevent() callback in struct kset_uevent_ops does not modify the kobject passed into it, so make the pointer const to enforce this restriction. When doing so, fix up all existing uevent() callbacks to have the correct signature to preserve the build. Cc: Christine Caulfield <ccaulfie@redhat.com> Cc: David Teigland <teigland@redhat.com> Cc: Bob Peterson <rpeterso@redhat.com> Cc: Andreas Gruenbacher <agruenba@redhat.com> Acked-by: Rafael J. Wysocki <rafael@kernel.org> Acked-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20230111113018.459199-17-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-01-23fs: dlm: make dlm sequence id more robustAlexander Aring1-1/+1
When joining a new lockspace, use a random number to initialize a sequence number used in messages. This makes it easier to detect sequence number mismatches in message replies during tests that repeatedly join and leave a lockspace. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2023-01-23fs: dlm: wait until all midcomms nodes detect versionAlexander Aring1-0/+3
The current dlm version detection is very complex due to backwards compatablilty with earlier dlm protocol versions. It takes some time to detect if a peer node has a specific DLM version. If it's not detected, we just cut the socket connection. There could be cases where the local node has not detected the version yet, but the peer node has. In these cases, we are trying to shutdown the dlm connection with a FIN/ACK message exchange to be sure the other peer is ready to shutdown the connection on dlm application level. However this mechanism is only available on DLM protocol version 3.2 and we need to be sure the DLM version is detected before. To make it more robust we introduce a a "best effort" wait to wait for the version detection before shutdown the dlm connection. This need to be done before the kthread recoverd for recovery handling is stopped, because recovery handling will trigger enough messages to have a version detection going on. It is a corner case which was detected by modprobe dlm_locktroture module and rmmod dlm_locktorture module directly afterwards (in a looping behaviour). In practice probably nobody would leave a lockspace immediately after joining it. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2023-01-23fs: dlm: start midcomms before scandAlexander Aring1-8/+8
The scand kthread can send dlm messages out, especially dlm remove messages to free memory for unused rsb on other nodes. To send out dlm messages, midcomms must be initialized. This patch moves the midcomms start before scand is started. Cc: stable@vger.kernel.org Fixes: e7fd41792fc0 ("[DLM] The core of the DLM for GFS2/CLVM") Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-21fs: dlm: add midcomms init/start functionsAlexander Aring1-3/+2
This patch introduces leftovers of init, start, stop and exit functionality. The dlm application layer should always call the midcomms layer which getting aware of such event and redirect it to the lowcomms layer. Some functionality which is currently handled inside the start functionality of midcomms and lowcomms should be handled in the init functionality as it only need to be initialized once when dlm is loaded. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-08fs: dlm: fix log of lowcomms vs midcommsAlexander Aring1-1/+1
This patch will fix a small issue when printing out that dlm_midcomms_start() failed to start and it was printing out that the dlm subcomponent lowcomms was failed but lowcomms is behind the midcomms layer. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-08fs: dlm: remove ls_remove_wait waitqueueAlexander Aring1-3/+0
This patch removes the ls_remove_wait waitqueue handling. The current handling tries to wait before a lookup is send out for a identically resource name which is going to be removed. Hereby the remove message should be send out before the new lookup message. The reason is that after a lookup request and response will actually use the specific remote rsb. A followed remove message would delete the rsb on the remote side but it's still being used. To reach a similar behaviour we simple send the remove message out while the rsb lookup lock is held and the rsb is removed from the toss list. Other find_rsb() calls would never have the change to get a rsb back to live while a remove message will be send out (without holding the lock). This behaviour requires a non-sleepable context which should be provided now and might be the reason why it was not implemented so in the first place. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-08fs: dlm: convert ls_cb_mutex mutex to spinlockAlexander Aring1-1/+1
This patch converts the ls_cb_mutex mutex to a spinlock, there is no sleepable context when this lock is held. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-08dlm: replace one-element array with fixed size arrayPaulo Miguel Almeida1-1/+1
One-element arrays are deprecated. So, replace one-element array with fixed size array member in struct dlm_ls, and refactor the rest of the code, accordingly. Link: https://github.com/KSPP/linux/issues/79 Link: https://github.com/KSPP/linux/issues/228 Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101836 Link: https://lore.kernel.org/lkml/Y0W5jkiXUkpNl4ap@mail.google.com/ Signed-off-by: Paulo Miguel Almeida <paulo.miguel.almeida.rodenas@gmail.com> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Teigland <teigland@redhat.com>
2022-08-23fs: dlm: remove DLM_LSFL_FS from uapiAlexander Aring1-4/+24
The DLM_LSFL_FS flag is set in lockspaces created directly for a kernel user, as opposed to those lockspaces created for user space applications. The user space libdlm allowed this flag to be set for lockspaces created from user space, but then used by a kernel user. No kernel user has ever used this method, so remove the ability to do it. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-08-23fs: dlm: change ls_clear_proc_locks to spinlockAlexander Aring1-1/+1
This patch changes the ls_clear_proc_locks to a spinlock because there is no need to handle it as a mutex as there is no sleepable context when ls_clear_proc_locks is held. This allows us to call those functionality in non-sleepable contexts. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-08-23fs: dlm: allow lockspaces have zero lvblenAlexander Aring1-1/+1
A dlm user may not use the DLM_LKF_VALBLK flag in the DLM API, so a zero lvblen should be allowed as a lockspace parameter. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-08-01fs: dlm: don't use deprecated timeout features by defaultAlexander Aring1-3/+11
This patch will disable use of deprecated timeout features if CONFIG_DLM_DEPRECATED_API is not set. The deprecated features will be removed in upcoming kernel release v6.2. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-08-01fs: dlm: add deprecation Kconfig and warnings for timeoutsAlexander Aring1-1/+10
This patch adds a CONFIG_DLM_DEPRECATED_API Kconfig option that must be enabled to use two timeout-related features that we intend to remove in kernel v6.2. Warnings are printed if either is enabled and used. Neither has ever been used as far as we know. . The DLM_LSFL_TIMEWARN lockspace creation flag will be removed, along with the associated configfs entry for setting the timeout. Setting the flag and configfs file would cause dlm to track how long locks were waiting for reply messages. After a timeout, a kernel message would be logged, and a netlink message would be sent to userspace. Recently, midcomms messages have been added that produce much better logging about actual problems with messages. No use has ever been found for the netlink messages. . The userspace libdlm API has allowed the DLM_LKF_TIMEOUT flag with a timeout value to be set in lock requests. The lock request would be cancelled after the timeout. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-06-24fs: dlm: remove waiter warningsAlexander Aring1-1/+0
This patch removes warning messages that could be logged when remote requests had been waiting on a reply message for some timeout period (which could be set through configfs, but was rarely enabled.) The improved midcomms layer now carefully tracks all messages and replies, and logs much more useful messages if there is an actual problem. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-06-24fs: dlm: make new_lockspace() wait until recovery completesAlexander Aring1-4/+5
Make dlm_new_lockspace() wait until a full recovery completes sucessfully or fails. Previously, dlm_new_lockspace() returned to the caller after dlm_recover_members() finished, which is only partially through recovery. The result of the previous behavior is that the new lockspace would not be usable for some time (especially with overlapping recoveries), and some errors in the later part of recovery could not be returned to the caller. Kernel callers gfs2 and cluster-md have their own wait handling to wait for recovery to complete after calling dlm_new_lockspace(). This continues to work, but will be unnecessary. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-04-06dlm: check required context while closeAlexander Aring1-0/+12
This patch adds a WARN_ON() check to validate the right context while dlm_midcomms_close() is called. Even before commit 489d8e559c65 ("fs: dlm: add reliable connection if reconnect") in this context dlm_lowcomms_close() flushes all ongoing transmission triggered by dlm application stack. If we do that, it's required that no new message will be triggered by the dlm application stack. The function dlm_midcomms_close() is not called often so we can check if all lockspaces are in such context. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-01-12Merge tag 'driver-core-5.17-rc1' of ↵Linus Torvalds1-2/+1
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the set of changes for the driver core for 5.17-rc1. Lots of little things here, including: - kobj_type cleanups - auxiliary_bus documentation updates - auxiliary_device conversions for some drivers (relevant subsystems all have provided acks for these) - kernfs lock contention reduction for some workloads - other tiny cleanups and changes. All of these have been in linux-next for a while with no reported issues" * tag 'driver-core-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (43 commits) kobject documentation: remove default_attrs information drivers/firmware: Add missing platform_device_put() in sysfb_create_simplefb debugfs: lockdown: Allow reading debugfs files that are not world readable driver core: Make bus notifiers in right order in really_probe() driver core: Move driver_sysfs_remove() after driver_sysfs_add() firmware: edd: remove empty default_attrs array firmware: dmi-sysfs: use default_groups in kobj_type qemu_fw_cfg: use default_groups in kobj_type firmware: memmap: use default_groups in kobj_type sh: sq: use default_groups in kobj_type headers/uninline: Uninline single-use function: kobject_has_children() devtmpfs: mount with noexec and nosuid driver core: Simplify async probe test code by using ktime_ms_delta() nilfs2: use default_groups in kobj_type kobject: remove kset from struct kset_uevent_ops callbacks driver core: make kobj_type constant. driver core: platform: document registration-failure requirement vdpa/mlx5: Use auxiliary_device driver data helpers net/mlx5e: Use auxiliary_device driver data helpers soundwire: intel: Use auxiliary_device driver data helpers ...
2021-12-28kobject: remove kset from struct kset_uevent_ops callbacksGreg Kroah-Hartman1-2/+1
There is no need to pass the pointer to the kset in the struct kset_uevent_ops callbacks as no one uses it, so just remove that pointer entirely. Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Wedson Almeida Filho <wedsonaf@google.com> Link: https://lore.kernel.org/r/20211227163924.3970661-1-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-07fs: dlm: use event based wait for pending removeAlexander Aring1-0/+1
This patch will use an event based waitqueue to wait for a possible clash with the ls_remove_name field of dlm_ls instead of doing busy waiting. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-02fs: dlm: ls_count busy wait to event based waitAlexander Aring1-16/+17
This patch changes the ls_count busy wait to use atomic counter values and wait_event() to wait until ls_count reach zero. It will slightly reduce the number of holding lslist_lock. At remove lockspace we need to retry the wait because it a lockspace get could interefere between wait_event() and holding the lock which deletes the lockspace list entry. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-02fs: dlm: requestqueue busy wait to event based waitAlexander Aring1-0/+2
This patch changes the requestqueue busy waiting algorithm to use atomic counter values and wait_event() to wait until the requestqueue is empty. It will slightly reduce the number of holding ls_requestqueue_mutex mutex. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-02fs: dlm: fix small lockspace typoAlexander Aring1-1/+1
This patch fixes a typo from lockspace to lockspace. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-09-01fs: dlm: avoid comms shutdown delay in release_lockspaceAlexander Aring1-0/+1
When dlm_release_lockspace does active shutdown on connections to other nodes, the active shutdown will wait for any exisitng passive shutdowns to be resolved. But, the sequence of operations during dlm_release_lockspace can prevent the normal resolution of passive shutdowns (processed normally by way of lockspace recovery.) This disruption of passive shutdown handling can cause the active shutdown to wait for a full timeout period, delaying the completion of dlm_release_lockspace. To fix this, make dlm_release_lockspace resolve existing passive shutdowns (by calling dlm_clear_members earlier), before it does active shutdowns. The active shutdowns will not find any passive shutdowns to wait for, and will not be delayed. Reported-by: Chris Mackowski <cmackows@redhat.com> Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-07-19fs: dlm: use READ_ONCE for config varAlexander Aring1-1/+1
This patch will use READ_ONCE to signal the compiler to read this variable only one time. If we don't do that it could be that the compiler read this value more than one time, because some optimizations, from the configure data which might can be changed during this time. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-06-02fs: dlm: rename socket and app buffer definesAlexander Aring1-1/+1
This patch renames DEFAULT_BUFFER_SIZE to DLM_MAX_SOCKET_BUFSIZE and LOWCOMMS_MAX_TX_BUFFER_LEN to DLM_MAX_APP_BUFSIZE as they are proper names to define what's behind those values. The DLM_MAX_SOCKET_BUFSIZE defines the maximum size of buffer which can be handled on socket layer, the DLM_MAX_APP_BUFSIZE defines the maximum size of buffer which can be handled by the DLM application layer. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-05-25fs: dlm: add reliable connection if reconnectAlexander Aring1-1/+6
This patch introduce to make a tcp lowcomms connection reliable even if reconnects occurs. This is done by an application layer re-transmission handling and sequence numbers in dlm protocols. There are three new dlm commands: DLM_OPTS: This will encapsulate an existing dlm message (and rcom message if they don't have an own application side re-transmission handling). As optional handling additional tlv's (type length fields) can be appended. This can be for example a sequence number field. However because in DLM_OPTS the lockspace field is unused and a sequence number is a mandatory field it isn't made as a tlv and we put the sequence number inside the lockspace id. The possibility to add optional options are still there for future purposes. DLM_ACK: Just a dlm header to acknowledge the receive of a DLM_OPTS message to it's sender. DLM_FIN: This provides a 4 way handshake for connection termination inclusive support for half-closed connections. It's provided on application layer because SCTP doesn't support half-closed sockets, the shutdown() call can interrupted by e.g. TCP resets itself and a hard logic to implement it because the othercon paradigm in lowcomms. The 4-way termination handshake also solve problems to synchronize peer EOF arrival and that the cluster manager removes the peer in the node membership handling of DLM. In some cases messages can be still transmitted in this time and we need to wait for the node membership event. To provide a reliable connection the node will retransmit all unacknowledges message to it's peer on reconnect. The receiver will then filtering out the next received message and drop all messages which are duplicates. As RCOM_STATUS and RCOM_NAMES messages are the first messages which are exchanged and they have they own re-transmission handling, there exists logic that these messages must be first. If these messages arrives we store the dlm version field. This handling is on DLM 3.1 and after this patch 3.2 the same. A backwards compatibility handling has been added which seems to work on tests without tcpkill, however it's not recommended to use DLM 3.1 and 3.2 at the same time, because DLM 3.2 tries to fix long term bugs in the DLM protocol. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-05-25fs: dlm: add more midcomms hooksAlexander Aring1-3/+4
This patch prepares hooks to redirect to the midcomms layer which will be used by the midcomms re-transmit handling. There exists the new concept of stateless buffers allocation and commits. This can be used to bypass the midcomms re-transmit handling. It is used by RCOM_STATUS and RCOM_NAMES messages, because they have their own ping-like re-transmit handling. As well these two messages will be used to determine the DLM version per node, because these two messages are per observation the first messages which are exchanged. Cluster manager events for node membership are added to add support for half-closed connections in cases that the peer connection get to an end of file but DLM still holds membership of the node. In this time DLM can still trigger new message which we should allow. After the cluster manager node removal event occurs it safe to close the connection. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-03-09fs: dlm: add shutdown hookAlexander Aring1-9/+11
This patch fixes issues which occurs when dlm lowcomms synchronize their workqueues but dlm application layer already released the lockspace. In such cases messages like: dlm: gfs2: release_lockspace final free dlm: invalid lockspace 3841231384 from 1 cmd 1 type 11 are printed on the kernel log. This patch is solving this issue by introducing a new "shutdown" hook before calling "stop" hook when the lockspace is going to be released finally. This should pretend any dlm messages sitting in the workqueues during or after lockspace removal. It's necessary to call dlm_scand_stop() as I instrumented dlm_lowcomms_get_buffer() code to report a warning after it's called after dlm_midcomms_shutdown() functionality, see below: WARNING: CPU: 1 PID: 3794 at fs/dlm/midcomms.c:1003 dlm_midcomms_get_buffer+0x167/0x180 Modules linked in: joydev iTCO_wdt intel_pmc_bxt iTCO_vendor_support drm_ttm_helper ttm pcspkr serio_raw i2c_i801 i2c_smbus drm_kms_helper virtio_scsi lpc_ich virtio_balloon virtio_console xhci_pci xhci_pci_renesas cec qemu_fw_cfg drm [last unloaded: qxl] CPU: 1 PID: 3794 Comm: dlm_scand Tainted: G W 5.11.0+ #26 Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.13.0-2.module+el8.3.0+7353+9de0a3cc 04/01/2014 RIP: 0010:dlm_midcomms_get_buffer+0x167/0x180 Code: 5d 41 5c 41 5d 41 5e 41 5f c3 0f 0b 45 31 e4 5b 5d 4c 89 e0 41 5c 41 5d 41 5e 41 5f c3 4c 89 e7 45 31 e4 e8 3b f1 ec ff eb 86 <0f> 0b 4c 89 e7 45 31 e4 e8 2c f1 ec ff e9 74 ff ff ff 0f 1f 80 00 RSP: 0018:ffffa81503f8fe60 EFLAGS: 00010202 RAX: 0000000000000008 RBX: ffff8f969827f200 RCX: 0000000000000001 RDX: 0000000000000000 RSI: ffffffffad1e89a0 RDI: ffff8f96a5294160 RBP: 0000000000000001 R08: 0000000000000000 R09: ffff8f96a250bc60 R10: 00000000000045d3 R11: 0000000000000000 R12: ffff8f96a250bc60 R13: ffffa81503f8fec8 R14: 0000000000000070 R15: 0000000000000c40 FS: 0000000000000000(0000) GS:ffff8f96fbc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055aa3351c000 CR3: 000000010bf22000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: dlm_scan_rsbs+0x420/0x670 ? dlm_uevent+0x20/0x20 dlm_scand+0xbf/0xe0 kthread+0x13a/0x150 ? __kthread_bind_mask+0x60/0x60 ret_from_fork+0x22/0x30 To synchronize all dlm scand messages we stop it right before shutdown hook. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2020-11-10fs: dlm: define max send bufferAlexander Aring1-1/+1
This patch will set the maximum transmit buffer size for rcom messages with "names" to 4096 bytes. It's a leftover change of commit 4798cbbfbd00 ("fs: dlm: rework receive handling"). Fact is that we cannot allocate a contiguous transmit buffer length above of 4096 bytes. It seems at some places the upper layer protocol will calculate according to dlm_config.ci_buffer_size the possible payload of a dlm recovery message. As compiler setting we will use now the maximum possible message which dlm can send out. Commit 4e192ee68e5af ("fs: dlm: disallow buffer size below default") disallow a buffer setting smaller than the 4096 bytes and above 4096 bytes is definitely wrong because we will then write out of buffer space as we cannot allocate a contiguous buffer above 4096 bytes. The ci_buffer_size is still there to define the possible maximum receive buffer size of a recvmsg() which should be at least the maximum possible dlm message size. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2020-08-06dlm: Fix kobject memleakWang Hai1-3/+3
Currently the error return path from kobject_init_and_add() is not followed by a call to kobject_put() - which means we are leaking the kobject. Set do_unreg = 1 before kobject_init_and_add() to ensure that kobject_put() can be called in its error patch. Fixes: 901195ed7f4b ("Kobject: change GFS2 to use kobject_init_and_add") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Hai <wanghai38@huawei.com> Signed-off-by: David Teigland <teigland@redhat.com>
2020-05-12dlm: Switch to using wait_event()Ross Lagerwall1-14/+4
We saw an issue in a production server on a customer deployment where DLM 4.0.7 gets "stuck" and unable to join new lockspaces. There is no useful response for the dlm in do_event() if wait_event_interruptible() is interrupted, so switch to wait_event(). Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Signed-off-by: David Teigland <teigland@redhat.com>
2019-06-13dlm: Replace default_attrs in dlm_ktype with default_groupsKimberly Brown1-1/+2
The kobj_type default_attrs field is being replaced by the default_groups field, so replace the default_attrs field in dlm_ktype with default_groups. Use the ATTRIBUTE_GROUPS macro to create dlm_groups. Signed-off-by: Kimberly Brown <kimbrownkd@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-30treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 193Thomas Gleixner1-3/+1
Based on 1 normalized pattern(s): this copyrighted material is made available to anyone wishing to use modify copy or redistribute it subject to the terms and conditions of the gnu general public license v 2 extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 45 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Richard Fontana <rfontana@redhat.com> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Steve Winslow <swinslow@gmail.com> Reviewed-by: Alexios Zavras <alexios.zavras@intel.com> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190528170027.342746075@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-03dlm: NULL check before some freeing functions is not neededThomas Meyer1-4/+2
NULL check before some freeing functions is not needed. Signed-off-by: Thomas Meyer <thomas@m3y3r.de> Signed-off-by: David Teigland <teigland@redhat.com>
2018-11-15dlm: fix missing idr_destroy for recover_idrDavid Teigland1-0/+1
Which would leak memory for the idr internals. Signed-off-by: David Teigland <teigland@redhat.com>
2018-11-15dlm: fixed memory leaks after failed ls_remove_names allocationVasily Averin1-1/+1
If allocation fails on last elements of array need to free already allocated elements. v2: just move existing out_rsbtbl label to right place Fixes 789924ba635f ("dlm: fix race between remove and lookup") Cc: stable@kernel.org # 3.6 Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: David Teigland <teigland@redhat.com>
2018-11-07dlm: don't allow zero length namesTycho Andersen1-1/+1
kobject doesn't like zero length object names, so let's test for that. Signed-off-by: Tycho Andersen <tycho@tycho.ws> Signed-off-by: David Teigland <teigland@redhat.com>
2018-06-12treewide: Use array_size() in vmalloc()Kees Cook1-1/+1
The vmalloc() function has no 2-factor argument form, so multiplication factors need to be wrapped in array_size(). This patch replaces cases of: vmalloc(a * b) with: vmalloc(array_size(a, b)) as well as handling cases of: vmalloc(a * b * c) with: vmalloc(array3_size(a, b, c)) This does, however, attempt to ignore constant size factors like: vmalloc(4 * 1024) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( vmalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | vmalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( vmalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | vmalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | vmalloc( - sizeof(char) * (COUNT) + COUNT , ...) | vmalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | vmalloc( - sizeof(u8) * COUNT + COUNT , ...) | vmalloc( - sizeof(__u8) * COUNT + COUNT , ...) | vmalloc( - sizeof(char) * COUNT + COUNT , ...) | vmalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( vmalloc( - sizeof(TYPE) * (COUNT_ID) + array_size(COUNT_ID, sizeof(TYPE)) , ...) | vmalloc( - sizeof(TYPE) * COUNT_ID + array_size(COUNT_ID, sizeof(TYPE)) , ...) | vmalloc( - sizeof(TYPE) * (COUNT_CONST) + array_size(COUNT_CONST, sizeof(TYPE)) , ...) | vmalloc( - sizeof(TYPE) * COUNT_CONST + array_size(COUNT_CONST, sizeof(TYPE)) , ...) | vmalloc( - sizeof(THING) * (COUNT_ID) + array_size(COUNT_ID, sizeof(THING)) , ...) | vmalloc( - sizeof(THING) * COUNT_ID + array_size(COUNT_ID, sizeof(THING)) , ...) | vmalloc( - sizeof(THING) * (COUNT_CONST) + array_size(COUNT_CONST, sizeof(THING)) , ...) | vmalloc( - sizeof(THING) * COUNT_CONST + array_size(COUNT_CONST, sizeof(THING)) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ vmalloc( - SIZE * COUNT + array_size(COUNT, SIZE) , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( vmalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | vmalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | vmalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | vmalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | vmalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | vmalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | vmalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | vmalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( vmalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | vmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | vmalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | vmalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | vmalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | vmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( vmalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | vmalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | vmalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | vmalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | vmalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | vmalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | vmalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | vmalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( vmalloc(C1 * C2 * C3, ...) | vmalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants. @@ expression E1, E2; constant C1, C2; @@ ( vmalloc(C1 * C2, ...) | vmalloc( - E1 * E2 + array_size(E1, E2) , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>
2017-08-07dlm: constify kset_uevent_ops structureBhumika Goyal1-1/+1
Declare kset_uevent_ops structure as const as it is only passed as an argument to the function kset_create_and_add. This argument is of type const, so declare the structure as const. Signed-off-by: Bhumika Goyal <bhumirks@gmail.com> Signed-off-by: David Teigland <teigland@redhat.com>
2017-08-07dlm: print log message when cluster name is not setZhu Lingshan1-0/+4
Print a message when a cluster name is not specified by the caller. In this case the cluster name configured for the dlm is used without any validation that it is the cluster expected by the application. Signed-off-by: Zhu Lingshan <lszhu@suse.com> Signed-off-by: David Teigland <teigland@redhat.com>
2017-08-07dlm: Make dismatch error message more clearGang He1-1/+2
This change will try to make this error message more clear, since the upper applications (e.g. ocfs2) invoke dlm_new_lockspace to create a new lockspace with passing a cluster name. Sometimes, dlm_new_lockspace return failure while two cluster names dismatch, the user is a little confused since this line error message is not enough obvious. Signed-off-by: Gang He <ghe@suse.com> Signed-off-by: David Teigland <teigland@redhat.com>
2016-10-19dlm: audit and remove any unnecessary uses of module.hPaul Gortmaker1-0/+2
Historically a lot of these existed because we did not have a distinction between what was modular code and what was providing support to modules via EXPORT_SYMBOL and friends. That changed when we forked out support for the latter into the export.h file. This means we should be able to reduce the usage of module.h in code that is obj-y Makefile or bool Kconfig. In the case of some code where it is modular, we can extend that to also include files that are building basic support functionality but not related to loading or registering the final module; such files also have no need whatsoever for module.h The advantage in removing such instances is that module.h itself sources about 15 other headers; adding significantly to what we feed cpp, and it can obscure what headers we are effectively using. Since module.h might have been the implicit source for init.h (for __init) and for export.h (for EXPORT_SYMBOL) we consider each instance for the presence of either and replace as needed. In the dlm case, we remove module.h from a global header and only introduce it in the files where it is explicitly required, since there is nothing modular in dlm_internal.h itself. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David Teigland <teigland@redhat.com>
2014-06-06fs/dlm/lockspace.c: convert simple_str to kstrFabian Frederick1-4/+17
Replace obsolete functions. Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Christine Caulfield <ccaulfie@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-14dlm: use INFO for recovery messagesDavid Teigland1-4/+4
The log messages relating to the progress of recovery are minimal and very often useful. Change these to the KERN_INFO level so they are always available. Signed-off-by: David Teigland <teigland@redhat.com>
2013-10-16dlm: Avoid that dlm_release_lockspace() incorrectly returns -EBUSYBart Van Assche1-3/+1
When dlm_release_lockspace(ls, 1) is invoked on a busy system immediately after the last dlm_unlock() AST has finished it can occur that lkb_idr_is_local() is invoked for the unlocked LKB since removal from ls_lkbidr only occurs after the AST has returned. If that happens dlm_release_lockspace(ls, 1) will return -EBUSY instead of releasing the lockspace. Fix this race condition by changing lkb_idr_is_local() such that it only returns true for LKB's that have not yet been unlocked. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: David Teigland <teigland@redhat.com>
2013-06-25dlm: log an error for unmanaged lockspacesDavid Teigland1-1/+8
Log an error message if the dlm user daemon exits before all the lockspaces have been removed. Signed-off-by: David Teigland <teigland@redhat.com>
2013-02-27dlm: don't use idr_remove_all()Tejun Heo1-1/+0
idr_destroy() can destroy idr by itself and idr_remove_all() is being deprecated. The conversion isn't completely trivial for recover_idr_clear() as it's the only place in kernel which makes legitimate use of idr_remove_all() w/o idr_destroy(). Replace it with idr_remove() call inside idr_for_each_entry() loop. It goes on top so that it matches the operation order in recover_idr_del(). Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Christine Caulfield <ccaulfie@redhat.com> Cc: David Teigland <teigland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-08-08dlm: fix unlock balance warningsDavid Teigland1-3/+12
The in_recovery rw_semaphore has always been acquired and released by different threads by design. To work around the "BUG: bad unlock balance detected!" messages, adjust things so the dlm_recoverd thread always does both down_write and up_write. Signed-off-by: David Teigland <teigland@redhat.com>
2012-07-16dlm: fix race between remove and lookupDavid Teigland1-2/+19
It was possible for a remove message on an old rsb to be sent after a lookup message on a new rsb, where the rsbs were for the same resource name. This could lead to a missing directory entry for the new rsb. It is fixed by keeping a copy of the resource name being removed until after the remove has been sent. A lookup checks if this in-progress remove matches the name it is looking up. Signed-off-by: David Teigland <teigland@redhat.com>
2012-07-16dlm: use idr instead of list for recovered rsbsDavid Teigland1-0/+3
When a large number of resources are being recovered, a linear search of the recover_list takes a long time. Use an idr in place of a list. Signed-off-by: David Teigland <teigland@redhat.com>
2012-07-16dlm: use rsbtbl as resource directoryDavid Teigland1-22/+1
Remove the dir hash table (dirtbl), and use the rsb hash table (rsbtbl) as the resource directory. It has always been an unnecessary duplication of information. This improves efficiency by using a single rsbtbl lookup in many cases where both rsbtbl and dirtbl lookups were needed previously. This eliminates the need to handle cases of rsbtbl and dirtbl being out of sync. In many cases there will be memory savings because the dir hash table no longer exists. Signed-off-by: David Teigland <teigland@redhat.com>
2012-05-02dlm: fixes for nodir modeDavid Teigland1-0/+20
The "nodir" mode (statically assign master nodes instead of using the resource directory) has always been highly experimental, and never seriously used. This commit fixes a number of problems, making nodir much more usable. - Major change to recovery: recover all locks and restart all in-progress operations after recovery. In some cases it's not possible to know which in-progess locks to recover, so recover all. (Most require recovery in nodir mode anyway since rehashing changes most master nodes.) - Change the way nodir mode is enabled, from a command line mount arg passed through gfs2, into a sysfs file managed by dlm_controld, consistent with the other config settings. - Allow recovering MSTCPY locks on an rsb that has not yet been turned into a master copy. - Ignore RCOM_LOCK and RCOM_LOCK_REPLY recovery messages from a previous, aborted recovery cycle. Base this on the local recovery status not being in the state where any nodes should be sending LOCK messages for the current recovery cycle. - Hold rsb lock around dlm_purge_mstcpy_locks() because it may run concurrently with dlm_recover_master_copy(). - Maintain highbast on process-copy lkb's (in addition to the master as is usual), because the lkb can switch back and forth between being a master and being a process copy as the master node changes in recovery. - When recovering MSTCPY locks, flag rsb's that have non-empty convert or waiting queues for granting at the end of recovery. (Rename flag from LOCKS_PURGED to RECOVER_GRANT and similar for the recovery function, because it's not only resources with purged locks that need grant a grant attempt.) - Replace a couple of unnecessary assertion panics with error messages. Signed-off-by: David Teigland <teigland@redhat.com>
2012-01-04dlm: add recovery callbacksDavid Teigland1-8/+35
These new callbacks notify the dlm user about lock recovery. GFS2, and possibly others, need to be aware of when the dlm will be doing lock recovery for a failed lockspace member. In the past, this coordination has been done between dlm and file system daemons in userspace, which then direct their kernel counterparts. These callbacks allow the same coordination directly, and more simply. Signed-off-by: David Teigland <teigland@redhat.com>
2012-01-04dlm: add node slots and generationDavid Teigland1-0/+5
Slot numbers are assigned to nodes when they join the lockspace. The slot number chosen is the minimum unused value starting at 1. Once a node is assigned a slot, that slot number will not change while the node remains a lockspace member. If the node leaves and rejoins it can be assigned a new slot number. A new generation number is also added to a lockspace. It is set and incremented during each recovery along with the slot collection/assignment. The slot numbers will be passed to gfs2 which will use them as journal id's. Signed-off-by: David Teigland <teigland@redhat.com>
2011-11-18dlm: convert rsb list to rb_treeBob Peterson1-14/+9
Change the linked lists to rb_tree's in the rsb hash table to speed up searches. Slow rsb searches were having a large impact on gfs2 performance due to the large number of dlm locks gfs2 uses. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2011-07-15dlm: use workqueue for callbacksDavid Teigland1-22/+21
Instead of creating our own kthread (dlm_astd) to deliver callbacks for all lockspaces, use a per-lockspace workqueue to deliver the callbacks. This eliminates complications and slowdowns from many lockspaces sharing the same thread. Signed-off-by: David Teigland <teigland@redhat.com>
2011-07-12dlm: improve rsb searchesDavid Teigland1-0/+10
By pre-allocating rsb structs before searching the hash table, they can be inserted immediately. This avoids always having to repeat the search when adding the struct to hash list. This also adds space to the rsb struct for a max resource name, so an rsb allocation can be used by any request. The constant size also allows us to finally use a slab for the rsb structs. Signed-off-by: David Teigland <teigland@redhat.com>
2011-07-11dlm: keep lkbs in idrDavid Teigland1-62/+54
This is simpler and quicker than the hash table, and avoids needing to search the hash list for every new lkid to check if it's used. Signed-off-by: David Teigland <teigland@redhat.com>
2011-07-01dlm: use vmalloc for hash tablesBryn M. Reeves1-9/+9
Allocate dlm hash tables in the vmalloc area to allow a greater maximum size without restructuring of the hash table code. Signed-off-by: Bryn M. Reeves <bmr@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2011-04-01dlm: delayed reply message warningDavid Teigland1-3/+3
Add an option (disabled by default) to print a warning message when a lock has been waiting a configurable amount of time for a reply message from another node. This is mainly for debugging. Signed-off-by: David Teigland <teigland@redhat.com>
2010-03-07Driver core: Constify struct sysfs_ops in struct kobj_typeEmese Revfy1-1/+1
Constify struct sysfs_ops. This is part of the ops structure constification effort started by Arjan van de Ven et al. Benefits of this constification: * prevents modification of data that is shared (referenced) by many other structure instances at runtime * detects/prevents accidental (but not intentional) modification attempts on archs that enforce read-only kernel data at runtime * potentially better optimized code as the compiler can assume that the const data cannot be changed * the compiler/linker move const data into .rodata and therefore exclude them from false sharing Signed-off-by: Emese Revfy <re.emese@gmail.com> Acked-by: David Teigland <teigland@redhat.com> Acked-by: Matt Domsch <Matt_Domsch@dell.com> Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com> Acked-by: Hans J. Koch <hjk@linutronix.de> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Jens Axboe <jens.axboe@oracle.com> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-26dlm: Send lockspace name with ueventsSteven Whitehouse1-1/+13
Although it is possible to get this information from the path, its much easier to provide the lockspace as a seperate env variable. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2009-11-30dlm: always use GFP_NOFSDavid Teigland1-10/+5
Replace all GFP_KERNEL and ls_allocation with GFP_NOFS. ls_allocation would be GFP_KERNEL for userland lockspaces and GFP_NOFS for file system lockspaces. It was discovered that any lockspaces on the system can affect all others by triggering memory reclaim in the file system which could in turn call back into the dlm to acquire locks, deadlocking dlm threads that were shared by all lockspaces, like dlm_recv. Signed-off-by: David Teigland <teigland@redhat.com>
2009-05-07dlm: fix use count with multiple joinsDavid Teigland1-7/+6
When a lockspace was joined multiple times, the global dlm use count was incremented when it should not have been. This caused the global dlm threads to not be stopped when all lockspaces were eventually be removed. Signed-off-by: David Teigland <teigland@redhat.com>
2009-05-07dlm: Make name input parameter of {,dlm_}new_lockspace() constGeert Uytterhoeven1-2/+2
| fs/gfs2/lock_dlm.c:207: warning: passing argument 1 of 'dlm_new_lockspace' discards qualifiers from pointer target type Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: David Teigland <teigland@redhat.com>
2009-01-28dlm: Change rwlock which is only used in write mode to a spinlockSteven Whitehouse1-1/+1
The ls_dirtbl[].lock was an rwlock, but since it was only used in write mode a spinlock will suffice. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2009-01-08dlm: change rsbtbl rwlock to spinlockDavid Teigland1-1/+1
The rwlock is almost always used in write mode, so there's no reason to not use a spinlock instead. Signed-off-by: David Teigland <teigland@redhat.com>
2008-11-13dlm: fix shutdown cleanupDavid Teigland1-1/+1
Fixes a regression from commit 0f8e0d9a317406612700426fad3efab0b7bbc467, "dlm: allow multiple lockspace creates". An extraneous 'else' slipped into a code fragment being moved from release_lockspace() to dlm_release_lockspace(). The result of the unwanted 'else' is that dlm threads and structures are not stopped and cleaned up when the final dlm lockspace is removed. Trying to create a new lockspace again afterward will fail with "kmem_cache_create: duplicate cache dlm_conn" because the cache was not previously destroyed. Signed-off-by: David Teigland <teigland@redhat.com>
2008-08-28dlm: fix locking of lockspace list in dlm_scandDavid Teigland1-2/+25
The dlm_scand thread needs to lock the list of lockspaces when going through it. Signed-off-by: David Teigland <teigland@redhat.com>
2008-08-28dlm: detect available userspace daemonDavid Teigland1-1/+23
If dlm_controld (the userspace daemon that controls the setup and recovery of the dlm) fails, the kernel should shut down the lockspaces in the kernel rather than leaving them running. This is detected by having dlm_controld hold a misc device open while running, and if the kernel detects a close while the daemon is still needed, it stops the lockspaces in the kernel. Knowing that the userspace daemon isn't running also allows the lockspace create/remove routines to avoid waiting on the daemon for join/leave operations. Signed-off-by: David Teigland <teigland@redhat.com>
2008-08-28dlm: allow multiple lockspace createsDavid Teigland1-37/+70
Add a count for lockspace create and release so that create can be called multiple times to use the lockspace from different places. Also add the new flag DLM_LSFL_NEWEXCL to create a lockspace with the previous behavior of returning -EEXIST if the lockspace already exists. Signed-off-by: David Teigland <teigland@redhat.com>
2008-04-30fs: replace remaining __FUNCTION__ occurrencesHarvey Harrison1-1/+1
__FUNCTION__ is gcc-specific, use __func__ Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-06dlm: add __init and __exit marks to init and exit functionsDenis Cheng1-1/+1
it moves 365 bytes from .text to .init.text, and 30 bytes from .text to .exit.text, saves memory. Signed-off-by: Denis Cheng <crquan@gmail.com> Signed-off-by: David Teigland <teigland@redhat.com>
2008-01-29dlm: use dlm prefix on alloc and free functionsDavid Teigland1-4/+4
The dlm functions in memory.c should use the dlm_ prefix. Also, use kzalloc/kfree directly for dlm_direntry's, removing the wrapper functions. Signed-off-by: David Teigland <teigland@redhat.com>
2008-01-29dlm: proper prototypesAdrian Bunk1-8/+0
This patch adds a proper prototype for some functions in fs/dlm/dlm_internal.h Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: David Teigland <teigland@redhat.com>
2008-01-24Kobject: convert fs/* from kobject_unregister() to kobject_put()Greg Kroah-Hartman1-2/+2
There is no need for kobject_unregister() anymore, thanks to Kay's kobject cleanup changes, so replace all instances of it with kobject_put(). Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24Kobject: change GFS2 to use kobject_init_and_addGreg Kroah-Hartman1-22/+4
Stop using kobject_register, as this way we can control the sending of the uevent properly, after everything is properly initialized. Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24kobject: convert kernel_kset to be a kobjectGreg Kroah-Hartman1-1/+1
kernel_kset does not need to be a kset, but a much simpler kobject now that we have kobj_attributes. We also rename kernel_kset to kernel_kobj to catch all users of this symbol with a build error instead of an easy-to-ignore build warning. Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24kset: convert kernel_subsys to use kset_createGreg Kroah-Hartman1-1/+1
Dynamically create the kset instead of declaring it statically. We also rename kernel_subsys to kernel_kset to catch all users of this symbol with a build error instead of an easy-to-ignore build warning. Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24kset: convert dlm to use kset_createGreg Kroah-Hartman1-11/+9
Dynamically create the kset instead of declaring it statically. Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24kobject: remove struct kobj_type from struct ksetGreg Kroah-Hartman1-4/+2
We don't need a "default" ktype for a kset. We should set this explicitly every time for each kset. This change is needed so that we can make ksets dynamic, and cleans up one of the odd, undocumented assumption that the kset/kobject/ktype model has. This patch is based on a lot of help from Kay Sievers. Nasty bug in the block code was found by Dave Young <hidave.darkstar@gmail.com> Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Dave Young <hidave.darkstar@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-10-12Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6Linus Torvalds1-1/+1
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6: (75 commits) PM: merge device power-management source files sysfs: add copyrights kobject: update the copyrights kset: add some kerneldoc to help describe what these strange things are Driver core: rename ktype_edd and ktype_efivar Driver core: rename ktype_driver Driver core: rename ktype_device Driver core: rename ktype_class driver core: remove subsystem_init() sysfs: move sysfs file poll implementation to sysfs_open_dirent sysfs: implement sysfs_open_dirent sysfs: move sysfs_dirent->s_children into sysfs_dirent->s_dir sysfs: make sysfs_root a regular directory dirent sysfs: open code sysfs_attach_dentry() sysfs: make s_elem an anonymous union sysfs: make bin attr open get active reference of parent too sysfs: kill unnecessary NULL pointer check in sysfs_release() sysfs: kill unnecessary sysfs_get() in open paths sysfs: reposition sysfs_dirent->s_mode. sysfs: kill sysfs_update_file() ...
2007-10-12Drivers: clean up direct setting of the name of a ksetGreg Kroah-Hartman1-1/+1
A kset should not have its name set directly, so dynamically set the name at runtime. This is needed to remove the static array in the kobject structure which will be changed in a future patch. Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-10-10[DLM] block dlm_recv in recovery transitionDavid Teigland1-0/+1
Introduce a per-lockspace rwsem that's held in read mode by dlm_recv threads while working in the dlm. This allows dlm_recv activity to be suspended when the lockspace transitions to, from and between recovery cycles. The specific bug prompting this change is one where an in-progress recovery cycle is aborted by a new recovery cycle. While dlm_recv was processing a recovery message, the recovery cycle was aborted and dlm_recoverd began cleaning up. dlm_recv decremented recover_locks_count on an rsb after dlm_recoverd had reset it to zero. This is fixed by suspending dlm_recv (taking write lock on the rwsem) before aborting the current recovery. The transitions to/from normal and recovery modes are simplified by using this new ability to block dlm_recv. The switch from normal to recovery mode means dlm_recv goes from processing locking messages, to saving them for later, and vice versa. Races are avoided by blocking dlm_recv when setting the flag that switches between modes. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09[DLM] don't require FS flag on all nodesDavid Teigland1-3/+4
Mask off the recently added DLM_LSFL_FS flag when setting the exflags. This way all the nodes in the lockspace aren't required to have the FS flag set, since we later check that exflags matches among all nodes. Signed-off-by: Patrick Caulfield <pcaulfie@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09[DLM] variable allocationPatrick Caulfield1-0/+5
Add a new flag, DLM_LSFL_FS, to be used when a file system creates a lockspace. This flag causes the dlm to use GFP_NOFS for allocations instead of GFP_KERNEL. (This updated version of the patch uses gfp_t for ls_allocation.) Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com> Signed-Off-By: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09[DLM] wait for config check during join [6/6]David Teigland1-0/+30
Joining the lockspace should wait for the initial round of inter-node config checks to complete before returning. This way, if there's a configuration mismatch between the joining node and the existing nodes, the join can fail and return an error to the application. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09[DLM] fix new_lockspace error exit [5/6]David Teigland1-13/+19
Fix the error path when exiting new_lockspace(). It was kfree'ing the lockspace struct at the end, but that's only valid if it exits before kobject_register occured. After kobject_register we have to let the kobject do the freeing. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09[DLM] add lock timeouts and warnings [2/6]David Teigland1-1/+9
New features: lock timeouts and time warnings. If the DLM_LKF_TIMEOUT flag is set, then the request/conversion will be canceled after waiting the specified number of centiseconds (specified per lock). This feature is only available for locks requested through libdlm (can be enabled for kernel dlm users if there's a use for it.) If the new DLM_LSFL_TIMEWARN flag is set when creating the lockspace, then a warning message will be sent to userspace (using genetlink) after a request/conversion has been waiting for a given number of centiseconds (configurable per node). The time warnings will be used in the future to do deadlock detection in userspace. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09[DLM] block scand during recovery [1/6]David Teigland1-2/+6
Don't let dlm_scand run during recovery since it may try to do a resource directory removal while the directory nodes are changing. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-05-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmwLinus Torvalds1-1/+3
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: (34 commits) [GFS2] Uncomment sprintf_symbol calling code [DLM] lowcomms style [GFS2] printk warning fixes [GFS2] Patch to fix mmap of stuffed files [GFS2] use lib/parser for parsing mount options [DLM] Lowcomms nodeid range & initialisation fixes [DLM] Fix dlm_lowcoms_stop hang [DLM] fix mode munging [GFS2] lockdump improvements [GFS2] Patch to detect corrupt number of dir entries in leaf and/or inode blocks [GFS2] bz 236008: Kernel gpf doing cat /debugfs/gfs2/xxx (lock dump) [DLM] fs/dlm/ast.c should #include "ast.h" [DLM] Consolidate transport protocols [DLM] Remove redundant assignment [GFS2] Fix bz 234168 (ignoring rgrp flags) [DLM] change lkid format [DLM] interface for purge (2/2) [DLM] add orphan purging code (1/2) [DLM] split create_message function [GFS2] Set drop_count to 0 (off) by default ...
2007-05-02remove "struct subsystem" as it is no longer neededGreg Kroah-Hartman1-1/+1
We need to work on cleaning up the relationship between kobjects, ksets and ktypes. The removal of 'struct subsystem' is the first step of this, especially as it is not really needed at all. Thanks to Kay for fixing the bugs in this patch. Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-05-01[DLM] overlapping cancel and unlockDavid Teigland1-1/+3
Full cancel and force-unlock support. In the past, cancel and force-unlock wouldn't work if there was another operation in progress on the lock. Now, both cancel and unlock-force can overlap an operation on a lock, meaning there may be 2 or 3 operations in progress on a lock in parallel. This support is important not only because cancel and force-unlock are explicit operations that an app can use, but both are used implicitly when a process exits while holding locks. Summary of changes: - add-to and remove-from waiters functions were rewritten to handle situations with more than one remote operation outstanding on a lock - validate_unlock_args detects when an overlapping cancel/unlock-force can be sent and when it needs to be delayed until a request/lookup reply is received - processing request/lookup replies detects when cancel/unlock-force occured during the op, and carries out the delayed cancel/unlock-force - manipulation of the "waiters" (remote operation) state of a lock moved under the standard rsb mutex that protects all the other lock state - the two recovery routines related to locks on the waiters list changed according to the way lkb's are now locked before accessing waiters state - waiters recovery detects when lkb's being recovered have overlapping cancel/unlock-force, and may not recover such locks - revert_lock (cancel) returns a value to distinguish cases where it did nothing vs cases where it actually did a cancel; the cancel completion ast should only be done when cancel did something - orphaned locks put on new list so they can be found later for purging - cancel must be called on a lock when making it an orphan - flag user locks (ENDOFLIFE) at the end of their useful life (to the application) so we can return an error for any further cancel/unlock-force - we weren't setting COMP/BAST ast flags if one was already set, so we'd lose either a completion or blocking ast - clear an unread bast on a lock that's become unlocked Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05[DLM] rename dlm_config_info fieldsDavid Teigland1-5/+5
Add a "ci_" prefix to the fields in the dlm_config_info struct so that we can use macros to add configfs functions to access them (in a later patch). No functional changes in this patch, just naming changes. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30[DLM] don't accept replies to old recovery messagesDavid Teigland1-0/+2
We often abort a recovery after sending a status request to a remote node. We want to ignore any potential status reply we get from the remote node. If we get one of these unwanted replies, we've often moved on to the next recovery message and incremented the message sequence counter, so the reply will be ignored due to the seq number. In some cases, we've not moved on to the next message so the seq number of the reply we want to ignore is still correct, causing the reply to be accepted. The next recovery message will then mistake this old reply as a new one. To fix this, we add the flag RCOM_WAIT to indicate when we can accept a new reply. We clear this flag if we abort recovery while waiting for a reply. Before the flag is set again (to allow new replies) we know that any old replies will be rejected due to their sequence number. We also initialize the recovery-message sequence number to a random value when a lockspace is first created. This makes it clear when messages are being rejected from an old instance of a lockspace that has since been recreated. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30[DLM] fix add_requestqueue checking nodes listDavid Teigland1-0/+2
Requests that arrive after recovery has started are saved in the requestqueue and processed after recovery is done. Some of these requests are purged during recovery if they are from nodes that have been removed. We move the purging of the requests (dlm_purge_requestqueue) to later in the recovery sequence which allows the routine saving requests (dlm_add_requestqueue) to avoid filtering out requests by nodeid since the same will be done by the purge. The current code has add_requestqueue filtering by nodeid but doesn't hold any locks when accessing the list of current nodes. This also means that we need to call the purge routine when the lockspace is being shut down since the add routine will not be rejecting requests itself any more. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-06[DLM] fix oops in kref_put when removing a lockspacePatrick Caulfield1-0/+5
Now that the lockspace struct is freed when the last sysfs object is released this patch prevents use of that lockspace by sysfs. We attempt to re-get the lockspace from the lockspace list and fail the request if it has been removed. Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-06[DLM] Fix kref_put oopsPatrick Caulfield1-1/+8
This patch fixes the recounting on the lockspace kobject. Previously the lockspace was freed while userspace could have had a reference to one of its sysfs files, causing an oops in kref_put. Now the lockspace kfree is moved into the kobject release() function Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-09-07[DLM] use snprintf in sysfs showDavid Teigland1-3/+3
Use snprintf(buf, PAGE_SIZE, ...) instead of sprintf in sysfs show methods. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-08-25[DLM] add new lockspace to list ealierDavid Teigland1-6/+7
When a new lockspace was being created, the recoverd thread was being started for it before the lockspace was added to the global list of lockspaces. The new thread was looking up the lockspace in the global list and sometimes not finding it due to the race with the original thread adding it to the list. We need to add the lockspace to the global list before starting the thread instead of after, and if the new thread can't find the lockspace for some reason, it should return an error. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-08-09[DLM] show nodeid for recovery messageDavid Teigland1-0/+11
To aid debugging, it's useful to be able to see what nodeid the dlm is waiting on for a message reply. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-07-26[DLM] more info through debugfsDavid Teigland1-1/+2
Display more information from debugfs, particularly locks waiting for a master lookup or operations waiting for a remote reply. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-07-13[DLM] dlm: user locksDavid Teigland1-3/+29
This changes the way the dlm handles user locks. The core dlm is now aware of user locks so they can be dealt with more efficiently. There is no more dlm_device module which previously managed its own duplicate copy of every user lock. Signed-off-by: Patrick Caulfield <pcaulfie@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-04-28[DLM] PATCH 3/3 dlm: show recover stateDavid Teigland1-0/+13
Expose the current recovery state in sysfs to help in debugging. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-01-20[DLM] Update DLM to the latest patch levelDavid Teigland1-11/+10
Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steve Whitehouse <swhiteho@redhat.com>
2006-01-18[DLM] The core of the DLM for GFS2/CLVMDavid Teigland1-0/+666
This is the core of the distributed lock manager which is required to use GFS2 as a cluster filesystem. It is also used by CLVM and can be used as a standalone lock manager independantly of either of these two projects. It implements VAX-style locking modes. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steve Whitehouse <swhiteho@redhat.com>