-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix ZAP upgrade deadlock #4692
Fix ZAP upgrade deadlock #4692
Conversation
The rw_tryupgrade() implementation the SPL behaves slightly differently than the illumos version. Specifically, the Linux kernel provides no upgrade function so the lock must be released before attempting the upgrade. For many use cases this isn't a problem but when upgrading a ZAP it can result in a deadlock with the l_rwlock. This minor optimization is disabled under Linux to prevent this issue. Signed-off-by: Brian Behlendorf <[email protected]>
@lundman can you please review this. It turns out the
|
@dweeezil could you please review this. It resolves some minor fallout from the The proposed fix is to just remove the optimization here which relies on |
@behlendorf |
@tuxoko I agree that would be best. But I don't see a way to implement this without grubbing around the the private rwsem internals which will be fragile. |
Fragile it might be. But I don't think it would change too much or too often. |
@tuxoko it's complicated by the fact while all the architectures share a common rw_semaphore structure they each have a optimized implementation which can (and does) use it slightly differently. |
@behlendorf As was discussed when It does seem, however, that @tuxoko has created an implementation which will work given the current kernel implementation using cmpxchg exactly as you suggested in openzfs/spl@ef6c136. I suppose the main question is just how fragile it is. I don't know, for example, how it works on the ARM these days or whether a change in implementation might be in the works. |
Closing in favor of openzfs/spl#554 which looks like it's going to be a better solution. |
@behlendorf Hmm we don't have the option of going deeper, as Apple hides all that (unless we implement our own rwlocks I guess). At the moment, we check that readers==1, drop lock, then just tryenter, so it will fail more often than not, but never block. Is there a test case for the zap-upgrade deadlock with |
@lundman it sounds like you're going to have the same issue. If your implementation drops the read lock then it's going to need to reacquire it when the trylock fails and there's where it can deadlock. If you can't change you implementation you'll probably want to adapt this fix. I was able to reproduce this issue fairly easily by creating a new filesystem and then running something like |
Current rw_tryupgrade does rw_exit and then rw_tryenter(RW_RWITER), and then does rw_enter(RW_READER) if it fails. This violate the assumption that rw_tryupgrade should be atomic and could cause extra contention or even lock inversion. This patch we implement a proper rw_tryupgrade. For rwsem-spinlock, we take the spinlock to check rwsem->count and rwsem->wait_list. For normal rwsem, we use cmpxchg on rwsem->count to change the value from single reader to single writer. Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Tim Chase <[email protected]> Closes openzfs/zfs#4692 Closes #554
Current rw_tryupgrade does rw_exit and then rw_tryenter(RW_RWITER), and then does rw_enter(RW_READER) if it fails. This violate the assumption that rw_tryupgrade should be atomic and could cause extra contention or even lock inversion. This patch we implement a proper rw_tryupgrade. For rwsem-spinlock, we take the spinlock to check rwsem->count and rwsem->wait_list. For normal rwsem, we use cmpxchg on rwsem->count to change the value from single reader to single writer. Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Tim Chase <[email protected]> Closes openzfs/zfs#4692 Closes openzfs#554
Current rw_tryupgrade does rw_exit and then rw_tryenter(RW_RWITER), and then does rw_enter(RW_READER) if it fails. This violate the assumption that rw_tryupgrade should be atomic and could cause extra contention or even lock inversion. This patch we implement a proper rw_tryupgrade. For rwsem-spinlock, we take the spinlock to check rwsem->count and rwsem->wait_list. For normal rwsem, we use cmpxchg on rwsem->count to change the value from single reader to single writer. Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Tim Chase <[email protected]> Closes openzfs/zfs#4692 Closes openzfs#554
The rw_tryupgrade() implementation the SPL behaves slightly differently
than the illumos version. Specifically, the Linux kernel provides no
upgrade function so the lock must be released before attempting the
upgrade. For many use cases this isn't a problem but when upgrading
a ZAP it can result in a deadlock with the l_rwlock.
This minor optimization is disabled under Linux to prevent this issue.
Signed-off-by: Brian Behlendorf [email protected]