Deadlock in BlockingDictionary #175
I'm able to remove the deadlock with the following change:
I'm evaluating whether other changes may be more appropriate, such as not holding …
@watfordgnf, really appreciate you taking a look at this. Thanks!
- pubAckMap's reference cannot change under mu in StanConnection
…s-io#175 - This resolves a deadlock seen with unexpected connection failures
Wow, that was quick, thanks :) We solved it temporarily by replacing the wait for space with a simple sleep/retry loop:

```csharp
while (!pubAckMap.TryAdd(guidValue, a))
{
    var bd = pubAckMap;
    Monitor.Exit(mu);
    // Wait for space outside of the lock so
    // acks can be removed.
    // bd.waitForSpace();
    Thread.Sleep(10);
    Monitor.Enter(mu);
    if (nc == null)
    {
        throw new StanConnectionClosedException();
    }
}
```

It will not affect performance, since this only happens when we are writing too fast anyway. Looking forward to trying out your changes.
@johnsusi, I applied a very similar fix, so that it will not wait for space in the pubAckMap for longer than the ping interval; this also fixes a second deadlock seen with unexpected connection failures. I removed the deadlock you saw by ensuring … Another interesting bit is …
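The bounded wait described above can be sketched as follows. This is a minimal illustration, not the library's actual fix: it swaps the BlockingDictionary for a `SemaphoreSlim` that counts free slots, and every name here (`mu`, `acks`, `freeSlots`, `closed`, `Publish`, `OnAck`) as well as the 100 ms `pingInterval` is hypothetical. The publisher waits for a slot outside `mu`, one ping interval at a time, and re-checks the closed flag between waits, so a connection that dies while the map is full ends the wait instead of deadlocking.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical stand-ins: `mu` plays the role of the connection lock, `acks` the
// pubAckMap, and `closed` the `nc == null` check from the workaround snippet.
const int Capacity = 1;
var mu = new object();
var acks = new ConcurrentDictionary<string, bool>();
var freeSlots = new SemaphoreSlim(Capacity, Capacity); // counts free slots
var closed = false;
var pingInterval = TimeSpan.FromMilliseconds(100);      // arbitrary for the demo

// Returns false if the connection is found closed before the message is tracked.
bool Publish(string guid)
{
    // Wait for a free slot outside `mu`, but never longer than one ping
    // interval at a time, so a dead connection is noticed instead of
    // blocking forever.
    while (!freeSlots.Wait(pingInterval))
    {
        lock (mu) { if (closed) return false; }
    }
    lock (mu)
    {
        // Give the slot back if the connection closed while we were waiting.
        if (closed) { freeSlots.Release(); return false; }
        acks[guid] = true;
        return true;
    }
}

// Ack handler: never takes `mu`, it only frees the slot for a waiting publisher.
void OnAck(string guid)
{
    if (acks.TryRemove(guid, out _)) freeSlots.Release();
}

Console.WriteLine(Publish("m1"));   // True: a slot is free

// No ack arrives for "m1"; another thread closes the connection instead.
var closer = new Thread(() => { Thread.Sleep(250); lock (mu) { closed = true; } });
closer.Start();
Console.WriteLine(Publish("m2"));   // False: the closed flag is seen after a ping-interval timeout
closer.Join();
```

Because the ack path never takes `mu`, there is no second lock that could be acquired in the opposite order.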
Very nice, @watfordgnf! My issue seems to be resolved, at least with my simple example. I'll try it out on our real system and report back here afterwards.
@johnsusi, would love the feedback, much appreciated!
Everything seems to be working now, so a big thumbs up from here.
Thanks for checking, @johnsusi!
First, a big thanks for all your hard work. It is much appreciated!
We are having issues with a system that publishes a lot of data (many small messages, each under 1 KB).
I have narrowed it down to multithreading and BlockingDictionary. My guess is that the issue arises because Remove and waitForSpace take the locks in a different order.
Here is a small test that hits the deadlock every time on my computer, although timing is probably a factor.
The server was run with `nats-streaming-server -p 4222 -st file -dir $PWD/data` on a MacBook Pro and on a Windows 10 PC.
I tried to fix the problem myself but have not made any progress yet.
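The lock-ordering guess in this report points at a general remedy: guard the map's state with a single lock and let a waiter release that lock while it is blocked. The minimal C# sketch below assumes a hypothetical bounded map (`gate`, `pending`, `TryAdd`, `Remove`, `WaitForSpace`, capacity 2); it is not the library's BlockingDictionary. `Monitor.Wait` gives up `gate` while waiting, so a thread calling `Remove` can always acquire it and wake the waiter with `Monitor.PulseAll`.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical bounded map of pending acks. One lock object guards all state,
// so there is no second lock that could be taken in the opposite order.
const int Capacity = 2;
var gate = new object();
var pending = new Dictionary<string, int>();

// Non-blocking insert: fails when the map is full.
bool TryAdd(string key, int value)
{
    lock (gate)
    {
        if (pending.Count >= Capacity) return false;
        pending[key] = value;
        return true;
    }
}

// Removes an entry and wakes every thread blocked in WaitForSpace.
void Remove(string key)
{
    lock (gate)
    {
        if (pending.Remove(key))
            Monitor.PulseAll(gate);
    }
}

// Blocks until there is room or the timeout expires. Monitor.Wait releases
// `gate` while blocked and reacquires it before returning.
bool WaitForSpace(TimeSpan timeout)
{
    var deadline = DateTime.UtcNow + timeout;
    lock (gate)
    {
        while (pending.Count >= Capacity)
        {
            var remaining = deadline - DateTime.UtcNow;
            if (remaining <= TimeSpan.Zero || !Monitor.Wait(gate, remaining))
                return pending.Count < Capacity;
        }
        return true;
    }
}

TryAdd("a", 1);
TryAdd("b", 2);
Console.WriteLine(TryAdd("c", 3));   // False: the map is full

// Another thread frees a slot after a short delay.
var remover = new Thread(() => { Thread.Sleep(50); Remove("a"); });
remover.Start();
Console.WriteLine(WaitForSpace(TimeSpan.FromSeconds(5)));   // True once "a" is removed
remover.Join();
```

The timeout in `WaitForSpace` serves the same purpose as capping the wait at the ping interval: even if no slot is ever freed, the caller regains control.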