-
Notifications
You must be signed in to change notification settings - Fork 721
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix crash in prepareToFixMemberNames #18236
Conversation
...allowing recursive acquisition of exclusive VM access. This makes it possible to run GC from a critical section that holds safe-point VM access.
Previously it was possible for prepareToFixMemberNames() to come across an ostensible MemberName object that was actually dead, and whose fields could therefore be dangling pointers. Because prepareToFixMemberNames() needs to dereference some of the fields, this could lead to a crash. The crash that has been observed involved a dead MemberName that used to represent a method of a class that had since been unloaded. Calling getJNIMethodID() on the stale J9Method found what would have previously been the correct J9Class, and then crashed for lack of a class loader. Other similar crashes would have been possible as well. For example, if the memory of the J9Method had been reused for something else, we could have crashed in J9_CLASS_FROM_METHOD. With this commit, class redefinition will run GC to eliminate any such dead objects ("dark matter"), ensuring that prepareToFixMemberNames() won't see them.
When GC runs as part of class redefinition, it could unload classes and therefore try to acquire the class unload mutex, which is already held to prevent JIT compilation from running concurrently with redefinition. Such an acquisition attempt could deadlock if it occurs on a different thread, even though the acquisition would be morally recursive. Class redefinition now sets a flag in the J9JavaVM to allow GC to detect this situation and skip acquiring the mutex.
The code for Trc_Assert_BCU_mustHaveExclusiveVMAccess is simply incorrect. Fix it to check the exclusiveCount field. Also move the new assertion in releaseSafePointVMAccess to earlier in the function to detect more error cases. Signed-off-by: Graham Chapman <[email protected]>
jenkins compile xlinux jdk21 |
Thanks! Will try right away tomorrow. The error was reproducible on our Linux x86 pipeline (but not on AIX ppc64), so that build is exactly what I can use for testing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I never like an idea to use classUnloadMutex
under Exclusive to stop compilation. And now we are going to have an exception on the top of it. Also we assume isClassUnloadMutexHeldForRedefinition
should not be changed during GC (I think it is reasonable assumption).
Ok, we can go with this PR if we don't want to change sync mechanism fundamentally.
jenkins compile aix jdk21 |
I can confirm the fix works in my Jenkins pipeline. Thanks to everyone! 🙌🏻 |
jenkins test sanity alinux64 jdk21 |
any comments @amicic |
looks good, I paid attention to when isClassUnloadMutexHeldForRedefinition was changing I can see that it's while we have exclusive (directly or inderectly through safe point) - so no race with GC querying the flag |
* therefore try to acquire the class unload mutex, which is already held | ||
* to prevent JIT compilation from running concurrently with redefinition. | ||
* Such an acquisition attempt could deadlock if it occurs on a different | ||
* thread, even though the acquisition would be morally recursive. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps "morally" should be "normally" here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An acquisition that occurs on a different thread is technically never a recursive acquisition. However, in this case it's tantamount to one, and the fact that it's occurring on another thread is only a technicality. So I meant morally recursive as opposed to technically recursive
Work by @jdmpapin with a couple of fixes from me.
Fixes: #18076