Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix crash in prepareToFixMemberNames #18236

Merged
merged 4 commits into from
Oct 6, 2023
Merged

Conversation

gacholio
Copy link
Contributor

@gacholio gacholio commented Oct 5, 2023

Work by @jdmpapin with a couple of fixes from me.

Fixes: #18076

...allowing recursive acquisition of exclusive VM access.

This makes it possible to run GC from a critical section that holds
safe-point VM access.
Previously it was possible for prepareToFixMemberNames() to come across
an ostensible MemberName object that was actually dead, and whose fields
could therefore be dangling pointers. Because prepareToFixMemberNames()
needs to dereference some of the fields, this could lead to a crash.

The crash that has been observed involved a dead MemberName that used to
represent a method of a class that had since been unloaded. Calling
getJNIMethodID() on the stale J9Method found what would have previously
been the correct J9Class, and then crashed for lack of a class loader.

Other similar crashes would have been possible as well. For example, if
the memory of the J9Method had been reused for something else, we could
have crashed in J9_CLASS_FROM_METHOD.

With this commit, class redefinition will run GC to eliminate any such
dead objects ("dark matter"), ensuring that prepareToFixMemberNames()
won't see them.
When GC runs as part of class redefinition, it could unload classes and
therefore try to acquire the class unload mutex, which is already held
to prevent JIT compilation from running concurrently with redefinition.
Such an acquisition attempt could deadlock if it occurs on a different
thread, even though the acquisition would be morally recursive.

Class redefinition now sets a flag in the J9JavaVM to allow GC to detect
this situation and skip acquiring the mutex.
The code for Trc_Assert_BCU_mustHaveExclusiveVMAccess is simply
incorrect. Fix it to check the exclusiveCount field.

Also move the new assertion in releaseSafePointVMAccess to earlier in
the function to detect more error cases.

Signed-off-by: Graham Chapman <[email protected]>
@pshipton
Copy link
Member

pshipton commented Oct 5, 2023

jenkins compile xlinux jdk21

@pshipton
Copy link
Member

pshipton commented Oct 5, 2023

@bmarwell I created a jdk21 build with this fix for #18076, if you want to give it a try.

https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Build_JDK21_x86-64_linux_Personal/51/OpenJ9-JDK21-x86-64_linux-20231005-110509.tar.gz

@bmarwell
Copy link

bmarwell commented Oct 5, 2023

Thanks! Will try right away tomorrow. The error was reproducible on our Linux x86 pipeline (but not on AIX ppc64), so that build is exactly what I can use for testing.

Copy link
Contributor

@dmitripivkine dmitripivkine left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I never like an idea to use classUnloadMutex under Exclusive to stop compilation. And now we are going to have an exception on the top of it. Also we assume isClassUnloadMutexHeldForRedefinition should not be changed during GC (I think it is reasonable assumption).
Ok, we can go with this PR if we don't want to change sync mechanism fundamentally.

@pshipton
Copy link
Member

pshipton commented Oct 5, 2023

jenkins compile aix jdk21

@pshipton
Copy link
Member

pshipton commented Oct 6, 2023

I did an AIX build as well.
https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Build_JDK21_ppc64_aix_Personal/51/OpenJ9-JDK21-ppc64_aix-20231005-210028.tar.gz

@bmarwell
Copy link

bmarwell commented Oct 6, 2023

I can confirm the fix works in my Jenkins pipeline. Thanks to everyone! 🙌🏻

@tajila
Copy link
Contributor

tajila commented Oct 6, 2023

jenkins test sanity alinux64 jdk21

@tajila
Copy link
Contributor

tajila commented Oct 6, 2023

any comments @amicic

@amicic
Copy link
Contributor

amicic commented Oct 6, 2023

looks good, I paid attention to when isClassUnloadMutexHeldForRedefinition was changing I can see that it's while we have exclusive (directly or inderectly through safe point) - so no race with GC querying the flag

@amicic amicic merged commit ff863a1 into eclipse-openj9:master Oct 6, 2023
7 checks passed
* therefore try to acquire the class unload mutex, which is already held
* to prevent JIT compilation from running concurrently with redefinition.
* Such an acquisition attempt could deadlock if it occurs on a different
* thread, even though the acquisition would be morally recursive.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps "morally" should be "normally" here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An acquisition that occurs on a different thread is technically never a recursive acquisition. However, in this case it's tantamount to one, and the fact that it's occurring on another thread is only a technicality. So I meant morally recursive as opposed to technically recursive

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Crash in prepareToFixMemberNames()
8 participants