[subinterpreters] Lingering subinterpreters should be implicitly cleared on shutdown #80406

ncoghlan · 2019-03-07T13:17:44Z

BPO	36225
Nosy	@ncoghlan, @vstinner, @ned-deily, @encukou, @ericsnowcurrently, @soltysh, @JulienPalard, @nanjekyejoannah, @LewisGaul, @NasaGeek
PRs	gh-80406: Finalise subinterpreters in Py_FinalizeEx() #17575
Files	finalise-subinterps-test.diff

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2019-03-07.13:17:43.665>
labels = ['expert-subinterpreters', '3.10', 'type-crash']
title = '[subinterpreters] Lingering subinterpreters should be implicitly cleared on shutdown'
updated_at = <Date 2021-12-16.23:53:44.369>
user = 'https://github.com/ncoghlan'

bugs.python.org fields:

activity = <Date 2021-12-16.23:53:44.369>
actor = 'nasageek'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Subinterpreters']
creation = <Date 2019-03-07.13:17:43.665>
creator = 'ncoghlan'
dependencies = []
files = ['48733']
hgrepos = []
issue_num = 36225
keywords = ['patch']
message_count = 17.0
messages = ['337392', '337743', '337852', '339149', '339150', '339191', '345802', '349119', '349124', '353873', '357193', '357235', '364587', '371338', '371571', '379366', '379718']
nosy_count = 10.0
nosy_names = ['ncoghlan', 'vstinner', 'ned.deily', 'petr.viktorin', 'eric.snow', 'maciej.szulik', 'mdk', 'nanjekyejoannah', 'LewisGaul', 'nasageek']
pr_nums = ['17575']
priority = 'normal'
resolution = None
stage = 'patch review'
status = 'open'
superseder = None
type = 'crash'
url = 'https://bugs.python.org/issue36225'
versions = ['Python 3.10']

ncoghlan · 2019-03-07T13:17:44Z

https://docs.python.org/3/c-api/init.html#c.Py_EndInterpreter states that "Py_FinalizeEx() will destroy all sub-interpreters that haven’t been explicitly destroyed at that point."

As discussed in hexchat/hexchat#2237, Python 3.7+ doesn't currently do that - it calls Py_FatalError instead.

That change came from #1728, which was based on my initial PEP-432 refactoring work, and I didn't realise that implicitly cleaning up lingering subinterpreters was a documented behaviour.

So I think we should just fix it to behave as documented, and add a new regression test to make sure it doesn't get broken again in the future.

nanjekyejoannah · 2019-03-12T14:24:25Z

I have been wondering where the regression to test this can be put..in test__xxsubinterpreters.py may be?

encukou · 2019-03-13T14:47:17Z

Joannah, yes, that looks like a good place. Eric Snow might have more info; he wrote that module.

As for testing Py_FatalError, there's an assert_python_failure function in test.support.script_helper.

ericsnowcurrently · 2019-03-29T21:02:51Z

Interestingly, I noticed this independently today. :)

Here's what I wrote in bpo-36477 (which I've closed as a duplicate):

When using subinterpreters, any that exist when Py_FinalizeEx() is called do not appear to get cleaned up during runtime finalization. Maybe I've been looking at the code too much and I'm missing something. :)

This really isn't a problem except for embedders that use subinterpreters (where we're leaking memory). However, even with the "python" executable it can have an impact because the subinterpreters' non-daemon threads will exit later than expected. (see bpo-36469 & bpo-36476)

The solution would be to finalize all subinterpreters at the beginning of Py_FinalizeEx(), right before the call to wait_for_thread_shutdown(). This means calling Py_EndInterpreter() for all the runtime's interpreters (except the main one). It would also mean setting a flag (_PyRuntime.interpreters.finalizing?) right before that to disallow creation of any more subinterptreters.

ericsnowcurrently · 2019-03-29T21:07:25Z

test__xxsubinterpreters is a great place for tests that exercise use of subinterpreters, including most lifecycle operations. There are also one or two subinterpreter-related tests in test_embed. However, for this issue the interplay with runtime finalization means tests should probably stay with other tests that exercise Py_FinalizeEx().

ncoghlan · 2019-03-30T12:09:08Z

I think test_embed would be the right home for this, as there's an existing test case there for subinterpreter lifecycles and repeated init/finalize cycles:

cpython/Programs/_testembed.c

Line 43 in ddbb978

static int test_repeated_init_and_subinterpreters(void)

The test case here would be similar, but it wouldn't need the outer loop - it would just create a handful of subinterpreters, but instead of ending each one before creating the next one the way the existing test does, what it would instead do is:

setup as per the existing test case
create a pair of subinterpeters, using a copy of the existing loop, but omitting the Py_EndInterpreter call
switch back to the main interpreter
create a second pair of subinterpeters
switch back to the main interpreter
call Py_Finalize

It also occurs to me that we don't currently have a test case for what happens if you call Py_Finalize from a subinterpreter rather than the main interpreter.

ned-deily · 2019-06-17T07:24:21Z

Ping. It was marked as a 3.7regression but perhaps it should now be just targeted for 3.8.

nanjekyejoannah · 2019-08-06T17:25:05Z

I am investigating this but in the meantime.

It also occurs to me that we don't currently have a test case for what happens if you call Py_Finalize from a sub-interpreter rather than the main interpreter.

@ncoghlan Am moving this test request in a new issue. So that this issue only focuses on fixing the lingering sub-interpreters.

nanjekyejoannah · 2019-08-06T17:44:26Z

The test request is moved to bpo-37776.

nanjekyejoannah · 2019-10-03T18:51:00Z

I remember julien wanting to check this out during a discussion we had at the sprints hence the loop in.

LewisGaul · 2019-11-21T18:31:13Z

I've put together a test along the lines of what Nick suggested, see the attached patch. Running this hits the Fatal 'remaining subinterpreters' error as expected:

   > ./Programs/_testembed test_bpo36225
--- Pass 0 ---
interp 0 <0x1A561A0>, thread state <0x1A56DD0>: id(modules) = 139707107673104
interp 1 <0x1A9D050>, thread state <0x1A87620>: id(modules) = 139707106987472
interp 2 <0x1B02210>, thread state <0x1AEC320>: id(modules) = 139706981531088
interp 0 <0x1A561A0>, thread state <0x1A56DD0>: id(modules) = 139707107673104
interp 3 <0x1B53740>, thread state <0x1AFD980>: id(modules) = 139706980408304
interp 4 <0x1BA3390>, thread state <0x1B3C7A0>: id(modules) = 139706979780944
interp 0 <0x1A561A0>, thread state <0x1A56DD0>: id(modules) = 139707107673104
Fatal Python error: PyInterpreterState_Delete: remaining subinterpreters
Python runtime state: finalizing (tstate=0x1a56dd0)

Aborted

I'm happy to look a bit further into the fix for this - Eric's pointers in this thread look useful to get started. @nanjekyejoannah did you get anywhere with this?

nanjekyejoannah · 2019-11-21T23:17:07Z

Am abit swamped and sick atm. You can go on and submit a fix.

vstinner · 2020-03-19T02:00:57Z

See also bpo-38865: "Can Py_Finalize() be called if the current interpreter is not the main interpreter?".

ned-deily · 2020-06-12T08:30:36Z

I note this is marked as a 3.7regression and still open. Since the cutoff for the final 3.7 bugfix mode release is in a few days, I'm assuming this means that 3.7 users will have to live with this regression. If you feel that is a problem, speak up now.

vstinner · 2020-06-15T16:36:20Z

I would not qualify the new Python 3.7 behavior (call Py_FatalError()) as a regression, so I remove "3.7regression" keyword.

Also, I don't think that we can easily fix this issue in a stable branch, I would prefer to start working on implementation this issue in the master branch, and only later consider to *maybe* backport the change. So I changed the Python version to "3.10".

I see multiple non trivial problems:

To finalize the subinterpreter A, Py_Finalize() may switch the current Python thread state from the main interpreter to a Python thread in the subintrepeter A. It can lead to new funny issues.
A subinterpreter can be stuck for whatever reason and refuse to stop. For example, the subinterpreter A may wait for an even from subinterpreter B. If we don't run two interpreters "in parallel" (in threads), it may be stuck forever.
Python 3.10 still has weird code to allow daemon threads to continue to run after Py_Finalize() complete. Maybe we need to add such hacks for subinterpreter which fail to be finalized? For example, kill them (exit) as soon as they attempt to acquire the GIL. Search for tstate_must_exit() in Python/ceval_gil.h.

By the way, currently Py_Finalize() calls PyInterpreterState_Clear() which call object finalizers in the main thread of the main interpreter, whereas these finalizers might expect to be called from the thread which created them. I don't know if we must change anything else.

Again, all these problems are very complex :-(

The simple option, which is sadly a backward incompatible change, is to raise a fatal error in Py_Finalize() if a subinterpreter is still running.

Maybe we can start by logging a warning into stderr for now, before introducing the backward incompatible change. Maybe even only emit it when -X dev is used? https://docs.python.org/dev/library/devmode.html

ericsnowcurrently · 2020-10-22T21:48:13Z

I see multiple non trivial problems:

To finalize the subinterpreter A, Py_Finalize() may switch the current Python thread state from the main interpreter to a Python thread in the subintrepeter A. It can lead to new funny issues.

I'm not sure what you mean with this. Are you talking about how another thread might be running that threadstate already? That doesn't sounds like a problem specific to this issue but rather a problem with Py_EndInterpreter() in general.

A subinterpreter can be stuck for whatever reason and refuse to stop. For example, the subinterpreter A may wait for an even from subinterpreter B. If we don't run two interpreters "in parallel" (in threads), it may be stuck forever.

The wait_for_thread_shutdown() call in Py_FinalizeEx() already has a similar behavior. In the case of one interpreter blocking another like that, how would that be possible where it isn't already a problem?

Python 3.10 still has weird code to allow daemon threads to continue to run after Py_Finalize() complete. Maybe we need to add such hacks for subinterpreter which fail to be finalized? For example, kill them (exit) as soon as they attempt to acquire the GIL. Search for tstate_must_exit() in Python/ceval_gil.h.

I consider the problem of daemon threads to be a separate problem. The solution proposed here for finalization doesn't change any behavior regarding daemon threads.

By the way, currently Py_Finalize() calls PyInterpreterState_Clear() which call object finalizers in the main thread of the main interpreter, whereas these finalizers might expect to be called from the thread which created them.

Do you have any examples? I'm having trouble imagining such a case.

I don't know if we must change anything else.

Again, all these problems are very complex :-(

I agree. However, automatically finalizing all other interpreters at the beginning of Py_FinalizeEx() doesn't introduce any new problems. At the same time, it makes sure that any resources in use in other interpreters have a chance to be released and that global resources used there don't cause crashes during runtime finalization.

The simple option, which is sadly a backward incompatible change, is to raise a fatal error in Py_Finalize() if a subinterpreter is still running.

As long as we require Py_FinalizeEx() to be called from the main interpreter, I don't see how the proposed change (in PR bpo-17575) would be backward incompatible.

Maybe we can start by logging a warning into stderr for now, before introducing the backward incompatible change. Maybe even only emit it when -X dev is used? https://docs.python.org/dev/library/devmode.html

I'd like to see a ResourceWarning if any other interpreters were still around.

vstinner · 2020-10-27T02:51:47Z

If tomorrow Python allows to run two interpreters in parallel (see bpo-40512: "Meta issue: per-interpreter GIL"), I don't think that it will be safe to execute object finalizers of a subinterpreter from the main interpreter in Py_Finalize().

IMO subinterpreters must be finalized manually by the programmer, since it's too tricky.

The safest option is to display a warning in Py_Finalize() if there are running subinterpreters.

ericsnowcurrently · 2024-07-08T19:20:33Z

In gh-121060 (4be1f37) I added automatic cleanup of subinterpreters. I've also backported it to 3.13.

(I need to revisit gh-17575, which I had forgotten about.)

ncoghlan added 3.7 (EOL) end of life 3.8 (EOL) end of life type-crash A hard crash of the interpreter, possibly with a core dump labels Mar 7, 2019

ned-deily added the 3.9 only security fixes label Jun 17, 2019

vstinner added the topic-subinterpreters label May 15, 2020

vstinner changed the title ~~Lingering subinterpreters should be implicitly cleared on shutdown~~ [subinterpreters] Lingering subinterpreters should be implicitly cleared on shutdown May 15, 2020

vstinner added 3.10 only security fixes and removed 3.7 (EOL) end of life 3.8 (EOL) end of life 3.9 only security fixes labels Jun 15, 2020

ezio-melotti transferred this issue from another repository Apr 10, 2022

bedevere-app bot mentioned this issue Mar 28, 2024

gh-80406: Finalise subinterpreters in Py_FinalizeEx() #17575

Open

ericsnowcurrently closed this as completed Jul 8, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[subinterpreters] Lingering subinterpreters should be implicitly cleared on shutdown #80406

[subinterpreters] Lingering subinterpreters should be implicitly cleared on shutdown #80406

ncoghlan commented Mar 7, 2019

ncoghlan commented Mar 7, 2019

nanjekyejoannah commented Mar 12, 2019

encukou commented Mar 13, 2019

ericsnowcurrently commented Mar 29, 2019

ericsnowcurrently commented Mar 29, 2019

ncoghlan commented Mar 30, 2019

ned-deily commented Jun 17, 2019

nanjekyejoannah commented Aug 6, 2019

nanjekyejoannah commented Aug 6, 2019

nanjekyejoannah commented Oct 3, 2019

LewisGaul mannequin commented Nov 21, 2019

nanjekyejoannah commented Nov 21, 2019

vstinner commented Mar 19, 2020

ned-deily commented Jun 12, 2020

vstinner commented Jun 15, 2020

ericsnowcurrently commented Oct 22, 2020

vstinner commented Oct 27, 2020

ericsnowcurrently commented Jul 8, 2024

[subinterpreters] Lingering subinterpreters should be implicitly cleared on shutdown #80406

[subinterpreters] Lingering subinterpreters should be implicitly cleared on shutdown #80406

Comments

ncoghlan commented Mar 7, 2019

ncoghlan commented Mar 7, 2019

nanjekyejoannah commented Mar 12, 2019

encukou commented Mar 13, 2019

ericsnowcurrently commented Mar 29, 2019

ericsnowcurrently commented Mar 29, 2019

ncoghlan commented Mar 30, 2019

ned-deily commented Jun 17, 2019

nanjekyejoannah commented Aug 6, 2019

nanjekyejoannah commented Aug 6, 2019

nanjekyejoannah commented Oct 3, 2019

LewisGaul mannequin commented Nov 21, 2019

nanjekyejoannah commented Nov 21, 2019

vstinner commented Mar 19, 2020

ned-deily commented Jun 12, 2020

vstinner commented Jun 15, 2020

ericsnowcurrently commented Oct 22, 2020

vstinner commented Oct 27, 2020

ericsnowcurrently commented Jul 8, 2024