Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-31061: fix crash in asyncio speedup module #2966

Merged
merged 18 commits into from
Aug 2, 2017

Conversation

thehesiod
Copy link
Contributor

@thehesiod thehesiod commented Aug 1, 2017

@1st1 1st1 requested a review from methane August 1, 2017 01:25
@1st1 1st1 self-assigned this Aug 1, 2017
@thehesiod thehesiod changed the title attempt to fix http://bugs.python.org/issue31061 attempt to fix bpo-31061 Aug 1, 2017
@thehesiod
Copy link
Contributor Author

thehesiod commented Aug 1, 2017

this is my first cpython PR, let me know what else needs to be done to get this into any other releases it's required in.

if (Future_CheckExact(fut)) {
/* When fut is subclass of Future, finalizer is called from
* subtype_dealloc.
*/
PyObject_GC_Track(self);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really needed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, you're right. it's needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just following what was done in https://bugs.python.org/issue26617. I think it's worth someone grepping through the whole codebase, I saw several other places that maybe needed this. It seems like every dealloc should have a PyObject_GC_UnTrack at the top?

Copy link
Member

@methane methane Aug 1, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, noddy4 example should do it at first :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw how do we get this into 3.6.3 as well?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like every dealloc should have a PyObject_GC_UnTrack at the top?

When the type has Py_TPFLAGS_HAVE_GC.
lru_cache_dealloc in _functoolsmodule seems need it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I filed an issue for that. http://bugs.python.org/issue31095

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

k, should I remove the LRU change I added here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, please. asyncio speedup module is introduced from Python 3.6.
It make easy to backport.

@methane
Copy link
Member

methane commented Aug 1, 2017

Since _asinciomodule is compiled with python interpreter, it can use private macro APIs
(_PyObject_GC_TRACK and _PyObject_GC_UNTRACK).

@methane
Copy link
Member

methane commented Aug 1, 2017

Oh, sorry, macro version API broke Windows build and you used public APIs to fix it.

@@ -962,14 +962,18 @@ FutureObj_dealloc(PyObject *self)
{
FutureObj *fut = (FutureObj *)self;

PyObject_GC_UnTrack(self);

if (Future_CheckExact(fut)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move PyObject_GC_UnTrack after this if block, and remove Track and Untrack in the block.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but that would mean this (34fae03) is wrong too then no?

because it first does an UnTrack: 34fae03#diff-c3cf251f16d5a03a9e7d4639f2d6f998L1113

With the same Track/Untrack pattern here:
34fae03#diff-c3cf251f16d5a03a9e7d4639f2d6f998R1129

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

current code:

UnTrack()
Track()
PyObject_CallFinalizerFromDealloc()
UnTrack()

I proposed:

PyObject_CallFinalizerFromDealloc()
UnTrack()

34fae03 does other things between UnTrack() and Track()
But this code does only Future_CheckExact() and it is safe.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

k, I'm claiming ignorance since I don't know what these macros achieve :) I'd personally feel safer if we had a test case, do you know how to create a testcase to cause this to crash? Mine takes a day to reproduce in a production environment :(

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CPython's circular reference GC uses doubly linked list to track object which can be member of circular reference.
PyObject_GC_UnTrack() removes object from the list, and PyObject_GC_Track() insert the object to the list.

Despite very undetarministic multithreading, I can cause SEGV quickly with this code:

import asyncio
import gc
gc.set_debug(gc.DEBUG_STATS)

class Evil:
    def __del__(self):
        gc.collect()

def main():
    f = asyncio.Future()
    f.set_result(Evil())

for i in range(100):
    main()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice! I'll add testcases and add your suggestions, thanks so much for all the help!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

k, added FutureObj testcase, have a good test case for the TaskObj? Not quite sure how to trigger it.

@methane
Copy link
Member

methane commented Aug 1, 2017

@thehesiod
Copy link
Contributor Author

added news and reverted LRU change

@thehesiod
Copy link
Contributor Author

should we put as many fixes as we can in this PR, or create another PR with as many as we can? I need the defaultdict one too...and maybe even others as who knows what third party libraries do. I feel like we've opened a pandoras box of crashers :)

@methane
Copy link
Member

methane commented Aug 1, 2017

I already created #2974.
More pull request for updating document is welcome.

@methane methane added the type-bug An unexpected behavior, bug, or error label Aug 1, 2017

/*
See description in Doc/c-api/gcsupport.rst
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove it.
We can reach the document easily with searching API name.

@methane methane changed the title attempt to fix bpo-31061 bpo-31061: fix crash in asyncio speedup module Aug 1, 2017
@1st1
Copy link
Member

1st1 commented Aug 1, 2017

The latest revision LGTM. But can you use blurb to generate the NEWS entry?

@thehesiod
Copy link
Contributor Author

@1st1 should I remove the one in NEWS? Also was hoping I could add a testcase for Task but couldn't figure out how to repro it, especially given Task inherits from Future so I don't want to trigger the Future's crash

Misc/NEWS Outdated
@@ -10,6 +10,8 @@ What's New in Python 3.7.0 alpha 1?
Core and Builtins
-----------------

- bpo-31061: Fixed a crash when using asyncio and threads.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last nit: the change in Misc/NEWS needs to be reverted, we need only blurb file.

@1st1
Copy link
Member

1st1 commented Aug 1, 2017

@1st1 should I remove the one in NEWS?

Yes, please!

Also was hoping I could add a testcase for Task but couldn't figure out how to repro it, especially given Task inherits from Future so I don't want to trigger the Future's crash

I'm trying to come up with a test.

@1st1
Copy link
Member

1st1 commented Aug 1, 2017

The following test crashes unpatched CPython:

    import gc

    def test_task_del_collect(self):
        async def coro():
            return Evil()

        class Evil:
            def __del__(self):
                gc.collect()

        for i in range(100):
            task = self.new_task(self.loop, coro())
            self.loop.run_until_complete(task)

@1st1
Copy link
Member

1st1 commented Aug 1, 2017

The test also fails if we fix only Task, but not Future.

@thehesiod
Copy link
Contributor Author

so I had something similar but it doesn't crash:

import gc
import asyncio

class Evil:
    def __del__(self):
        gc.collect()


async def run():
    return Evil()


loop = asyncio.get_event_loop()

for i in range(100):
    loop.run_until_complete(asyncio.Task(run()))
#    f = asyncio.Future()
#    f.set_result(Evil())

@thehesiod
Copy link
Contributor Author

ah, figured it out, had to have multiple tasks pending in the run loop.

def run():
return Evil()

yield from asyncio.gather(*[asyncio.Task(run()) for _ in range(100)])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test won't run. yield from makes test_task_del_collect() a generator and test suite doesn't know how to run them.

Please change to

self.loop.run_until_complete(
    asyncio.gather(*[asyncio.Task(run()) for _ in range(100)]))

@thehesiod
Copy link
Contributor Author

thehesiod commented Aug 1, 2017

ya realized as I was going to work, this should fix it

Copy link
Member

@1st1 1st1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no further comments, LGTM. @methane do you approve this PR?

@@ -4,6 +4,7 @@
import contextlib
import functools
import io
import gc
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another nit: 'gc' import should be before 'functools' -- we sort imports alphabetically in CPython.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may be worth running a linter on CI to catch these

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, good idea. BTW help is always welcome by python/core-workflow. You're an awesome contributor, we'd love you to continue! :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're too kind :) Will keep that in mind if we run into any other cpython issues.

@thehesiod
Copy link
Contributor Author

this set of PRs is a massive win for Python. It would be nice if a core dev could look and find a way to nip it at the bud to avoid these hacky calls, as shown they're highly error prone. Who knows how many third party modules are missing them too.

Copy link
Member

@methane methane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@methane methane merged commit de34cbe into python:master Aug 2, 2017
methane pushed a commit to methane/cpython that referenced this pull request Aug 2, 2017
@bedevere-bot
Copy link

GH-2984 is a backport of this pull request to the 3.6 branch.

@1st1
Copy link
Member

1st1 commented Aug 2, 2017

Do we have C Task/Future in 3.5?

@methane
Copy link
Member

methane commented Aug 2, 2017

No. asyncio speedup is introduced at 3.6

@thehesiod thehesiod deleted the thehesiod/aio_crash_fix branch August 2, 2017 19:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants