Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exiting a process calls exit() which isn’t thread-safe #83994

Closed
DemiMarie opened this issue Apr 8, 2021 · 67 comments
Closed

Exiting a process calls exit() which isn’t thread-safe #83994

DemiMarie opened this issue Apr 8, 2021 · 67 comments
Labels
C-bug Category: This is a bug. disposition-close This PR / issue is in PFCP or FCP with a disposition to close it. finished-final-comment-period The final comment period is finished for this PR / Issue. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@DemiMarie
Copy link
Contributor

On Unix-like platforms, std::process::exit calls libc::exit(). However, this can lead to undefined behavior in a multithreaded process.

In my case, this showed up as use-after-free crashes when running the testsuite for RPM Oxide. librpm registers an atexit() handler to clean up global resources, but this causes some of the resources to be freed while other threads are still using them. Other projects have had similar issues, such as the Rust bindings for RocksDB.

The best answer I know of is to call quick_exit instead, which only calls functions registered with at_quick_exit. Such functions should be safe to be called while other threads are running.

@DemiMarie DemiMarie added the C-bug Category: This is a bug. label Apr 8, 2021
@jonas-schievink jonas-schievink added I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Apr 8, 2021
@rustbot rustbot added the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Apr 8, 2021
@jyn514
Copy link
Member

jyn514 commented Apr 8, 2021

Hmm, it seems really weird to me for exit() not to call libc::exit. It seems like the issue is libraries shouldn't be registering exit handler that aren't thread safe?

@ghost
Copy link

ghost commented Apr 8, 2021

I've seen a similar issue before: sfackler/rust-openssl#1293

@jyn514
Copy link
Member

jyn514 commented Apr 8, 2021

I'm going to move this from T-libs-impl to T-libs because not calling libc::exit is user-facing.

@jyn514 jyn514 added T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. and removed T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Apr 8, 2021
@ghost
Copy link

ghost commented Apr 8, 2021

The best answer I know of is to call quick_exit instead, which only calls functions registered with at_quick_exit. Such functions should be safe to be called while other threads are running.

I don't see how would quick_exit be different from exit if you registered a non-thread-safe exit handler (by at_quick_exit). You should either use _Exit or not registering one at all.

@hameerabbasi
Copy link
Contributor

Removing I-prioritize and adding I-nominate as discussed on Zulip.

@rustbot labels -I-prioritize +I-nominate

@jyn514 jyn514 added I-nominated and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Apr 8, 2021
@DemiMarie
Copy link
Contributor Author

The best answer I know of is to call quick_exit instead, which only calls functions registered with at_quick_exit. Such functions should be safe to be called while other threads are running.

I don't see how would quick_exit be different from exit if you registered a non-thread-safe exit handler (by at_quick_exit). You should either use _Exit or not registering one at all.

The assumption (from an old C++ proposal I read) is that doing so is considered a bug on the part of the caller of at_quick_exit.
In particular, quick_exit does not call C++ static destructors for this reason.

@cuviper
Copy link
Member

cuviper commented Apr 12, 2021

In Linux man-pages, exit is marked MT-unsafe:
https://man7.org/linux/man-pages/man3/exit.3.html#ATTRIBUTES

The exit() function uses a global variable that is not protected, so it is not thread-safe.

@joshtriplett
Copy link
Member

We discussed this in today's @rust-lang/libs meeting, and came to two conclusions:

  • We feel that if a library has atexit handlers that aren't thread-safe, that's an issue in that library. We can't do anything to fix that; that requires appropriate locking in the library. We shouldn't call quick_exit, because such a race could just as easily happen in an at_quick_exit handler. And we shouldn't call _Exit or _exit and bypass the atexit handlers, because people may legitimately be relying on those atexit handlers, and there's no issue if your atexit handlers are thread-safe.
  • We do understand that exit is not marked as MT-safe; however, our understanding is that in practice every major C library seems to use an appropriate lock around the atexit array. If there's a specific C library that doesn't, we'd be willing to consider a workaround on that platform (e.g. wrapping it with our own lock), but otherwise we'd propose to close this.

@rfcbot close

@rfcbot
Copy link

rfcbot commented Apr 14, 2021

Team member @joshtriplett has proposed to close this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. disposition-close This PR / issue is in PFCP or FCP with a disposition to close it. labels Apr 14, 2021
@DemiMarie
Copy link
Contributor Author

We feel that if a library has atexit handlers that aren't thread-safe, that's an issue in that library. We can't do anything to fix that; that requires appropriate locking in the library. We shouldn't call quick_exit, because such a race could just as easily happen in an at_quick_exit handler. And we shouldn't call _Exit or _exit and bypass the atexit handlers, because people may legitimately be relying on those atexit handlers, and there's no issue if your atexit handlers are thread-safe.

Sadly, this makes safe FFI with some important libraries impossible. In particular, C++ static destructors, which are also called by exit, are not thread-safe, which caused paritytech/substrate#7166. The root cause is not races in atexit handlers themselves, but rather that atexit handlers are usually used to free global resources. Therefore, calling them more than once leads to double frees, and calling them while other threads are running can lead to use-after-free. quick_exit, on the other hand, is explicitly designed to be safely called while other threads are running.

@DemiMarie
Copy link
Contributor Author

Also see openethereum/parity-ethereum#6213

@richfelker
Copy link

exit is thread-safe. Installing atexit handlers which free resources other threads might still be using is not thread-safe and is a bug in whatever code is installing those atexit handlers, and should be fixed there.

@DemiMarie
Copy link
Contributor Author

exit is thread-safe. Installing atexit handlers which free resources other threads might still be using is not thread-safe and is a bug in whatever code is installing those atexit handlers, and should be fixed there.

What about C++ static destructors?

@richfelker
Copy link

What about C++ static destructors?

It's a programming bug to have code which might access them while (or after) they're being destroyed, or unless they're read-only and immutable, without some kind of synchronization around the access to them.

If you have objects that are really supposed to have "whole program lifetime" that needs to include continuing through exit to the moment the process ceases to exist. You cannot deallocate them, ever. This may mean you need a wrapper ctor that just constructs them with dynamic lifetime so that they're not destructed at exit.

@DemiMarie
Copy link
Contributor Author

What about C++ static destructors?

It's a programming bug to have code which might access them while (or after) they're being destroyed, or unless they're read-only and immutable, without some kind of synchronization around the access to them.

If you have objects that are really supposed to have "whole program lifetime" that needs to include continuing through exit to the moment the process ceases to exist. You cannot deallocate them, ever. This may mean you need a wrapper ctor that just constructs them with dynamic lifetime so that they're not destructed at exit.

So that means I need to report a bug against pretty much every C++ library that uses statically allocated resources. is there a linker flag I can pass to disable static destruction? Because I guarantee that I will not be the first one to face this problem, and changing the library is often impractical.

@richfelker
Copy link

So that means I need to report a bug against pretty much every C++ library that uses statically allocated resources.

I don't think that's quite accurate. If a C++ library has mutable singletons, there must already be some contract controlling shared access to them, and if the singleton is destructed on exit, then exiting is part of that contract, and the application is obligated not to exit without first obtaining exclusive access to said object - whether by holding a lock, ensuring that any thread potentially accessing the object has exited, or some other means.

The immutable singletons situation is somewhat different, e.g. for things like runtime-generated tables of constants (ignore the fact that generating these at runtime is a bug because it prevents sharing them like you could do if they were static const data). However if the dtor actually destroys anything, the data was not actually immutable. Part of real immutability is never being freed.

You can disable static destruction with a big hammer by calling atexit after ctors have run with a handler that just does _Exit. I'm not sure this is a good idea but it's probably no worse than doing it with a linker flag. It will break code that is using dtors correctly, with proper synchronization, to ensure important invariants like writing out valuable data before termination.

changing the library is often impractical.

That's what people said with incorrect/nonportable code that broke on musl in the early days, and indeed it was hard to get maintainers to take bug reports seriously, but eventually we mostly did, and the whole ecosystem got better as a result. The issue you've uncovered is a real problem I've been wanting to get projects to fix for a long long time, and despite being difficult I believe it is worthwhile. It does not only affect Rust; the same problem you're seeing was already there with C or C++ code using these libraries.

@DemiMarie
Copy link
Contributor Author

So that means I need to report a bug against pretty much every C++ library that uses statically allocated resources.

I don't think that's quite accurate. If a C++ library has mutable singletons, there must already be some contract controlling shared access to them, and if the singleton is destructed on exit, then exiting is part of that contract, and the application is obligated not to exit without first obtaining exclusive access to said object - whether by holding a lock, ensuring that any thread potentially accessing the object has exited, or some other means.

That is what this bug is about. As long as std::process::exit calls exit and is safe, it is not possible to write safe Rust bindings to such libraries, as there is no way to enforce this guarantee. Of course, I could be (and hope to be) missing something.

@cuviper
Copy link
Member

cuviper commented Apr 15, 2021

Is that any different than simply returning from main?

@richfelker
Copy link

No, there is no difference vs returning from main with other threads still running.

... it is not possible to write safe Rust bindings to such libraries

Of course it is not possible to write safe Rust bindings to libraries that have inherent object lifetime bugs internally. This is expected.

@agurney
Copy link

agurney commented Aug 6, 2021

@DemiMarie In principle? But that's quite a process that goes far beyond the scope of this issue. You may be interested in the http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2461.pdf, which touches this area and which has seen some positive interest from the C standards committee. I don't know if or how the author is planning to take that forward.

@jyn514
Copy link
Member

jyn514 commented Feb 14, 2023

The final comment period, with a disposition to close, as per the #83994 (comment), is now complete.

I'm going to close this issue since T-libs agreed it isn't an issue with the rust standard library.

@jyn514 jyn514 closed this as not planned Won't fix, can't repro, duplicate, stale Feb 14, 2023
@RalfJung
Copy link
Member

The same arguments apply here as they do with environment handling: some types of unsynchronized changes to the environment from C code could race with an otherwise-synchronized one from Rust on some platforms, but we don't mark the corresponding functions in std::env as unsafe.

FWIW, it seems increasingly likely that at least std::env::set_var will end up becoming unsafe (Cc #90308). But of course there's no way that we can make "return from main" unsafe.

@DemiMarie
Copy link
Contributor Author

The same arguments apply here as they do with environment handling: some types of unsynchronized changes to the environment from C code could race with an otherwise-synchronized one from Rust on some platforms, but we don't mark the corresponding functions in std::env as unsafe.

FWIW, it seems increasingly likely that at least std::env::set_var will end up becoming unsafe (Cc #90308). But of course there's no way that we can make "return from main" unsafe.

What rust can do is only call exit() if there is only one thread running, and call something else otherwise.

@richfelker
Copy link

What rust can do is only call exit() if there is only one thread running, and call something else otherwise.

I suppose you "could" (if you could even distinguish that case; im general it's not possible), but this would have all sorts of nasty consequences like violating least surprise (different behavior for MT vs ST process), breaking atomicity of stdio operations, leaving exit cleanup functions unexecuted, etc.

If there's any proper action to be had here, it's on the C side. The text about calling exit more than once being UB was written in the absence of threads and was obviously intended to make recursive exit (via atexit handlers) undefined, not to be a constraint on MT programs outside the scope of C. This should just be fixed (at least in POSIX and other more specific implementation specs) to make it so only the recursive case is undefined.

@DemiMarie
Copy link
Contributor Author

Do musl and glibc behave correctly?

@richfelker
Copy link

I guess it depends on what properties you want. I know glibc has or at least had unrelated issues with stdio synchronization at exit being missing, but I believe it safely handles multiple threads calling exit. On the other hand, musl used to have a global exit lock that made it so concurrent callers to exit would just wait forever until the first one finished exiting. That was removed, which was probably a bad idea, on the basis that it was undefined. But as far as I know there has never been any problem with a multithreaded program exiting (either via a call to exit, or main returning) as long as it was the only possible point of exit. This is in accordance with what the C standard already requires (exit is "thread safe" but just has the special, likely unintentional, constraint that you can only call it from one thread).

I'll open this for reevaluation in musl; we'll probably add back the big lock. (In any program where atexit has been linked, including all dynamic linked programs, the atexit lock already partly fills this role, but allows successive handler functions to jump around among which thread executes them.)

None of this changes my opinion that there is nothing Rust needs to do here, and that exiting already "just works", per the standard and implemntations, as long as you don't call exit from more than one thread.

@RalfJung
Copy link
Member

RalfJung commented Feb 14, 2023

exiting already "just works", per the standard and implemntations, as long as you don't call exit from more than one thread.

That's not an acceptable restriction for us though. This is safe code:

fn main() {
  std::thread::spawn(|| std::process::exit(1));
  std::process::exit(1);
}

@richfelker
Copy link

That admits a trivial solution on the Rust side: put a lock on it.

@RalfJung
Copy link
Member

We can do that in Rust of course, but it has the same issue as the env lock: it doesn't apply to code written in other languages linked via FFI.

You are basically saying that any C library that calls exit is not thread safe. And that it is impossible to soundly link Rust and Java code (since either side could call exit any time and there is no shared lock). That is somewhat hard to believe, is the state of the ecosystem really that bad?

@richfelker
Copy link

I don't think it's the same issue as the proposed env lock, which was based on a misunderstanding of the contracts involved. Accessing the environment is specified to be thread-safe and allowed to be done concurrently from arbitrary threads. It's that modifying it is a disallowed operation except in very restrictive contexts. Essentially, in a potentially multithreaded program (and thus in a general programming context), the environment is immutable input state reflecting how the program was invoked.

Regarding the disallowance for multiple threads to call exit, I think you're making a much bigger deal of it than it is. Regardless of whether it's thread-safe for library code to call exit, it's library-unsafe. That is, it alters the state of the process (by ending it) outside the scope of what belongs to the calling context or library internals. This problem would still exist even if it were fully well-defined to call exit concurrently.

I'm still in favor of working to get this "fixed" on the C side, because the undefinedness seems completely gratuitous and potentially problematic. But I don't think it has any significant practical impact on Rust. If you're calling FFI library code that terminates your process behind your back, you have bigger problems.

@RalfJung
Copy link
Member

RalfJung commented Feb 14, 2023

Whether or not code has Undefined Behavior is a big deal around here. And yes it makes a huge difference for us whether a library can "just" stop the entire process from running in a very visible way (this is bad style but nobody will be surprised to hear that it is possible), vs cause UB that might have arbitrary bad side-effects (which should be 100% ruled out, that is the entire point of Rust's safety guarantees).

@richfelker
Copy link

And yes it makes a huge difference for us whether a library can "just" stop the entire process ... vs cause UB that might have arbitrary bad side-effects (which should be 100% ruled out, that is the entire point of Rust's safety guarantees).

You're talking about a library written in C, not Rust. A library written in C is inherently not memory-safe and can always blow up your process with arbitrarily-bad side effects in ways Rust cannot rule out.

If the library is written in Rust and uses std::process::exit, Rust's standard library can put a lock around the actual call to the underlying C exit and thereby ensure that it's well-defined.

@RalfJung
Copy link
Member

Now we're back to this message. ;)

It should be possible to safely combine code written in Rust with code written in another safe language, such as Java. But there's no way to coordinate a shared lock between Rust's and Java's exit functions. So this relies on an ecosystem that provides reasonable primitives, free from the kind of surprising footguns that libc exit seems to (be allowed to) have.

I'm still in favor of working to get this "fixed" on the C side, because the undefinedness seems completely gratuitous and potentially problematic.

I'm putting my support behind that option then. :D

@ojeda
Copy link
Contributor

ojeda commented Feb 14, 2023

This could be fixed for C2y. If a major libc like musl agrees with the new wording, that would help pushing the paper.

Am I understanding correctly that you are looking for something along the lines of "If a program calls the exit function more than once, or calls the quick_exit function in addition to the exit function, then every thread that calls either of the functions, except the first thread, will be blocked until process termination."? (plus all related changes needed)

Another way would be to make it IB, and hope all major libcs end up documenting a lock.

@richfelker
Copy link

I'm not sure I'm reading that proposed text correctly, but as I read it, it's saying exit would suspend execution of all threads except the one that calls it? That would be contrary to all existing practice and would necessarily deadlock if any thread were in the middle of a stdio operation or held a lock used by any of the exit handlers. I think maybe the intent was that only those vying to exit would be suspended, i.e. exit would behave as if it acquires a mutex before performing any operations and never releases that mutex.

I'm not sure the degree to which we could get consensus on something like that, but we can see. This probably calls for some research into what glibc, the BSDs, Apple, etc. do when multiple threads call exit.

I still think the extent of the problem is exaggerated. Traditionally and even now, while Java is supposedly memory safe at the specification level, the implementation of the runtime is chock-full of UB around system-interface-level things like this. For example I recall there being multiple versions of the backend for executing child processes, and all of them having UB around misuse of vfork or AS-unsafe functions after fork in a MT parent. To me, it's completely expected that, if you mix a runtime of Java's quality (or lack thereof) with Rust, and it races to exit, things will break in subtle to spectacular ways.

@workingjubilee
Copy link
Member

While it's certainly not a language that tickles my fancy, taking such a swipe at Java's prior bugs here is not only beneath you, it is a distraction, as Java was just an example and its quality of implementation doesn't matter. The problem in question returns if one introduces runtimes that don't have any UB around system interfaces. Even most purely functional programming languages have functions that can diverge at runtime (and thus call exit(n)).

It would seem absurd if, by dint of a system only providing C interfaces to call process-terminating functions, one could not actually confidently run a program made by linking any two purely functional programming languages, even if both had their soundness actually formally proven, independently, and even if both used a "thin runtime" as C and Rust do, without having them both first agree to only allow one to call a diverging function amongst all their threads.

@RalfJung
Copy link
Member

RalfJung commented Feb 15, 2023

Indeed feel free to substitute Java for literally any language that provides exit, which is the vast majority. The way exit seems to work makes it fundamentally impossible to write multi-language programs with memory safety guarantees. Each language can hack around that issue for single-language programs, but the only way to actually fix this is to have the system libs provide an exit that is safe to call from multiple threads concurrently.

(That is also the parallel with the env situation: each individual language can actually provide a thread-safe mutable environment by adding a lock. But the only way to have that work in multi-language programs is to make the underlying system lib thread safe for env mutation. Those libs don't seem to consider that a desirable property, and the rest of the ecosystem is beholden to that decision.)

We don't necessarily need the C standard to change here, but for targets Rust supports we do need to ensure that their exit implementation if sufficiently thread-safe.

@ojeda
Copy link
Contributor

ojeda commented Feb 15, 2023

I'm not sure I'm reading that proposed text correctly, but as I read it, it's saying exit would suspend execution of all threads except the one that calls it?

Yeah, one could interpret it like that. I moved the "except" part and added a couple commas, that should make it a bit more clear, please take a look. Otherwise, the standard can always be more explicit, like repeating "that called ...". To be clear, the actual wording could be different, but I wanted to see if we were speaking about the same thing.

@workingjubilee
Copy link
Member

workingjubilee commented Feb 18, 2023

For what it's worth we have done some research and glibc does in fact lock things. It apparently sometimes causes issues if you have more than 32 atexit handlers (it organizes those handlers in buckets of 32) but if you don't have such an exciting program you are likely to have a fairly predictable result (and the issues may simply be due to bugs in the locking impl).

@RalfJung
Copy link
Member

RalfJung commented Jul 1, 2024

For what it's worth we have done some research and glibc does in fact lock things. It apparently sometimes causes issues if you have more than 32 atexit handlers (it organizes those handlers in buckets of 32) but if you don't have such an exciting program you are likely to have a fairly predictable result (and the issues may simply be due to bugs in the locking impl).

@workingjubilee turns out that exactly 32 handlers are enough: #126600

@tbu-
Copy link
Contributor

tbu- commented Jul 1, 2024

@workingjubilee turns out that exactly 32 handlers are enough: #126600

No, glibc adds its own, so it is technically more than 32 if you add 32 handlers.

@nicowilliams
Copy link

In my case, this showed up as use-after-free crashes when running the testsuite for RPM Oxide. librpm registers an atexit() handler to clean up global resources, but this causes some of the resources to be freed while other threads are still using them. Other projects have had similar issues, such as the Rust bindings for RocksDB.

The issue here is that the other threads need to be quiesced before releasing those resources or do not release those resources. It's a bug in RPM Oxide.

@DemiMarie
Copy link
Contributor Author

In my case, this showed up as use-after-free crashes when running the testsuite for RPM Oxide. librpm registers an atexit() handler to clean up global resources, but this causes some of the resources to be freed while other threads are still using them. Other projects have had similar issues, such as the Rust bindings for RocksDB.

The issue here is that the other threads need to be quiesced before releasing those resources or do not release those resources. It's a bug in RPM Oxide.

I avoided the problem by adding an atexit() handler that acquired a mutex (and never released it), and requiring all calls to the library that used the relevant state to acquire the same mutex. This worked, but the reason I filed this issue is because I ran into the problem during development.

@RalfJung
Copy link
Member

This is a closed issue, please move the discussion to a place that is better suited. :)

(e.g. #126600)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: This is a bug. disposition-close This PR / issue is in PFCP or FCP with a disposition to close it. finished-final-comment-period The final comment period is finished for this PR / Issue. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests