Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLI::argv() crashes on linux when /proc is missing #845

Closed
bindreams opened this issue Feb 10, 2023 · 10 comments · Fixed by #923
Closed

CLI::argv() crashes on linux when /proc is missing #845

bindreams opened this issue Feb 10, 2023 · 10 comments · Fixed by #923

Comments

@bindreams
Copy link
Contributor

When /proc is missing on a Linux system, calling CLI::argv() fails with could not open /proc/self/cmdline for reading.

I was aware of this while implementing Unicode, but thought that this is an extraordinary scenario of a broken Linux filesystem. However apparently this is a common occurrence when a user calls chroot before running the application.

There is another way of obtaining argc and argv, but I'm a little scared of it. It does seem to work though.

#include <unistd.h>
#include <utility>

std::pair<int, char**> findargs() {
	int i;
	char** p = __environ - 2;

	for (i = 1; i != *reinterpret_cast<int*>(p - 1); i++) {
		p--;
	}

	return {i, p};
}

The idea here is to obtain a stack address nearby from __environ, then walk back, counting steps. When dereferencing a pointer as-if an int* reveals that it is equal to the amount of steps, it means we have reached a pointer to argc and walked through the entire array of argv.

One problem is that this has to be done before main, otherwise user might call setenv and change the pointer to environ (I presume). And if this is prepared in advance we might as well discard the /proc/self/cmdline approach altogether. At least no files are read so the setup would be pretty fast.

@phlptp
Copy link
Collaborator

phlptp commented Feb 10, 2023

Well I don't think we can put that code in CLI11. That just seems like a non-standard, scary, big security hole type of thing, even if in some way it does work.

Is this a fault outside of the std::runtime_error that function throws currently if it can't open the file?

@bindreams
Copy link
Contributor Author

No, the error message could not open /proc/self/cmdline for reading is the what() of runtime_error.

I wouldn't call this a security hole though. It returns the same address to argv as the one passed to main, so this is not like an internal buffer in the executable which the user never has access to. It think it's more or less guaranteed that the memory will be laid out this way by ELF.

Just wanted to throw an idea out there. chroot support would be nice, but it's your call. I suggest we keep the issue open though, in case someone else actually stumbles upon this in the wild.

@henryiii
Copy link
Collaborator

henryiii commented Mar 9, 2023

Couldn't we accept argv, argc, and just avoid using them if we can do something else for unicode support? Something like CLI11_UNICODE_PARSE(app, argv, argc), which will throw away argv and argc if it can get unicode info on Windows?

@bindreams
Copy link
Contributor Author

The idea crossed my mind when first designing my patch, but I decided against it because I think this could cause unexpected behavior for users who pre-process argc and argv before parsing them. For example, Google Test will take argc/argv, extract arguments intended for test config, and leave the rest. Then a user parses the remaining arguments (on linux) OR completely intact original arguments, converted to unicode (on windows), and, I imagine, gets weird bugs somewhere because sometimes extra arguments come through and sometimes not.

If we were to go the route of explicit argument conversion by the user, IMO a better idea would be to split the CLI11_UNICODE_PARSE into two parts: conversion separately, and parsing separately (i.e. leave parsing as is). We could steal an idea from Boost.Nowide:

int main(int argc, char** argv, char** env) {
    boost::nowide::args _(argc, argv, env); // Note the _ as a "don't care" name for the instance
    // Use argv and env as usual, they are now UTF-8 encoded on Windows
    return 0; // Memory held by args is released
}

Though I am fond of CLI::argv() as a feature by itself, this could be the more fool-proof way. I volunteer to implement it if you decide it's a good idea.

@henryiii
Copy link
Collaborator

henryiii commented Sep 15, 2023

I'm not fond of using _ as a "don't care" variable, since it collides with some #define macros in some frameworks for translation, and it only works once (unless it's added to the language, which has been suggested off and on). But this sounds fine to me. I'd much rather keep close to standards and avoid crashes on weird systems than skip argv/argn. :)

(And I don't think I understood why you needed the two step process before, but I see why now).

(I hope you don't need the **env, as that's not portable).

So the final result would look like this, I take it?

int main(int argc, char** argv) {
    CLI::App app{"App description"};
    argv = app.set_unicode_args(argc, argv);

    CLI11_PARSE(app, argc, argv);
    return 0;
}

CLI::App could provide the storage for the new arg string that way.

You could even allow CLI11_PARSE(app), actually, after that call, if you didn't want to modify the argv. :)

@bindreams
Copy link
Contributor Author

I hope you don't need the **env, as that's not portable

The previous code snippet with env was a direct copy from Nowide docs. We don't need env, and actually we don't even need argc.

Something like this (some name bikeshedding is in order):

int main(int argc, char** argv) {
    CLI::App app{"App description"};
    argv = app.ensure_utf8(argv);

    CLI11_PARSE(app);
    return 0;
}

I'll ping you when a provisional PR is ready. Also, do you wish to get rid of the CLI::argv() API or do I keep both?

bindreams added a commit to bindreams/CLI11 that referenced this issue Sep 18, 2023
@henryiii
Copy link
Collaborator

Let's just stick with the explicit API. CLI::argv() depends on being able to pull argv/argn out of thin air, which is not portable. Isn't it better to keep argn, both for consistency and in case the final element of argv wasn't null for some reason (which I guess it why it's not required?)?

@henryiii
Copy link
Collaborator

(though app.argv() would be possible)

@bindreams
Copy link
Contributor Author

Isn't it better to keep [argc]?

There's nothing I can do with it besides throw an exception when it does not match the value I get from CommandLineToArgvW. And at that point we are sanity-checking Windows.h. I do not depend on argv ending with nullptr (and never have).

though app.argv() would be possible

I'm not sure about this one. Implement this and you start having to answer difficult design questions:

  • What do I return if the user chooses not to call ensure_utf8?
    • std::optional is not in C++11, do I roll my own polyfill?
  • Maybe we should populate app.argv() on parse instead?
    • If so, this is a completely unrelated feature

In any case, the PR (#923) is already mostly done, I suggest we move to discuss details there.

@henryiii
Copy link
Collaborator

We want a consistent interface between Windows and non-windows. On non-windows you'd need the argc since you are not getting CommandLineToArgvW.

henryiii pushed a commit that referenced this issue Sep 20, 2023
Fixes #845 as discussed.

Comparing the two approaches of getting `argv`:
1. The "old" way, through `CLI::argv()`:
    ✔️ Works automatically and almost everywhere
    ❌ Small abstraction overhead on macOS
    ❌ Does not work in weird edge-cases such as missing `/proc`
2. This PR, through `app.ensure_utf8`:
✔️ True zero-overhead abstraction: you don't pay for what you don't use
✔️ Less moving parts than the "old" approach, probably can't be broken
❌ Requires extra code to be written by the user (which is sad because
ideally everyone should use this by default)

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants