Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for cgroup namespaces #781

Closed
wants to merge 1 commit into from
Closed

Add support for cgroup namespaces #781

wants to merge 1 commit into from

Conversation

cyphar
Copy link
Member

@cyphar cyphar commented Apr 26, 2016

This is a very simple implementation because it doesn't require any
configuration unlike the other namespaces, and in its current state it
only masks paths.

This feature is available in Linux 4.6+ and is enabled by default if
the kernel is compiled with CONFIG_CGROUP=y.

TODO:

  • Change rootfs_linux setup to actually mount cgroup if we're in a cgroup + user namespace.

Depends on opencontainers/runtime-spec#397

@mrunalp
Copy link
Contributor

mrunalp commented Apr 26, 2016

This looks good. We'll wait for upstream PR to be merged.

@jessfraz
Copy link
Contributor

this is the key for the non root containers :)

@mrunalp
Copy link
Contributor

mrunalp commented Apr 26, 2016

@jfrazelle Not really. This doesn't help with non root as we discussed on the dev list. This just namespaces the cgroups path inside the container.

@jessfraz
Copy link
Contributor

Ah boooooo, is the goal in linux to eventually make it have more power or
no?

On Tue, Apr 26, 2016 at 9:43 AM, Mrunal Patel [email protected]
wrote:

@jfrazelle https://github.com/jfrazelle Not really. This doesn't help
with non root as we discussed on the dev list. This just namespaces the
cgroups path inside the container.


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#781 (comment)

Jessie Frazelle
4096R / D4C4 DD60 0D66 F65A 8EFC 511E 18F3 685C 0022 BFF3
pgp.mit.edu http://pgp.mit.edu/pks/lookup?op=get&search=0x18F3685C0022BFF3

@jessfraz
Copy link
Contributor

what a disappointing feature haha

On Tue, Apr 26, 2016 at 9:47 AM, Jessica Frazelle [email protected] wrote:

Ah boooooo, is the goal in linux to eventually make it have more power or
no?

On Tue, Apr 26, 2016 at 9:43 AM, Mrunal Patel [email protected]
wrote:

@jfrazelle https://github.com/jfrazelle Not really. This doesn't help
with non root as we discussed on the dev list. This just namespaces the
cgroups path inside the container.


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#781 (comment)

Jessie Frazelle
4096R / D4C4 DD60 0D66 F65A 8EFC 511E 18F3 685C 0022 BFF3
pgp.mit.edu
http://pgp.mit.edu/pks/lookup?op=get&search=0x18F3685C0022BFF3

Jessie Frazelle
4096R / D4C4 DD60 0D66 F65A 8EFC 511E 18F3 685C 0022 BFF3
pgp.mit.edu http://pgp.mit.edu/pks/lookup?op=get&search=0x18F3685C0022BFF3

@mrunalp
Copy link
Contributor

mrunalp commented Apr 26, 2016

@jfrazelle Hopefully it will be more useful in the feature :)

@cyphar
Copy link
Member Author

cyphar commented Apr 26, 2016

Yeah, tbh I was a bit dissapointed when someone mentioned the new feature to me and I looked it up -- there's so much more it could do. For example, it could fake /proc/{mem,cpu}info so that it contains just information derived from the container limits. And of course there's the "actually administer your own subtree".

[REMOVED OLD PATCH IDEA]

[REMOVED OTHER PATCH IDEA]

I've decided to make it so that unshare(CLONE_NEWCGROUP) will create a subtree in all of the cgroups a task is associated with. Then the calling process will be migrated. The subtree is owned by the current_fs_{u,g}id().

You can see the current state of my kernel patch in my Linux branch.

/cc @jfrazelle @mrunalp

@mrunalp
Copy link
Contributor

mrunalp commented May 5, 2016

Yeah, it does make sense from security perspective that container can't change its own cgroups limits but allowing managing subtress is reasonable. Following the upstream converstion @cyphar. Thanks!

This is a very simple implementation because it doesn't require any
configuration unlike the other namespaces, and in its current state it
only masks paths.

This feature is available in Linux 4.6+ and is enabled by default for
kernels compiled with CONFIG_CGROUP=y.

Signed-off-by: Aleksa Sarai <[email protected]>
@yummypeng
Copy link

yummypeng commented Nov 3, 2016

I am also working on adding support for cgroup namespace in runc recently. Now I'm confused that if there is necessity that we allow containers have shared cgroup ns. Since the intention of cgroup ns is to mask cgroup paths, it can somewhat prevent host's information exposed to container.

For example, if we run a container without cgroup ns, then its /proc/{pid}/cgroup would be like this:

root@LFG1000321762:~/tools# runc run without_cgroupns
root@runc:/# cat /proc/self/cgroup
13:name=systemd:/user/0.user/5.session/without_cgroupns
12:pids:/without_cgroupns
11:hugetlb:/user/0.user/5.session/without_cgroupns
10:net_prio:/user/0.user/5.session/without_cgroupns
9:perf_event:/user/0.user/5.session/without_cgroupns
8:net_cls:/user/0.user/5.session/without_cgroupns
7:freezer:/user/0.user/5.session/without_cgroupns
6:devices:/user/0.user/5.session/without_cgroupns
5:memory:/user/0.user/5.session/without_cgroupns
4:blkio:/user/0.user/5.session/without_cgroupns
3:cpuacct:/user/0.user/5.session/without_cgroupns
2:cpu:/user/0.user/5.session/without_cgroupns
1:cpuset:/without_cgroupns

This exposure of cgroup names to the processes running inside a container may result in some problems. With cgroup ns enabled, container's /proc/{pid}/cgroup would be like this:

root@LFG1000321762:~/tools# runc run with_cgroupns
root@runc:/# cat /proc/self/cgroup
13:name=systemd:/
12:pids:/
11:hugetlb:/
10:net_prio:/
9:perf_event:/
8:net_cls:/
7:freezer:/
6:devices:/
5:memory:/
4:blkio:/
3:cpuacct:/
2:cpu:/
1:cpuset:/
root@runc:/# ls -l /proc/self/ns
total 0
lrwxrwxrwx 1 root root 0 Nov  3 01:31 cgroup -> cgroup:[4026532421]
lrwxrwxrwx 1 root root 0 Nov  3 01:31 ipc -> ipc:[4026532364]
lrwxrwxrwx 1 root root 0 Nov  3 01:31 mnt -> mnt:[4026532362]
lrwxrwxrwx 1 root root 0 Nov  3 01:31 net -> net:[4026532367]
lrwxrwxrwx 1 root root 0 Nov  3 01:31 pid -> pid:[4026532365]
lrwxrwxrwx 1 root root 0 Nov  3 01:31 user -> user:[4026531837]
lrwxrwxrwx 1 root root 0 Nov  3 01:31 uts -> uts:[4026532363]

Then, if we run another container(in another session) which has shared cgroup ns with the above one, its /proc/{pid}/cgroup would be like this:

root@LFG1000321762:~/tools# runc run shared_cgroupns
root@runc:/# cat /proc/self/cgroup
13:name=systemd:/../../1.session/shared_cgroupns
12:pids:/../shared_cgroupns
11:hugetlb:/../../1.session/shared_cgroupns
10:net_prio:/../../1.session/shared_cgroupns
9:perf_event:/../../1.session/shared_cgroupns
8:net_cls:/../../1.session/shared_cgroupns
7:freezer:/../../1.session/shared_cgroupns
6:devices:/../../1.session/shared_cgroupns
5:memory:/../../1.session/shared_cgroupns
4:blkio:/../../1.session/shared_cgroupns
3:cpuacct:/../../1.session/shared_cgroupns
2:cpu:/../../1.session/shared_cgroupns
1:cpuset:/../shared_cgroupns
root@runc:/# ls -l /proc/self/ns
total 0
lrwxrwxrwx 1 root root 0 Nov  3 01:34 cgroup -> cgroup:[4026532421]
lrwxrwxrwx 1 root root 0 Nov  3 01:34 ipc -> ipc:[4026532305]
lrwxrwxrwx 1 root root 0 Nov  3 01:34 mnt -> mnt:[4026532303]
lrwxrwxrwx 1 root root 0 Nov  3 01:34 net -> net:[4026532308]
lrwxrwxrwx 1 root root 0 Nov  3 01:34 pid -> pid:[4026532306]
lrwxrwxrwx 1 root root 0 Nov  3 01:34 user -> user:[4026531837]
lrwxrwxrwx 1 root root 0 Nov  3 01:34 uts -> uts:[4026532304]

We can see that, part of cgroup paths is now exposed to processes running inside this container. Besides, I think paths like /../../1.session/shared_cgroupns are meaningless here.

So I don't think it's necessary to allow containers have shared cgroup ns, it's not secure either.

WDYT @cyphar

@cyphar
Copy link
Member Author

cyphar commented Nov 3, 2016

@yummypeng If you set cgroupsPath for each container so that they're properly set up, you could make it work properly. Overall there's no reason to disable a kernel feature -- if a user wants to set up containers like that it's up to them.

Also, note that it hasn't exposed the entire cgroup path (just the shared ancestor component), so it's not as big of a deal as you might think. Also as for this:

Besides, I think paths like /../../1.session/shared_cgroupns is meaningless here.

They're not accessible, but they tell you something about what cgroup you're in. Inside a cgroup namespace, all cgroupfs mounts are scoped to the cgroup namespace (so you cannot look at any path outside / inside the namespace). So there is actually legitimate value for this.

However, if you're playing around with cgroup namespaces note that you have to also rewrite some of the rootfs_linux.go code so that it doesn't bindmount cgroups (otherwise you lose the whole benefit of cgroup namespaces). I was planning on doing this in this PR but have been really busy with other things.

@yummypeng
Copy link

@cyphar Thanks for your suggestion. I would think it twice:wink:

@hqhq
Copy link
Contributor

hqhq commented Nov 3, 2016

@cyphar Do you mind @yummypeng working with you on this PR and push her commits to your branch?

@cyphar
Copy link
Member Author

cyphar commented Nov 3, 2016

@hqhq No problem. In fact if @yummypeng wants to carry this PR I'm totally fine with that (I've got way too much stuff on my plate recently 😸).

@hqhq
Copy link
Contributor

hqhq commented Feb 10, 2017

Close in favor of #1184 .

@hqhq hqhq closed this Feb 10, 2017
@cyphar cyphar deleted the implement-cgroup-namespace branch February 19, 2017 23:16
stefanberger pushed a commit to stefanberger/runc that referenced this pull request Sep 8, 2017
config-linux: RFC 2119 wording for oomScoreAdj
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants