Add support for cgroup namespaces #781

cyphar · 2016-04-26T04:07:06Z

This is a very simple implementation because it doesn't require any
configuration unlike the other namespaces, and in its current state it
only masks paths.

This feature is available in Linux 4.6+ and is enabled by default if
the kernel is compiled with CONFIG_CGROUP=y.

TODO:

Change rootfs_linux setup to actually mount cgroup if we're in a cgroup + user namespace.

Depends on opencontainers/runtime-spec#397

mrunalp · 2016-04-26T16:33:50Z

This looks good. We'll wait for upstream PR to be merged.

jessfraz · 2016-04-26T16:41:41Z

this is the key for the non root containers :)

mrunalp · 2016-04-26T16:43:13Z

@jfrazelle Not really. This doesn't help with non root as we discussed on the dev list. This just namespaces the cgroups path inside the container.

jessfraz · 2016-04-26T16:47:45Z

Ah boooooo, is the goal in linux to eventually make it have more power or
no?

On Tue, Apr 26, 2016 at 9:43 AM, Mrunal Patel [email protected]
wrote:

@jfrazelle https://github.com/jfrazelle Not really. This doesn't help
with non root as we discussed on the dev list. This just namespaces the
cgroups path inside the container.

—
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#781 (comment)

Jessie Frazelle
4096R / D4C4 DD60 0D66 F65A 8EFC 511E 18F3 685C 0022 BFF3
pgp.mit.edu http://pgp.mit.edu/pks/lookup?op=get&search=0x18F3685C0022BFF3

jessfraz · 2016-04-26T16:48:32Z

what a disappointing feature haha

On Tue, Apr 26, 2016 at 9:47 AM, Jessica Frazelle [email protected] wrote:

Ah boooooo, is the goal in linux to eventually make it have more power or
no?

On Tue, Apr 26, 2016 at 9:43 AM, Mrunal Patel [email protected]
wrote:

@jfrazelle https://github.com/jfrazelle Not really. This doesn't help
with non root as we discussed on the dev list. This just namespaces the
cgroups path inside the container.

—
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#781 (comment)

Jessie Frazelle
4096R / D4C4 DD60 0D66 F65A 8EFC 511E 18F3 685C 0022 BFF3
pgp.mit.edu
http://pgp.mit.edu/pks/lookup?op=get&search=0x18F3685C0022BFF3

Jessie Frazelle
4096R / D4C4 DD60 0D66 F65A 8EFC 511E 18F3 685C 0022 BFF3
pgp.mit.edu http://pgp.mit.edu/pks/lookup?op=get&search=0x18F3685C0022BFF3

mrunalp · 2016-04-26T16:50:09Z

@jfrazelle Hopefully it will be more useful in the feature :)

cyphar · 2016-04-26T23:03:52Z

Yeah, tbh I was a bit dissapointed when someone mentioned the new feature to me and I looked it up -- there's so much more it could do. For example, it could fake /proc/{mem,cpu}info so that it contains just information derived from the container limits. And of course there's the "actually administer your own subtree".

[REMOVED OLD PATCH IDEA]

[REMOVED OTHER PATCH IDEA]

I've decided to make it so that unshare(CLONE_NEWCGROUP) will create a subtree in all of the cgroups a task is associated with. Then the calling process will be migrated. The subtree is owned by the current_fs_{u,g}id().

You can see the current state of my kernel patch in my Linux branch.

/cc @jfrazelle @mrunalp

mrunalp · 2016-05-05T22:29:53Z

Yeah, it does make sense from security perspective that container can't change its own cgroups limits but allowing managing subtress is reasonable. Following the upstream converstion @cyphar. Thanks!

This is a very simple implementation because it doesn't require any configuration unlike the other namespaces, and in its current state it only masks paths. This feature is available in Linux 4.6+ and is enabled by default for kernels compiled with CONFIG_CGROUP=y. Signed-off-by: Aleksa Sarai <[email protected]>

yummypeng · 2016-11-03T01:55:40Z

I am also working on adding support for cgroup namespace in runc recently. Now I'm confused that if there is necessity that we allow containers have shared cgroup ns. Since the intention of cgroup ns is to mask cgroup paths, it can somewhat prevent host's information exposed to container.

For example, if we run a container without cgroup ns, then its /proc/{pid}/cgroup would be like this:

root@LFG1000321762:~/tools# runc run without_cgroupns
root@runc:/# cat /proc/self/cgroup
13:name=systemd:/user/0.user/5.session/without_cgroupns
12:pids:/without_cgroupns
11:hugetlb:/user/0.user/5.session/without_cgroupns
10:net_prio:/user/0.user/5.session/without_cgroupns
9:perf_event:/user/0.user/5.session/without_cgroupns
8:net_cls:/user/0.user/5.session/without_cgroupns
7:freezer:/user/0.user/5.session/without_cgroupns
6:devices:/user/0.user/5.session/without_cgroupns
5:memory:/user/0.user/5.session/without_cgroupns
4:blkio:/user/0.user/5.session/without_cgroupns
3:cpuacct:/user/0.user/5.session/without_cgroupns
2:cpu:/user/0.user/5.session/without_cgroupns
1:cpuset:/without_cgroupns

This exposure of cgroup names to the processes running inside a container may result in some problems. With cgroup ns enabled, container's /proc/{pid}/cgroup would be like this:

root@LFG1000321762:~/tools# runc run with_cgroupns
root@runc:/# cat /proc/self/cgroup
13:name=systemd:/
12:pids:/
11:hugetlb:/
10:net_prio:/
9:perf_event:/
8:net_cls:/
7:freezer:/
6:devices:/
5:memory:/
4:blkio:/
3:cpuacct:/
2:cpu:/
1:cpuset:/
root@runc:/# ls -l /proc/self/ns
total 0
lrwxrwxrwx 1 root root 0 Nov  3 01:31 cgroup -> cgroup:[4026532421]
lrwxrwxrwx 1 root root 0 Nov  3 01:31 ipc -> ipc:[4026532364]
lrwxrwxrwx 1 root root 0 Nov  3 01:31 mnt -> mnt:[4026532362]
lrwxrwxrwx 1 root root 0 Nov  3 01:31 net -> net:[4026532367]
lrwxrwxrwx 1 root root 0 Nov  3 01:31 pid -> pid:[4026532365]
lrwxrwxrwx 1 root root 0 Nov  3 01:31 user -> user:[4026531837]
lrwxrwxrwx 1 root root 0 Nov  3 01:31 uts -> uts:[4026532363]

Then, if we run another container(in another session) which has shared cgroup ns with the above one, its /proc/{pid}/cgroup would be like this:

root@LFG1000321762:~/tools# runc run shared_cgroupns
root@runc:/# cat /proc/self/cgroup
13:name=systemd:/../../1.session/shared_cgroupns
12:pids:/../shared_cgroupns
11:hugetlb:/../../1.session/shared_cgroupns
10:net_prio:/../../1.session/shared_cgroupns
9:perf_event:/../../1.session/shared_cgroupns
8:net_cls:/../../1.session/shared_cgroupns
7:freezer:/../../1.session/shared_cgroupns
6:devices:/../../1.session/shared_cgroupns
5:memory:/../../1.session/shared_cgroupns
4:blkio:/../../1.session/shared_cgroupns
3:cpuacct:/../../1.session/shared_cgroupns
2:cpu:/../../1.session/shared_cgroupns
1:cpuset:/../shared_cgroupns
root@runc:/# ls -l /proc/self/ns
total 0
lrwxrwxrwx 1 root root 0 Nov  3 01:34 cgroup -> cgroup:[4026532421]
lrwxrwxrwx 1 root root 0 Nov  3 01:34 ipc -> ipc:[4026532305]
lrwxrwxrwx 1 root root 0 Nov  3 01:34 mnt -> mnt:[4026532303]
lrwxrwxrwx 1 root root 0 Nov  3 01:34 net -> net:[4026532308]
lrwxrwxrwx 1 root root 0 Nov  3 01:34 pid -> pid:[4026532306]
lrwxrwxrwx 1 root root 0 Nov  3 01:34 user -> user:[4026531837]
lrwxrwxrwx 1 root root 0 Nov  3 01:34 uts -> uts:[4026532304]

We can see that, part of cgroup paths is now exposed to processes running inside this container. Besides, I think paths like /../../1.session/shared_cgroupns are meaningless here.

So I don't think it's necessary to allow containers have shared cgroup ns, it's not secure either.

WDYT @cyphar

cyphar · 2016-11-03T02:01:24Z

@yummypeng If you set cgroupsPath for each container so that they're properly set up, you could make it work properly. Overall there's no reason to disable a kernel feature -- if a user wants to set up containers like that it's up to them.

Also, note that it hasn't exposed the entire cgroup path (just the shared ancestor component), so it's not as big of a deal as you might think. Also as for this:

Besides, I think paths like /../../1.session/shared_cgroupns is meaningless here.

They're not accessible, but they tell you something about what cgroup you're in. Inside a cgroup namespace, all cgroupfs mounts are scoped to the cgroup namespace (so you cannot look at any path outside / inside the namespace). So there is actually legitimate value for this.

However, if you're playing around with cgroup namespaces note that you have to also rewrite some of the rootfs_linux.go code so that it doesn't bindmount cgroups (otherwise you lose the whole benefit of cgroup namespaces). I was planning on doing this in this PR but have been really busy with other things.

yummypeng · 2016-11-03T02:14:42Z

@cyphar Thanks for your suggestion. I would think it twice:wink:

hqhq · 2016-11-03T03:04:05Z

@cyphar Do you mind @yummypeng working with you on this PR and push her commits to your branch?

cyphar · 2016-11-03T04:16:56Z

@hqhq No problem. In fact if @yummypeng wants to carry this PR I'm totally fine with that (I've got way too much stuff on my plate recently 😸).

hqhq · 2017-02-10T07:10:37Z

Close in favor of #1184 .

config-linux: RFC 2119 wording for oomScoreAdj

GordonTheTurtle added the status/0-triage label Apr 26, 2016

cyphar mentioned this pull request Apr 27, 2016

support cgroup v2 (unified hierarchy) #654

Closed

cyphar mentioned this pull request Jun 3, 2016

cgroup namespaces: ignore the mount.Root if we have cgroup namespaces #617

Closed

cyphar mentioned this pull request Jul 26, 2016

fix init.scope in cgroup paths #966

Merged

yummypeng mentioned this pull request Nov 9, 2016

Carry #781: Add support for cgroup namespace #1184

Closed

jasontedor mentioned this pull request Jan 24, 2017

Add hack for obtaining cgroup stats elastic/elasticsearch-docker#25

Merged

hqhq closed this Feb 10, 2017

cyphar deleted the implement-cgroup-namespace branch February 19, 2017 23:16

stefanberger pushed a commit to stefanberger/runc that referenced this pull request Sep 8, 2017

Merge pull request opencontainers#781 from wking/oomScoreAdj-rfc-2119

01ec62d

config-linux: RFC 2119 wording for oomScoreAdj

cyphar mentioned this pull request Jan 28, 2018

Add definitions for cgroup namespaces #1707

Closed

crosbymichael mentioned this pull request Oct 22, 2018

Add support for cgroup namespace #1916

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for cgroup namespaces #781

Add support for cgroup namespaces #781

cyphar commented Apr 26, 2016 •

edited

Loading

mrunalp commented Apr 26, 2016

jessfraz commented Apr 26, 2016

mrunalp commented Apr 26, 2016

jessfraz commented Apr 26, 2016

jessfraz commented Apr 26, 2016

mrunalp commented Apr 26, 2016

cyphar commented Apr 26, 2016 •

edited

Loading

mrunalp commented May 5, 2016

yummypeng commented Nov 3, 2016 •

edited

Loading

cyphar commented Nov 3, 2016 •

edited

Loading

yummypeng commented Nov 3, 2016

hqhq commented Nov 3, 2016 •

edited

Loading

cyphar commented Nov 3, 2016

hqhq commented Feb 10, 2017

Add support for cgroup namespaces #781

Add support for cgroup namespaces #781

Conversation

cyphar commented Apr 26, 2016 • edited Loading

mrunalp commented Apr 26, 2016

jessfraz commented Apr 26, 2016

mrunalp commented Apr 26, 2016

jessfraz commented Apr 26, 2016

jessfraz commented Apr 26, 2016

mrunalp commented Apr 26, 2016

cyphar commented Apr 26, 2016 • edited Loading

mrunalp commented May 5, 2016

yummypeng commented Nov 3, 2016 • edited Loading

cyphar commented Nov 3, 2016 • edited Loading

yummypeng commented Nov 3, 2016

hqhq commented Nov 3, 2016 • edited Loading

cyphar commented Nov 3, 2016

hqhq commented Feb 10, 2017

cyphar commented Apr 26, 2016 •

edited

Loading

cyphar commented Apr 26, 2016 •

edited

Loading

yummypeng commented Nov 3, 2016 •

edited

Loading

cyphar commented Nov 3, 2016 •

edited

Loading

hqhq commented Nov 3, 2016 •

edited

Loading