Skip to content

Commit

Permalink
Implement none, private, and shareable ipc modes
Browse files Browse the repository at this point in the history
Since the commit d88fe44 ("Add support for sharing /dev/shm/ and
/dev/mqueue between containers") container's /dev/shm is mounted on the
host first, then bind-mounted inside the container. This is done that
way in order to be able to share this container's IPC namespace
(and the /dev/shm mount point) with another container.

Unfortunately, this functionality breaks container checkpoint/restore
(even if IPC is not shared). Since /dev/shm is an external mount, its
contents is not saved by `criu checkpoint`, and so upon restore any
application that tries to access data under /dev/shm is severily
disappointed (which usually results in a fatal crash).

This commit solves the issue by introducing new IPC modes for containers
(in addition to 'host' and 'container:ID'). The new modes are:

 - 'shareable':	enables sharing this container's IPC with others
		(this used to be the implicit default);

 - 'private':	disables sharing this container's IPC.

In 'private' mode, container's /dev/shm is truly mounted inside the
container, without any bind-mounting from the host, which solves the
issue.

While at it, let's also implement 'none' mode. The motivation, as
eloquently put by Justin Cormack, is:

> I wondered a while back about having a none shm mode, as currently it is
> not possible to have a totally unwriteable container as there is always
> a /dev/shm writeable mount. It is a bit of a niche case (and clearly
> should never be allowed to be daemon default) but it would be trivial to
> add now so maybe we should...

...so here's yet yet another mode:

 - 'none':	no /dev/shm mount inside the container (though it still
		has its own private IPC namespace).

Now, to ultimately solve the abovementioned checkpoint/restore issue, we'd
need to make 'private' the default mode, but unfortunately it breaks the
backward compatibility. So, let's make the default container IPC mode
per-daemon configurable (with the built-in default set to 'shareable'
for now). The default can be changed either via a daemon CLI option
(--default-shm-mode) or a daemon.json configuration file parameter
of the same name.

Note one can only set either 'shareable' or 'private' IPC modes as a
daemon default (i.e. in this context 'host', 'container', or 'none'
do not make much sense).

Some other changes this patch introduces are:

1. A mount for /dev/shm is added to default OCI Linux spec.

2. IpcMode.Valid() is simplified to remove duplicated code that parsed
   'container:ID' form. Note the old version used to check that ID does
   not contain a semicolon -- this is no longer the case (tests are
   modified accordingly). The motivation is we should either do a
   proper check for container ID validity, or don't check it at all
   (since it is checked in other places anyway). I chose the latter.

3. IpcMode.Container() is modified to not return container ID if the
   mode value does not start with "container:", unifying the check to
   be the same as in IpcMode.IsContainer().

3. IPC mode unit tests (runconfig/hostconfig_test.go) are modified
   to add checks for newly added values.

[v2: addressed review at moby/moby#34087 (review)]
[v3: addressed review at moby/moby#34087 (review)]
[v4: addressed the case of upgrading from older daemon, in this case
     container.HostConfig.IpcMode is unset and this is valid]
[v5: document old and new IpcMode values in api/swagger.yaml]
[v6: add the 'none' mode, changelog entry to docs/api/version-history.md]

Signed-off-by: Kir Kolyshkin <[email protected]>
  • Loading branch information
kolyshkin committed Aug 14, 2017
1 parent 0fb1fb1 commit 7120976
Show file tree
Hide file tree
Showing 19 changed files with 246 additions and 112 deletions.
12 changes: 11 additions & 1 deletion api/swagger.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -665,7 +665,17 @@ definitions:
type: "string"
IpcMode:
type: "string"
description: "IPC namespace to use for the container."
description: |
IPC sharing mode for the container. Possible values are:
- `"none"`: own private IPC namespace, with /dev/shm not mounted
- `"private"`: own private IPC namespace
- `"shareable"`: own private IPC namespace, with a possibility to share it with other containers
- `"container:<name|id>"`: join another (shareable) container's IPC namespace
- `"host"`: use the host system's IPC namespace
If not specified, daemon default is used, which can either be `"private"`
or `"shareable"`, depending on daemon version and configuration.
Cgroup:
type: "string"
description: "Cgroup to use for the container."
Expand Down
39 changes: 22 additions & 17 deletions api/types/container/host_config.go
Original file line number Diff line number Diff line change
Expand Up @@ -23,41 +23,46 @@ func (i Isolation) IsDefault() bool {
// IpcMode represents the container ipc stack.
type IpcMode string

// IsPrivate indicates whether the container uses its private ipc stack.
// IsPrivate indicates whether the container uses its own private ipc namespace which can not be shared.
func (n IpcMode) IsPrivate() bool {
return !(n.IsHost() || n.IsContainer())
return n == "private"
}

// IsHost indicates whether the container uses the host's ipc stack.
// IsHost indicates whether the container shares the host's ipc namespace.
func (n IpcMode) IsHost() bool {
return n == "host"
}

// IsContainer indicates whether the container uses a container's ipc stack.
// IsShareable indicates whether the container's ipc namespace can be shared with another container.
func (n IpcMode) IsShareable() bool {
return n == "shareable"
}

// IsContainer indicates whether the container uses another container's ipc namespace.
func (n IpcMode) IsContainer() bool {
parts := strings.SplitN(string(n), ":", 2)
return len(parts) > 1 && parts[0] == "container"
}

// Valid indicates whether the ipc stack is valid.
// IsNone indicates whether container IpcMode is set to "none".
func (n IpcMode) IsNone() bool {
return n == "none"
}

// IsEmpty indicates whether container IpcMode is empty
func (n IpcMode) IsEmpty() bool {
return n == ""
}

// Valid indicates whether the ipc mode is valid.
func (n IpcMode) Valid() bool {
parts := strings.Split(string(n), ":")
switch mode := parts[0]; mode {
case "", "host":
case "container":
if len(parts) != 2 || parts[1] == "" {
return false
}
default:
return false
}
return true
return n.IsEmpty() || n.IsNone() || n.IsPrivate() || n.IsHost() || n.IsShareable() || n.IsContainer()
}

// Container returns the name of the container ipc stack is going to be used.
func (n IpcMode) Container() string {
parts := strings.SplitN(string(n), ":", 2)
if len(parts) > 1 {
if len(parts) > 1 && parts[0] == "container" {
return parts[1]
}
return ""
Expand Down
1 change: 1 addition & 0 deletions cmd/dockerd/config_unix.go
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,7 @@ func installConfigFlags(conf *config.Config, flags *pflag.FlagSet) {
flags.StringVar(&conf.SeccompProfile, "seccomp-profile", "", "Path to seccomp profile")
flags.Var(&conf.ShmSize, "default-shm-size", "Default shm size for containers")
flags.BoolVar(&conf.NoNewPrivileges, "no-new-privileges", false, "Set no-new-privileges by default for new containers")
flags.StringVar(&conf.IpcMode, "default-ipc-mode", config.DefaultIpcMode, `Default mode for containers ipc ("shareable" | "private")`)

attachExperimentalFlags(conf, flags)
}
62 changes: 31 additions & 31 deletions container/container_unix.go
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,6 @@ import (
"io/ioutil"
"os"
"path/filepath"
"strings"

"github.com/docker/docker/api/types"
containertypes "github.com/docker/docker/api/types/container"
Expand All @@ -19,6 +18,7 @@ import (
"github.com/docker/docker/pkg/system"
"github.com/docker/docker/volume"
"github.com/opencontainers/selinux/go-selinux/label"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
"golang.org/x/sys/unix"
)
Expand Down Expand Up @@ -171,48 +171,48 @@ func (container *Container) HasMountFor(path string) bool {
return exists
}

// UnmountIpcMounts uses the provided unmount function to unmount shm and mqueue if they were mounted
func (container *Container) UnmountIpcMounts(unmount func(pth string) error) {
if container.HostConfig.IpcMode.IsContainer() || container.HostConfig.IpcMode.IsHost() {
return
// UnmountIpcMount uses the provided unmount function to unmount shm if it was mounted
func (container *Container) UnmountIpcMount(unmount func(pth string) error) error {
if container.HasMountFor("/dev/shm") {
return nil
}

var warnings []string

if !container.HasMountFor("/dev/shm") {
shmPath, err := container.ShmResourcePath()
if err != nil {
logrus.Error(err)
warnings = append(warnings, err.Error())
} else if shmPath != "" {
if err := unmount(shmPath); err != nil && !os.IsNotExist(err) {
if mounted, mErr := mount.Mounted(shmPath); mounted || mErr != nil {
warnings = append(warnings, fmt.Sprintf("failed to umount %s: %v", shmPath, err))
}
}

}
// container.ShmPath should not be used here as it may point
// to the host's or other container's /dev/shm
shmPath, err := container.ShmResourcePath()
if err != nil {
return err
}

if len(warnings) > 0 {
logrus.Warnf("failed to cleanup ipc mounts:\n%v", strings.Join(warnings, "\n"))
if shmPath == "" {
return nil
}
if err = unmount(shmPath); err != nil && !os.IsNotExist(err) {
if mounted, mErr := mount.Mounted(shmPath); mounted || mErr != nil {
return errors.Wrapf(err, "umount %s", shmPath)
}
}
return nil
}

// IpcMounts returns the list of IPC mounts
func (container *Container) IpcMounts() []Mount {
var mounts []Mount

if !container.HasMountFor("/dev/shm") {
label.SetFileLabel(container.ShmPath, container.MountLabel)
mounts = append(mounts, Mount{
Source: container.ShmPath,
Destination: "/dev/shm",
Writable: true,
Propagation: string(volume.DefaultPropagationMode),
})
if container.HasMountFor("/dev/shm") {
return mounts
}
if container.ShmPath == "" {
return mounts
}

label.SetFileLabel(container.ShmPath, container.MountLabel)
mounts = append(mounts, Mount{
Source: container.ShmPath,
Destination: "/dev/shm",
Writable: true,
Propagation: string(volume.DefaultPropagationMode),
})

return mounts
}

Expand Down
5 changes: 3 additions & 2 deletions container/container_windows.go
Original file line number Diff line number Diff line change
Expand Up @@ -24,9 +24,10 @@ type ExitStatus struct {
ExitCode int
}

// UnmountIpcMounts unmounts Ipc related mounts.
// UnmountIpcMount unmounts Ipc related mounts.
// This is a NOOP on windows.
func (container *Container) UnmountIpcMounts(unmount func(pth string) error) {
func (container *Container) UnmountIpcMount(unmount func(pth string) error) error {
return nil
}

// IpcMounts returns the list of Ipc related mounts.
Expand Down
5 changes: 5 additions & 0 deletions daemon/config/config.go
Original file line number Diff line number Diff line change
Expand Up @@ -513,6 +513,11 @@ func Validate(config *Config) error {
}
}

// validate platform-specific settings
if err := config.ValidatePlatformConfig(); err != nil {
return err
}

return nil
}

Expand Down
5 changes: 5 additions & 0 deletions daemon/config/config_solaris.go
Original file line number Diff line number Diff line change
Expand Up @@ -27,3 +27,8 @@ type BridgeConfig struct {
func (conf *Config) IsSwarmCompatible() error {
return nil
}

// ValidatePlatformConfig checks if any platform-specific configuration settings are invalid.
func (conf *Config) ValidatePlatformConfig() error {
return nil
}
25 changes: 25 additions & 0 deletions daemon/config/config_unix.go
Original file line number Diff line number Diff line change
Expand Up @@ -5,10 +5,16 @@ package config
import (
"fmt"

containertypes "github.com/docker/docker/api/types/container"
"github.com/docker/docker/opts"
units "github.com/docker/go-units"
)

const (
// DefaultIpcMode is default for container's IpcMode, if not set otherwise
DefaultIpcMode = "shareable" // TODO: change to private
)

// Config defines the configuration of a docker daemon.
// It includes json tags to deserialize configuration from a file
// using the same names that the flags in the command line uses.
Expand All @@ -31,6 +37,7 @@ type Config struct {
SeccompProfile string `json:"seccomp-profile,omitempty"`
ShmSize opts.MemBytes `json:"default-shm-size,omitempty"`
NoNewPrivileges bool `json:"no-new-privileges,omitempty"`
IpcMode string `json:"default-ipc-mode,omitempty"`
}

// BridgeConfig stores all the bridge driver specific
Expand Down Expand Up @@ -61,3 +68,21 @@ func (conf *Config) IsSwarmCompatible() error {
}
return nil
}

func verifyDefaultIpcMode(mode string) error {
const hint = "Use \"shareable\" or \"private\"."

dm := containertypes.IpcMode(mode)
if !dm.Valid() {
return fmt.Errorf("Default IPC mode setting (%v) is invalid. "+hint, dm)
}
if dm != "" && !dm.IsPrivate() && !dm.IsShareable() {
return fmt.Errorf("IPC mode \"%v\" is not supported as default value. "+hint, dm)
}
return nil
}

// ValidatePlatformConfig checks if any platform-specific configuration settings are invalid.
func (conf *Config) ValidatePlatformConfig() error {
return verifyDefaultIpcMode(conf.IpcMode)
}
5 changes: 5 additions & 0 deletions daemon/config/config_windows.go
Original file line number Diff line number Diff line change
Expand Up @@ -50,3 +50,8 @@ func (conf *Config) GetExecRoot() string {
func (conf *Config) IsSwarmCompatible() error {
return nil
}

// ValidatePlatformConfig checks if any platform-specific configuration settings are invalid.
func (conf *Config) ValidatePlatformConfig() error {
return nil
}
55 changes: 40 additions & 15 deletions daemon/container_operations_unix.go
Original file line number Diff line number Diff line change
Expand Up @@ -57,13 +57,27 @@ func (daemon *Daemon) setupLinkedContainers(container *container.Container) ([]s
return env, nil
}

func (daemon *Daemon) getIpcContainer(container *container.Container) (*container.Container, error) {
containerID := container.HostConfig.IpcMode.Container()
container, err := daemon.GetContainer(containerID)
func (daemon *Daemon) getIpcContainer(id string) (*container.Container, error) {
errMsg := "can't join IPC of container " + id
// Check the container exists
container, err := daemon.GetContainer(id)
if err != nil {
return nil, errors.Wrapf(err, "cannot join IPC of a non running container: %s", container.ID)
return nil, errors.Wrap(err, errMsg)
}
return container, daemon.checkContainer(container, containerIsRunning, containerIsNotRestarting)
// Check the container is running and not restarting
if err := daemon.checkContainer(container, containerIsRunning, containerIsNotRestarting); err != nil {
return nil, errors.Wrap(err, errMsg)
}
// Check the container ipc is shareable
if st, err := os.Stat(container.ShmPath); err != nil || !st.IsDir() {
if err == nil || os.IsNotExist(err) {
return nil, errors.New(errMsg + ": non-shareable IPC")
}
// stat() failed?
return nil, errors.Wrap(err, errMsg+": unexpected error from stat "+container.ShmPath)
}

return container, nil
}

func (daemon *Daemon) getPidContainer(container *container.Container) (*container.Container, error) {
Expand All @@ -90,25 +104,33 @@ func containerIsNotRestarting(c *container.Container) error {
}

func (daemon *Daemon) setupIpcDirs(c *container.Container) error {
var err error
ipcMode := c.HostConfig.IpcMode

c.ShmPath, err = c.ShmResourcePath()
if err != nil {
return err
}

if c.HostConfig.IpcMode.IsContainer() {
ic, err := daemon.getIpcContainer(c)
switch {
case ipcMode.IsContainer():
ic, err := daemon.getIpcContainer(ipcMode.Container())
if err != nil {
return err
}
c.ShmPath = ic.ShmPath
} else if c.HostConfig.IpcMode.IsHost() {

case ipcMode.IsHost():
if _, err := os.Stat("/dev/shm"); err != nil {
return fmt.Errorf("/dev/shm is not mounted, but must be for --ipc=host")
}
c.ShmPath = "/dev/shm"
} else {

case ipcMode.IsPrivate(), ipcMode.IsNone():
// c.ShmPath will/should not be used, so make it empty.
// Container's /dev/shm mount comes from OCI spec.
c.ShmPath = ""

case ipcMode.IsEmpty():
// A container was created by an older version of the daemon.
// The default behavior used to be what is now called "shareable".
fallthrough

case ipcMode.IsShareable():
rootIDs := daemon.idMappings.RootPair()
if !c.HasMountFor("/dev/shm") {
shmPath, err := c.ShmResourcePath()
Expand All @@ -127,8 +149,11 @@ func (daemon *Daemon) setupIpcDirs(c *container.Container) error {
if err := os.Chown(shmPath, rootIDs.UID, rootIDs.GID); err != nil {
return err
}
c.ShmPath = shmPath
}

default:
return fmt.Errorf("invalid IPC mode: %v", ipcMode)
}

return nil
Expand Down
3 changes: 2 additions & 1 deletion daemon/daemon_solaris.go
Original file line number Diff line number Diff line change
Expand Up @@ -308,7 +308,8 @@ func verifyPlatformContainerSettings(daemon *Daemon, hostConfig *containertypes.

// reloadPlatform updates configuration with platform specific options
// and updates the passed attributes
func (daemon *Daemon) reloadPlatform(conf *config.Config, attributes map[string]string) {
func (daemon *Daemon) reloadPlatform(conf *config.Config, attributes map[string]string) error {
return nil
}

// verifyDaemonSettings performs validation of daemon config struct
Expand Down
Loading

0 comments on commit 7120976

Please sign in to comment.