Unlike its stdlib counterparts, AtomicFileWriter does not take into consideration umask due to its use of chmod. Failure to recognize this may cause subtle problems like the one described in #47498.
Therefore the documentation has been updated to let users know that umask is not taken into consideration when using AtomicFileWriter.
Closes #47516.
Signed-off-by: Antonio Aguilar <antonio@zoftko.com>
Disable IPv6 for endpoints in '--ipv6=false' networks.
No IPAM IPv6 address is given to an interface in a network with '--ipv6=false', but the kernel would assign a link-local address and, in a macvlan/ipvlan network, the interface may get a SLAAC-assigned address.
daemon: add nolint-comments for deprecated kernel-memory options, hooks
This adds some nolint-comments for the deprecated kernel-memory options; we deprecated these, but they could technically still be accepted by alternative runtimes.
daemon/daemon_unix.go:108:3: SA1019: memory.Kernel is deprecated: kernel-memory limits are not supported in cgroups v2, and were obsoleted in [kernel v5.4]. This field should no longer be used, as it may be ignored by runtimes. (staticcheck) memory.Kernel = &config.KernelMemory ^ daemon/update_linux.go:63:3: SA1019: memory.Kernel is deprecated: kernel-memory limits are not supported in cgroups v2, and were obsoleted in [kernel v5.4]. This field should no longer be used, as it may be ignored by runtimes. (staticcheck) memory.Kernel = &resources.KernelMemory ^
Prestart hooks are deprecated, and more granular hooks should be used instead. CreateRuntime are the closest equivalent, and executed in the same locations as Prestart-hooks, but depending on what these hooks do, possibly one of the other hooks could be used instead (such as CreateContainer or StartContainer). As these hooks are still supported, this patch adds nolint comments, but adds some TODOs to consider migrating to something else;
daemon/nvidia_linux.go:86:2: SA1019: s.Hooks.Prestart is deprecated: use [Hooks.CreateRuntime], [Hooks.CreateContainer], and [Hooks.StartContainer] instead, which allow more granular hook control during the create and start phase. (staticcheck) s.Hooks.Prestart = append(s.Hooks.Prestart, specs.Hook{ ^
daemon/oci_linux.go:76:5: SA1019: s.Hooks.Prestart is deprecated: use [Hooks.CreateRuntime], [Hooks.CreateContainer], and [Hooks.StartContainer] instead, which allow more granular hook control during the create and start phase. (staticcheck) s.Hooks.Prestart = append(s.Hooks.Prestart, specs.Hook{ ^
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
- config: add idmap and ridmap mount options - config.md: allow empty mappings for [r]idmap - features-linux: Expose idmap information - mount: Allow relative mount destinations on Linux - features: add potentiallyUnsafeConfigAnnotations - config: add support for org.opencontainers.image annotations
Minor fixes:
- config: improve bind mount and propagation doc
full diff: https://github.com/opencontainers/runtime-spec/compare/v1.1.0...v1.2.0
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Make the internal DNS resolver for Windows containers forward requests to upsteam DNS servers when it cannot respond itself, rather than returning SERVFAIL.
Windows containers are normally configured with the internal resolver first for service discovery (container name lookup), then external resolvers from '--dns' or the host's networking configuration.
When a tool like ping gets a SERVFAIL from the internal resolver, it tries the other nameservers. But, nslookup does not, and with this change it does not need to.
The internal resolver learns external server addresses from the container's HNSEndpoint configuration, so it will use the same DNS servers as processes in the container.
The internal resolver for Windows containers listens on the network's gateway address, and each container may have a different set of external DNS servers. So, the resolver uses the source address of the DNS request to select external resolvers.
On Windows, daemon.json feature option 'windows-no-dns-proxy' can be used to prevent the internal resolver from forwarding requests (restoring the old behaviour).
The `normalizeWorkdir` function has two branches, one that returns a result of `filepath.Join` which always returns a cleaned path, and another one where the input string is returned unmodified.
To make these two outputs consistent, also clean the path in the second branch.
This also makes the cleaning of the container workdir explicit in the `normalizeWorkdir` function instead of relying on the `SetupWorkingDirectory` to mutate it.
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
This error is returned when attempting to walk a descriptor that *should* be an index or a manifest. Without this the error is not very helpful sicne there's no way to tell what triggered it.
full diff: https://github.com/golang/mod/compare/v0.13.0...v0.17.0
- modfile: do not collapse if there are unattached comments within blocks - modfile: fix crash on AddGoStmt in empty File - modfile: improve directory path detection and error text consistency - modfile: use new go version string format in WorkFile.add error - sumdb: replace globsMatchPath with module.MatchPrefixPatterns - sumdb/tlog: make NewTiles only generate strictly necessary tiles
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
If a container is configured with the internal DNS resolver's own address as an external server, try the next ext server rather than recursing (return SERVFAIL if there are no other servers).
/usr/sbin/runc is confined with "runc" profile[1] introduced in AppArmor v4.0.0. This change breaks stopping of containers, because the profile assigned to containers doesn't accept signals from the "runc" peer. AppArmor >= v4.0.0 is currently part of Ubuntu Mantic (23.10) and later.
In the case of Docker, this regression is hidden by the fact that dockerd itself sends SIGKILL to the running container after runc fails to stop it. It is still a regression, because graceful shutdowns of containers via "docker stop" are no longer possible, as SIGTERM from runc is not delivered to them. This can be seen in logs from dockerd when run with debug logging enabled and also from tracing signals with killsnoop utility from bcc[2] (in bpfcc-tools package in Debian/Ubuntu):
Test commands:
root@cloudimg:~# docker run -d --name test redis ba04c137827df8468358c274bc719bf7fc291b1ed9acf4aaa128ccc52816fe46 root@cloudimg:~# docker stop test
Relevant syslog messages (with wrapped long lines):
root@cloudimg:~# killsnoop-bpfcc TIME PID COMM SIG TPID RESULT 20:51:00 9631 runc 3 9581 -13 20:51:02 9637 runc 9 9581 -13 20:51:12 9030 dockerd 9 9581 0
This change extends the docker-default profile with rules that allow receiving signals from processes that run confined with either runc or crun profile (crun[4] is an alternative OCI runtime that's also confined in AppArmor >= v4.0.0, see [1]). It is backward compatible because the peer value is a regular expression (AARE) so the referenced profile doesn't have to exist for this profile to successfully compile and load.
Note that the runc profile has an attachment to /usr/sbin/runc. This is the path where the runc package in Debian/Ubuntu puts the binary. When the docker-ce package is installed from the upstream repository[3], runc is installed as part of the containerd.io package at /usr/bin/runc. Therefore it's still running unconfined and has no issues sending signals to containers.
`addrSpace` methods are currently scattered in two different files. As upcoming work will rewrite some of these methods, better put them into a separate file.
Address spaces are a continuum of addresses that can be used for a specific purpose (ie. 'local' for unmanaged containers, 'global for Swarm). v4 and v6 addresses aren't of the same size -- hence combining them into a single address space doesn't form a continuum. Better set them apart into two different address spaces.
Also, the upcoming rewrite of `addrSpace` will benefit from that split.
libnet/ipamapi: add in/out structs for RequestPool
The `RequestPool` method has many args and named returns. This makes the code hard to follow at times. This commit adds one struct, `PoolRequest`, to replace these args, and one struct, `AllocatedPool`, to replace these named returns.
Both structs' fields are properly documented to better define their semantics, and their relationship with address allocation.
libnet/ipam: default-address-pools as Register arg
Prior to this change, daemon's `default-address-pools` param would be passed to `SetDefaultIPAddressPool()` to set a global var named `defaultAddressPool`. This var would then be retrieved during the `default` IPAM driver registration. Both steps were executed in close succession during libnet's controller initialization.
This change removes the global var and just pass the user-defined `default-address-pools` to the `default` driver's `Register` fn.
All drivers except the default ipam driver are stored in ipams/. Since `default` isn't a valid Go pkg name, this package is renamed to `defaultipam`, following `windowsipam` example.
All drivers except the default have a Register function. Before this change, default's registration was handled by another package. Move this logic into the driver pkg.
Prior to this change, cnmallocator would call `ConfigGlobalScopeDefaultNetworks` right before initializing its IPAM drivers. This function was mutating some global state used during drivers init.
This change just remove the global state, and adds an arg to ipams.Register and defaultipam.Register to pass the global pools by arguments instead.
If dockerd runs on a host with a read-only /proc/sys/net filesystem, it isn't able to enable or disable IPv6 on network interfaces when attaching a container to a network (including initial networks during container creation).
In release 26.0.2, a read-only /proc/sys/net meant container creation failed in all cases.
So, don't attempt to enable/disable IPv6 on an interface if it's already set appropriately.
If it's not possible to enable IPv6 when it's needed, just log (because that's what libnetwork has always done if IPv6 is disabled in the kernel).
If it's not possible to disable IPv6 when it needs to be disabled, refuse to create the container and raise an error that suggests setting environment variable "DOCKER_ALLOW_IPV6_ON_IPV4_INTERFACE=1", to tell the daemon it's ok to ignore the problem.
Make the behaviour enabled by env var DOCKER_BRIDGE_PRESERVE_KERNEL_LL the default... - don't remove kernel assigned link-local addresses - or any address in fe80::/64 - don't assign fe80::1 to a bridge
Don't delete IPv6 multicast addresses from a bridge
Multicast addresses aren't added by the daemon so, if they're present, it's because they were explicitly added - possibly to a user-managed bridge. So, don't remove.
The test hadn't been running, because it used testRequires(c, IPv6) and predicate "IPv6" returns the opposite of the expected result.
If the test had run, it'd have failed because: - it used "--listen-add", but the option is "--listen-addr" - so, the daemon wouldn't have started - it tried to use "--join ::1" - address "::1" was interpreted as host:port so the Dial() failed, it needed to be "[::1]". - it didn't supply a join token
The test hadn't been running, because it used testRequires(c, IPv6) and predicate "IPv6" returns the opposite of the expected result.
TestDaemonIPv6Enabled tried to run with IPv6 on the default bridge, but didn't set up a "fixed-cidr-v6" - so the daemon wouldn't start.
It then tried to check the bridge had address "fe80::1", which it expected to work because it had just used setupV6() to add that address.
Then it checked that "LinkLocalIPv6Address" was set in container inspect output, but it wouldn't be (the field is deprecated).
There are working IPv6 tests in the suite (TestDaemonIPv6FixedCIDR, TestDaemonIPv6FixedCIDRAndMac, TestDaemonIPv6HostMode) - and there's more coverage in the network integration tests.
So, deleted the test as it didn't seem worth salvaging.
Also deleted now-unused helper functions setupV6(), teardownV6().
- https://github.com/golang/go/issues?q=milestone%3AGo1.21.10+label%3ACherryPickApproved - full diff: https://github.com/golang/go/compare/go1.21.9...go1.21.10
These minor releases include 2 security fixes following the security policy:
- cmd/go: arbitrary code execution during build on darwin On Darwin, building a Go module which contains CGO can trigger arbitrary code execution when using the Apple version of ld, due to usage of the -lto_library flag in a "#cgo LDFLAGS" directive. Thanks to Juho Forsén of Mattermost for reporting this issue. This is CVE-2024-24787 and Go issue https://go.dev/issue/67119.
- net: malformed DNS message can cause infinite loop A malformed DNS message in response to a query can cause the Lookup functions to get stuck in an infinite loop. Thanks to long-name-let-people-remember-you on GitHub for reporting this issue, and to Mateusz Poliwczak for bringing the issue to our attention. This is CVE-2024-24788 and Go issue https://go.dev/issue/66754.
View the release notes for more information: https://go.dev/doc/devel/release#go1.22.3
**- Description for the changelog**
```markdown changelog Update Go runtime to 1.21.10 ```
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
Since commit befff0e1, `(*Controller).getStore()` never returns nil except if `c.store` isn't initialized yet. This can't happen unless `New()` returned an error and it wasn't proper caught.
Unlike its stdlib counterparts, AtomicFileWriter does not take into consideration umask due to its use of chmod. Failure to recognize this may cause subtle problems like the one described in #47498.
Therefore the documentation has been updated to let users know that umask is not taken into consideration when using AtomicFileWriter.
Closes #47516.
Signed-off-by: Antonio Aguilar <antonio@zoftko.com>
Disable IPv6 for endpoints in '--ipv6=false' networks.
No IPAM IPv6 address is given to an interface in a network with '--ipv6=false', but the kernel would assign a link-local address and, in a macvlan/ipvlan network, the interface may get a SLAAC-assigned address.
daemon: add nolint-comments for deprecated kernel-memory options, hooks
This adds some nolint-comments for the deprecated kernel-memory options; we deprecated these, but they could technically still be accepted by alternative runtimes.
daemon/daemon_unix.go:108:3: SA1019: memory.Kernel is deprecated: kernel-memory limits are not supported in cgroups v2, and were obsoleted in [kernel v5.4]. This field should no longer be used, as it may be ignored by runtimes. (staticcheck) memory.Kernel = &config.KernelMemory ^ daemon/update_linux.go:63:3: SA1019: memory.Kernel is deprecated: kernel-memory limits are not supported in cgroups v2, and were obsoleted in [kernel v5.4]. This field should no longer be used, as it may be ignored by runtimes. (staticcheck) memory.Kernel = &resources.KernelMemory ^
Prestart hooks are deprecated, and more granular hooks should be used instead. CreateRuntime are the closest equivalent, and executed in the same locations as Prestart-hooks, but depending on what these hooks do, possibly one of the other hooks could be used instead (such as CreateContainer or StartContainer). As these hooks are still supported, this patch adds nolint comments, but adds some TODOs to consider migrating to something else;
daemon/nvidia_linux.go:86:2: SA1019: s.Hooks.Prestart is deprecated: use [Hooks.CreateRuntime], [Hooks.CreateContainer], and [Hooks.StartContainer] instead, which allow more granular hook control during the create and start phase. (staticcheck) s.Hooks.Prestart = append(s.Hooks.Prestart, specs.Hook{ ^
daemon/oci_linux.go:76:5: SA1019: s.Hooks.Prestart is deprecated: use [Hooks.CreateRuntime], [Hooks.CreateContainer], and [Hooks.StartContainer] instead, which allow more granular hook control during the create and start phase. (staticcheck) s.Hooks.Prestart = append(s.Hooks.Prestart, specs.Hook{ ^
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
- config: add idmap and ridmap mount options - config.md: allow empty mappings for [r]idmap - features-linux: Expose idmap information - mount: Allow relative mount destinations on Linux - features: add potentiallyUnsafeConfigAnnotations - config: add support for org.opencontainers.image annotations
Minor fixes:
- config: improve bind mount and propagation doc
full diff: https://github.com/opencontainers/runtime-spec/compare/v1.1.0...v1.2.0
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Make the internal DNS resolver for Windows containers forward requests to upsteam DNS servers when it cannot respond itself, rather than returning SERVFAIL.
Windows containers are normally configured with the internal resolver first for service discovery (container name lookup), then external resolvers from '--dns' or the host's networking configuration.
When a tool like ping gets a SERVFAIL from the internal resolver, it tries the other nameservers. But, nslookup does not, and with this change it does not need to.
The internal resolver learns external server addresses from the container's HNSEndpoint configuration, so it will use the same DNS servers as processes in the container.
The internal resolver for Windows containers listens on the network's gateway address, and each container may have a different set of external DNS servers. So, the resolver uses the source address of the DNS request to select external resolvers.
On Windows, daemon.json feature option 'windows-no-dns-proxy' can be used to prevent the internal resolver from forwarding requests (restoring the old behaviour).
The `normalizeWorkdir` function has two branches, one that returns a result of `filepath.Join` which always returns a cleaned path, and another one where the input string is returned unmodified.
To make these two outputs consistent, also clean the path in the second branch.
This also makes the cleaning of the container workdir explicit in the `normalizeWorkdir` function instead of relying on the `SetupWorkingDirectory` to mutate it.
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
This error is returned when attempting to walk a descriptor that *should* be an index or a manifest. Without this the error is not very helpful sicne there's no way to tell what triggered it.
full diff: https://github.com/golang/mod/compare/v0.13.0...v0.17.0
- modfile: do not collapse if there are unattached comments within blocks - modfile: fix crash on AddGoStmt in empty File - modfile: improve directory path detection and error text consistency - modfile: use new go version string format in WorkFile.add error - sumdb: replace globsMatchPath with module.MatchPrefixPatterns - sumdb/tlog: make NewTiles only generate strictly necessary tiles
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
If a container is configured with the internal DNS resolver's own address as an external server, try the next ext server rather than recursing (return SERVFAIL if there are no other servers).
/usr/sbin/runc is confined with "runc" profile[1] introduced in AppArmor v4.0.0. This change breaks stopping of containers, because the profile assigned to containers doesn't accept signals from the "runc" peer. AppArmor >= v4.0.0 is currently part of Ubuntu Mantic (23.10) and later.
In the case of Docker, this regression is hidden by the fact that dockerd itself sends SIGKILL to the running container after runc fails to stop it. It is still a regression, because graceful shutdowns of containers via "docker stop" are no longer possible, as SIGTERM from runc is not delivered to them. This can be seen in logs from dockerd when run with debug logging enabled and also from tracing signals with killsnoop utility from bcc[2] (in bpfcc-tools package in Debian/Ubuntu):
Test commands:
root@cloudimg:~# docker run -d --name test redis ba04c137827df8468358c274bc719bf7fc291b1ed9acf4aaa128ccc52816fe46 root@cloudimg:~# docker stop test
Relevant syslog messages (with wrapped long lines):
root@cloudimg:~# killsnoop-bpfcc TIME PID COMM SIG TPID RESULT 20:51:00 9631 runc 3 9581 -13 20:51:02 9637 runc 9 9581 -13 20:51:12 9030 dockerd 9 9581 0
This change extends the docker-default profile with rules that allow receiving signals from processes that run confined with either runc or crun profile (crun[4] is an alternative OCI runtime that's also confined in AppArmor >= v4.0.0, see [1]). It is backward compatible because the peer value is a regular expression (AARE) so the referenced profile doesn't have to exist for this profile to successfully compile and load.
Note that the runc profile has an attachment to /usr/sbin/runc. This is the path where the runc package in Debian/Ubuntu puts the binary. When the docker-ce package is installed from the upstream repository[3], runc is installed as part of the containerd.io package at /usr/bin/runc. Therefore it's still running unconfined and has no issues sending signals to containers.
`addrSpace` methods are currently scattered in two different files. As upcoming work will rewrite some of these methods, better put them into a separate file.
Address spaces are a continuum of addresses that can be used for a specific purpose (ie. 'local' for unmanaged containers, 'global for Swarm). v4 and v6 addresses aren't of the same size -- hence combining them into a single address space doesn't form a continuum. Better set them apart into two different address spaces.
Also, the upcoming rewrite of `addrSpace` will benefit from that split.
libnet/ipamapi: add in/out structs for RequestPool
The `RequestPool` method has many args and named returns. This makes the code hard to follow at times. This commit adds one struct, `PoolRequest`, to replace these args, and one struct, `AllocatedPool`, to replace these named returns.
Both structs' fields are properly documented to better define their semantics, and their relationship with address allocation.
libnet/ipam: default-address-pools as Register arg
Prior to this change, daemon's `default-address-pools` param would be passed to `SetDefaultIPAddressPool()` to set a global var named `defaultAddressPool`. This var would then be retrieved during the `default` IPAM driver registration. Both steps were executed in close succession during libnet's controller initialization.
This change removes the global var and just pass the user-defined `default-address-pools` to the `default` driver's `Register` fn.
All drivers except the default ipam driver are stored in ipams/. Since `default` isn't a valid Go pkg name, this package is renamed to `defaultipam`, following `windowsipam` example.
All drivers except the default have a Register function. Before this change, default's registration was handled by another package. Move this logic into the driver pkg.
Prior to this change, cnmallocator would call `ConfigGlobalScopeDefaultNetworks` right before initializing its IPAM drivers. This function was mutating some global state used during drivers init.
This change just remove the global state, and adds an arg to ipams.Register and defaultipam.Register to pass the global pools by arguments instead.
If dockerd runs on a host with a read-only /proc/sys/net filesystem, it isn't able to enable or disable IPv6 on network interfaces when attaching a container to a network (including initial networks during container creation).
In release 26.0.2, a read-only /proc/sys/net meant container creation failed in all cases.
So, don't attempt to enable/disable IPv6 on an interface if it's already set appropriately.
If it's not possible to enable IPv6 when it's needed, just log (because that's what libnetwork has always done if IPv6 is disabled in the kernel).
If it's not possible to disable IPv6 when it needs to be disabled, refuse to create the container and raise an error that suggests setting environment variable "DOCKER_ALLOW_IPV6_ON_IPV4_INTERFACE=1", to tell the daemon it's ok to ignore the problem.
Make the behaviour enabled by env var DOCKER_BRIDGE_PRESERVE_KERNEL_LL the default... - don't remove kernel assigned link-local addresses - or any address in fe80::/64 - don't assign fe80::1 to a bridge
Don't delete IPv6 multicast addresses from a bridge
Multicast addresses aren't added by the daemon so, if they're present, it's because they were explicitly added - possibly to a user-managed bridge. So, don't remove.
The test hadn't been running, because it used testRequires(c, IPv6) and predicate "IPv6" returns the opposite of the expected result.
If the test had run, it'd have failed because: - it used "--listen-add", but the option is "--listen-addr" - so, the daemon wouldn't have started - it tried to use "--join ::1" - address "::1" was interpreted as host:port so the Dial() failed, it needed to be "[::1]". - it didn't supply a join token
The test hadn't been running, because it used testRequires(c, IPv6) and predicate "IPv6" returns the opposite of the expected result.
TestDaemonIPv6Enabled tried to run with IPv6 on the default bridge, but didn't set up a "fixed-cidr-v6" - so the daemon wouldn't start.
It then tried to check the bridge had address "fe80::1", which it expected to work because it had just used setupV6() to add that address.
Then it checked that "LinkLocalIPv6Address" was set in container inspect output, but it wouldn't be (the field is deprecated).
There are working IPv6 tests in the suite (TestDaemonIPv6FixedCIDR, TestDaemonIPv6FixedCIDRAndMac, TestDaemonIPv6HostMode) - and there's more coverage in the network integration tests.
So, deleted the test as it didn't seem worth salvaging.
Also deleted now-unused helper functions setupV6(), teardownV6().
- https://github.com/golang/go/issues?q=milestone%3AGo1.21.10+label%3ACherryPickApproved - full diff: https://github.com/golang/go/compare/go1.21.9...go1.21.10
These minor releases include 2 security fixes following the security policy:
- cmd/go: arbitrary code execution during build on darwin On Darwin, building a Go module which contains CGO can trigger arbitrary code execution when using the Apple version of ld, due to usage of the -lto_library flag in a "#cgo LDFLAGS" directive. Thanks to Juho Forsén of Mattermost for reporting this issue. This is CVE-2024-24787 and Go issue https://go.dev/issue/67119.
- net: malformed DNS message can cause infinite loop A malformed DNS message in response to a query can cause the Lookup functions to get stuck in an infinite loop. Thanks to long-name-let-people-remember-you on GitHub for reporting this issue, and to Mateusz Poliwczak for bringing the issue to our attention. This is CVE-2024-24788 and Go issue https://go.dev/issue/66754.
View the release notes for more information: https://go.dev/doc/devel/release#go1.22.3
**- Description for the changelog**
```markdown changelog Update Go runtime to 1.21.10 ```
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
Since commit befff0e1, `(*Controller).getStore()` never returns nil except if `c.store` isn't initialized yet. This can't happen unless `New()` returned an error and it wasn't proper caught.