Running SetKey to store the OCI Sandbox key after task creation, rather than from the OCI prestart hook, meant it happened after sysctl settings were applied by the runtime - which was the intention, we wanted to complete Sandbox configuration after IPv6 had been disabled by a sysctl if that was going to happen.
But, it meant '--sysctl' options for a specfic network interface caused container task creation to fail, because the interface is only moved into the network namespace during SetKey.
This change restores the SetKey prestart hook, and regenerates config files that depend on the container's support for IPv6 after the task has been created. It also adds a regression test that makes sure it's possible to set an interface-specfic sysctl.
Running SetKey to store the OCI Sandbox key after task creation, rather than from the OCI prestart hook, meant it happened after sysctl settings were applied by the runtime - which was the intention, we wanted to complete Sandbox configuration after IPv6 had been disabled by a sysctl if that was going to happen.
But, it meant '--sysctl' options for a specfic network interface caused container task creation to fail, because the interface is only moved into the network namespace during SetKey.
This change restores the SetKey prestart hook, and regenerates config files that depend on the container's support for IPv6 after the task has been created. It also adds a regression test that makes sure it's possible to set an interface-specfic sysctl.
The NetworkMode "default" is now normalized into the value it aliases ("bridge" on Linux and "nat" on Windows) by the ContainerCreate endpoint, the legacy image builder, Swarm's cluster executor and by the container restore codepath.
builder-next is left untouched as it already uses the normalized value (ie. bridge).
Going forward, this will make maintenance easier as there's one less NetworkMode to care about.
This was brought up by bmitch that its not expected to have a platform object in the config descriptor. Also checked with tianon who agreed, its not _wrong_ but is unexpected and doesn't neccessarily make sense to have it there.
Also, while technically incorrect, ECR is throwing an error when it sees this.
- https://github.com/golang/net/compare/v0.18.0...v0.22.0 - websocket: add support for dialing with context - http2: remove suspicious uint32->v conversion in frame code - http2: send an error of FLOW_CONTROL_ERROR when exceed the maximum octets - https://github.com/golang/crypto/compare/v0.17.0...v0.21.0 - internal/poly1305: drop Go 1.12 compatibility - internal/poly1305: improve sum_ppc64le.s - ocsp: don't use iota for externally defined constants
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
full diff: https://github.com/golang/net/compare/v0.22.0...v0.23.0
Includes a fix for CVE-2023-45288, which is also addressed in go1.22.2 and go1.21.9;
> http2: close connections when receiving too many headers > > Maintaining HPACK state requires that we parse and process > all HEADERS and CONTINUATION frames on a connection. > When a request's headers exceed MaxHeaderBytes, we don't > allocate memory to store the excess headers but we do > parse them. This permits an attacker to cause an HTTP/2 > endpoint to read arbitrary amounts of data, all associated > with a request which is going to be rejected. > > Set a limit on the amount of excess header frames we > will process before closing a connection. > > Thanks to Bartek Nowotarski for reporting this issue.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
go1.21.9 (released 2024-04-03) includes a security fix to the net/http package, as well as bug fixes to the linker, and the go/types and net/http packages. See the [Go 1.21.9 milestone](https://github.com/golang/go/issues?q=milestone%3AGo1.21.9+label%3ACherryPickApproved) for more details.
These minor releases include 1 security fixes following the security policy:
- http2: close connections when receiving too many headers
Maintaining HPACK state requires that we parse and process all HEADERS and CONTINUATION frames on a connection. When a request's headers exceed MaxHeaderBytes, we don't allocate memory to store the excess headers but we do parse them. This permits an attacker to cause an HTTP/2 endpoint to read arbitrary amounts of header data, all associated with a request which is going to be rejected. These headers can include Huffman-encoded data which is significantly more expensive for the receiver to decode than for an attacker to send.
Set a limit on the amount of excess header frames we will process before closing a connection.
Thanks to Bartek Nowotarski (https://nowotarski.info/) for reporting this issue.
This is CVE-2023-45288 and Go issue https://go.dev/issue/65051.
View the release notes for more information: https://go.dev/doc/devel/release#go1.22.2
- https://github.com/golang/go/issues?q=milestone%3AGo1.21.9+label%3ACherryPickApproved - full diff: https://github.com/golang/go/compare/go1.21.8...go1.21.9
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
The internal DNS resolver should only forward requests to external resolvers if the libnetwork.Sandbox served by the resolver has external network access (so, no forwarding for '--internal' networks).
The test for external network access was whether the Sandbox had an Endpoint with a gateway configured.
However, an ipvlan-l3 networks with external network access does not have a gateway, it has a default route bound to an interface.
Also, we document that an ipvlan network with no parent interface is equivalent to a '--internal' network. But, in this case, an ipvlan-l2 network was configured with a gateway. So, DNS proxying would be enabled in the internal resolver (and, if the host's resolver was on a localhost address, requests to external resolvers from the host's network namespace would succeed).
So, this change adjusts the test for enabling DNS proxying to include a check for '--internal' (as a shortcut) and, for non-internal networks, checks for a default route as well as a gateway. It also disables configuration of a gateway or a default route for an ipvlan Endpoint if no parent interface is specified.
(Note if a parent interface with no external network is supplied as '-o parent=<dummy>', the gateway/default route will still be set up and external DNS proxying will be enabled. The network must be configured as '--internal' to prevent that from happening.)
We document that an macvlan network with no parent interface is equivalent to a '--internal' network. But, in this case, an macvlan network was still configured with a gateway. So, DNS proxying would be enabled in the internal resolver (and, if the host's resolver was on a localhost address, requests to external resolvers from the host's network namespace would succeed).
This change disables configuration of a gateway for a macvlan Endpoint if no parent interface is specified.
(Note if a parent interface with no external network is supplied as '-o parent=<dummy>', the gateway will still be set up. Documentation will need to be updated to note that '--internal' should be used to prevent DNS request forwarding in this case.)
This reverts commit a77e147d322c153ae1c2ae0ee45c1835c109e231.
The ipvlan integration tests have been skipped in CI because of a check intended to ensure the kernel has ipvlan support - which failed, but seems to be unnecessary (probably because kernels have moved on).
Disable IPv6 for endpoints in '--ipv6=false' networks.
No IPAM IPv6 address is given to an interface in a network with '--ipv6=false', but the kernel would assign a link-local address and, in a macvlan/ipvlan network, the interface may get a SLAAC-assigned address.
daemon: add nolint-comments for deprecated kernel-memory options, hooks
This adds some nolint-comments for the deprecated kernel-memory options; we deprecated these, but they could technically still be accepted by alternative runtimes.
daemon/daemon_unix.go:108:3: SA1019: memory.Kernel is deprecated: kernel-memory limits are not supported in cgroups v2, and were obsoleted in [kernel v5.4]. This field should no longer be used, as it may be ignored by runtimes. (staticcheck) memory.Kernel = &config.KernelMemory ^ daemon/update_linux.go:63:3: SA1019: memory.Kernel is deprecated: kernel-memory limits are not supported in cgroups v2, and were obsoleted in [kernel v5.4]. This field should no longer be used, as it may be ignored by runtimes. (staticcheck) memory.Kernel = &resources.KernelMemory ^
Prestart hooks are deprecated, and more granular hooks should be used instead. CreateRuntime are the closest equivalent, and executed in the same locations as Prestart-hooks, but depending on what these hooks do, possibly one of the other hooks could be used instead (such as CreateContainer or StartContainer). As these hooks are still supported, this patch adds nolint comments, but adds some TODOs to consider migrating to something else;
daemon/nvidia_linux.go:86:2: SA1019: s.Hooks.Prestart is deprecated: use [Hooks.CreateRuntime], [Hooks.CreateContainer], and [Hooks.StartContainer] instead, which allow more granular hook control during the create and start phase. (staticcheck) s.Hooks.Prestart = append(s.Hooks.Prestart, specs.Hook{ ^
daemon/oci_linux.go:76:5: SA1019: s.Hooks.Prestart is deprecated: use [Hooks.CreateRuntime], [Hooks.CreateContainer], and [Hooks.StartContainer] instead, which allow more granular hook control during the create and start phase. (staticcheck) s.Hooks.Prestart = append(s.Hooks.Prestart, specs.Hook{ ^
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
- config: add idmap and ridmap mount options - config.md: allow empty mappings for [r]idmap - features-linux: Expose idmap information - mount: Allow relative mount destinations on Linux - features: add potentiallyUnsafeConfigAnnotations - config: add support for org.opencontainers.image annotations
Minor fixes:
- config: improve bind mount and propagation doc
full diff: https://github.com/opencontainers/runtime-spec/compare/v1.1.0...v1.2.0
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Make the internal DNS resolver for Windows containers forward requests to upsteam DNS servers when it cannot respond itself, rather than returning SERVFAIL.
Windows containers are normally configured with the internal resolver first for service discovery (container name lookup), then external resolvers from '--dns' or the host's networking configuration.
When a tool like ping gets a SERVFAIL from the internal resolver, it tries the other nameservers. But, nslookup does not, and with this change it does not need to.
The internal resolver learns external server addresses from the container's HNSEndpoint configuration, so it will use the same DNS servers as processes in the container.
The internal resolver for Windows containers listens on the network's gateway address, and each container may have a different set of external DNS servers. So, the resolver uses the source address of the DNS request to select external resolvers.
On Windows, daemon.json feature option 'windows-no-dns-proxy' can be used to prevent the internal resolver from forwarding requests (restoring the old behaviour).
The `normalizeWorkdir` function has two branches, one that returns a result of `filepath.Join` which always returns a cleaned path, and another one where the input string is returned unmodified.
To make these two outputs consistent, also clean the path in the second branch.
This also makes the cleaning of the container workdir explicit in the `normalizeWorkdir` function instead of relying on the `SetupWorkingDirectory` to mutate it.
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
This error is returned when attempting to walk a descriptor that *should* be an index or a manifest. Without this the error is not very helpful sicne there's no way to tell what triggered it.
full diff: https://github.com/golang/mod/compare/v0.13.0...v0.17.0
- modfile: do not collapse if there are unattached comments within blocks - modfile: fix crash on AddGoStmt in empty File - modfile: improve directory path detection and error text consistency - modfile: use new go version string format in WorkFile.add error - sumdb: replace globsMatchPath with module.MatchPrefixPatterns - sumdb/tlog: make NewTiles only generate strictly necessary tiles
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Running SetKey to store the OCI Sandbox key after task creation, rather than from the OCI prestart hook, meant it happened after sysctl settings were applied by the runtime - which was the intention, we wanted to complete Sandbox configuration after IPv6 had been disabled by a sysctl if that was going to happen.
But, it meant '--sysctl' options for a specfic network interface caused container task creation to fail, because the interface is only moved into the network namespace during SetKey.
This change restores the SetKey prestart hook, and regenerates config files that depend on the container's support for IPv6 after the task has been created. It also adds a regression test that makes sure it's possible to set an interface-specfic sysctl.
Running SetKey to store the OCI Sandbox key after task creation, rather than from the OCI prestart hook, meant it happened after sysctl settings were applied by the runtime - which was the intention, we wanted to complete Sandbox configuration after IPv6 had been disabled by a sysctl if that was going to happen.
But, it meant '--sysctl' options for a specfic network interface caused container task creation to fail, because the interface is only moved into the network namespace during SetKey.
This change restores the SetKey prestart hook, and regenerates config files that depend on the container's support for IPv6 after the task has been created. It also adds a regression test that makes sure it's possible to set an interface-specfic sysctl.
The NetworkMode "default" is now normalized into the value it aliases ("bridge" on Linux and "nat" on Windows) by the ContainerCreate endpoint, the legacy image builder, Swarm's cluster executor and by the container restore codepath.
builder-next is left untouched as it already uses the normalized value (ie. bridge).
Going forward, this will make maintenance easier as there's one less NetworkMode to care about.
This was brought up by bmitch that its not expected to have a platform object in the config descriptor. Also checked with tianon who agreed, its not _wrong_ but is unexpected and doesn't neccessarily make sense to have it there.
Also, while technically incorrect, ECR is throwing an error when it sees this.
- https://github.com/golang/net/compare/v0.18.0...v0.22.0 - websocket: add support for dialing with context - http2: remove suspicious uint32->v conversion in frame code - http2: send an error of FLOW_CONTROL_ERROR when exceed the maximum octets - https://github.com/golang/crypto/compare/v0.17.0...v0.21.0 - internal/poly1305: drop Go 1.12 compatibility - internal/poly1305: improve sum_ppc64le.s - ocsp: don't use iota for externally defined constants
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
full diff: https://github.com/golang/net/compare/v0.22.0...v0.23.0
Includes a fix for CVE-2023-45288, which is also addressed in go1.22.2 and go1.21.9;
> http2: close connections when receiving too many headers > > Maintaining HPACK state requires that we parse and process > all HEADERS and CONTINUATION frames on a connection. > When a request's headers exceed MaxHeaderBytes, we don't > allocate memory to store the excess headers but we do > parse them. This permits an attacker to cause an HTTP/2 > endpoint to read arbitrary amounts of data, all associated > with a request which is going to be rejected. > > Set a limit on the amount of excess header frames we > will process before closing a connection. > > Thanks to Bartek Nowotarski for reporting this issue.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
go1.21.9 (released 2024-04-03) includes a security fix to the net/http package, as well as bug fixes to the linker, and the go/types and net/http packages. See the [Go 1.21.9 milestone](https://github.com/golang/go/issues?q=milestone%3AGo1.21.9+label%3ACherryPickApproved) for more details.
These minor releases include 1 security fixes following the security policy:
- http2: close connections when receiving too many headers
Maintaining HPACK state requires that we parse and process all HEADERS and CONTINUATION frames on a connection. When a request's headers exceed MaxHeaderBytes, we don't allocate memory to store the excess headers but we do parse them. This permits an attacker to cause an HTTP/2 endpoint to read arbitrary amounts of header data, all associated with a request which is going to be rejected. These headers can include Huffman-encoded data which is significantly more expensive for the receiver to decode than for an attacker to send.
Set a limit on the amount of excess header frames we will process before closing a connection.
Thanks to Bartek Nowotarski (https://nowotarski.info/) for reporting this issue.
This is CVE-2023-45288 and Go issue https://go.dev/issue/65051.
View the release notes for more information: https://go.dev/doc/devel/release#go1.22.2
- https://github.com/golang/go/issues?q=milestone%3AGo1.21.9+label%3ACherryPickApproved - full diff: https://github.com/golang/go/compare/go1.21.8...go1.21.9
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
The internal DNS resolver should only forward requests to external resolvers if the libnetwork.Sandbox served by the resolver has external network access (so, no forwarding for '--internal' networks).
The test for external network access was whether the Sandbox had an Endpoint with a gateway configured.
However, an ipvlan-l3 networks with external network access does not have a gateway, it has a default route bound to an interface.
Also, we document that an ipvlan network with no parent interface is equivalent to a '--internal' network. But, in this case, an ipvlan-l2 network was configured with a gateway. So, DNS proxying would be enabled in the internal resolver (and, if the host's resolver was on a localhost address, requests to external resolvers from the host's network namespace would succeed).
So, this change adjusts the test for enabling DNS proxying to include a check for '--internal' (as a shortcut) and, for non-internal networks, checks for a default route as well as a gateway. It also disables configuration of a gateway or a default route for an ipvlan Endpoint if no parent interface is specified.
(Note if a parent interface with no external network is supplied as '-o parent=<dummy>', the gateway/default route will still be set up and external DNS proxying will be enabled. The network must be configured as '--internal' to prevent that from happening.)
We document that an macvlan network with no parent interface is equivalent to a '--internal' network. But, in this case, an macvlan network was still configured with a gateway. So, DNS proxying would be enabled in the internal resolver (and, if the host's resolver was on a localhost address, requests to external resolvers from the host's network namespace would succeed).
This change disables configuration of a gateway for a macvlan Endpoint if no parent interface is specified.
(Note if a parent interface with no external network is supplied as '-o parent=<dummy>', the gateway will still be set up. Documentation will need to be updated to note that '--internal' should be used to prevent DNS request forwarding in this case.)
This reverts commit a77e147d322c153ae1c2ae0ee45c1835c109e231.
The ipvlan integration tests have been skipped in CI because of a check intended to ensure the kernel has ipvlan support - which failed, but seems to be unnecessary (probably because kernels have moved on).
Disable IPv6 for endpoints in '--ipv6=false' networks.
No IPAM IPv6 address is given to an interface in a network with '--ipv6=false', but the kernel would assign a link-local address and, in a macvlan/ipvlan network, the interface may get a SLAAC-assigned address.
daemon: add nolint-comments for deprecated kernel-memory options, hooks
This adds some nolint-comments for the deprecated kernel-memory options; we deprecated these, but they could technically still be accepted by alternative runtimes.
daemon/daemon_unix.go:108:3: SA1019: memory.Kernel is deprecated: kernel-memory limits are not supported in cgroups v2, and were obsoleted in [kernel v5.4]. This field should no longer be used, as it may be ignored by runtimes. (staticcheck) memory.Kernel = &config.KernelMemory ^ daemon/update_linux.go:63:3: SA1019: memory.Kernel is deprecated: kernel-memory limits are not supported in cgroups v2, and were obsoleted in [kernel v5.4]. This field should no longer be used, as it may be ignored by runtimes. (staticcheck) memory.Kernel = &resources.KernelMemory ^
Prestart hooks are deprecated, and more granular hooks should be used instead. CreateRuntime are the closest equivalent, and executed in the same locations as Prestart-hooks, but depending on what these hooks do, possibly one of the other hooks could be used instead (such as CreateContainer or StartContainer). As these hooks are still supported, this patch adds nolint comments, but adds some TODOs to consider migrating to something else;
daemon/nvidia_linux.go:86:2: SA1019: s.Hooks.Prestart is deprecated: use [Hooks.CreateRuntime], [Hooks.CreateContainer], and [Hooks.StartContainer] instead, which allow more granular hook control during the create and start phase. (staticcheck) s.Hooks.Prestart = append(s.Hooks.Prestart, specs.Hook{ ^
daemon/oci_linux.go:76:5: SA1019: s.Hooks.Prestart is deprecated: use [Hooks.CreateRuntime], [Hooks.CreateContainer], and [Hooks.StartContainer] instead, which allow more granular hook control during the create and start phase. (staticcheck) s.Hooks.Prestart = append(s.Hooks.Prestart, specs.Hook{ ^
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
- config: add idmap and ridmap mount options - config.md: allow empty mappings for [r]idmap - features-linux: Expose idmap information - mount: Allow relative mount destinations on Linux - features: add potentiallyUnsafeConfigAnnotations - config: add support for org.opencontainers.image annotations
Minor fixes:
- config: improve bind mount and propagation doc
full diff: https://github.com/opencontainers/runtime-spec/compare/v1.1.0...v1.2.0
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Make the internal DNS resolver for Windows containers forward requests to upsteam DNS servers when it cannot respond itself, rather than returning SERVFAIL.
Windows containers are normally configured with the internal resolver first for service discovery (container name lookup), then external resolvers from '--dns' or the host's networking configuration.
When a tool like ping gets a SERVFAIL from the internal resolver, it tries the other nameservers. But, nslookup does not, and with this change it does not need to.
The internal resolver learns external server addresses from the container's HNSEndpoint configuration, so it will use the same DNS servers as processes in the container.
The internal resolver for Windows containers listens on the network's gateway address, and each container may have a different set of external DNS servers. So, the resolver uses the source address of the DNS request to select external resolvers.
On Windows, daemon.json feature option 'windows-no-dns-proxy' can be used to prevent the internal resolver from forwarding requests (restoring the old behaviour).
The `normalizeWorkdir` function has two branches, one that returns a result of `filepath.Join` which always returns a cleaned path, and another one where the input string is returned unmodified.
To make these two outputs consistent, also clean the path in the second branch.
This also makes the cleaning of the container workdir explicit in the `normalizeWorkdir` function instead of relying on the `SetupWorkingDirectory` to mutate it.
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
This error is returned when attempting to walk a descriptor that *should* be an index or a manifest. Without this the error is not very helpful sicne there's no way to tell what triggered it.
full diff: https://github.com/golang/mod/compare/v0.13.0...v0.17.0
- modfile: do not collapse if there are unattached comments within blocks - modfile: fix crash on AddGoStmt in empty File - modfile: improve directory path detection and error text consistency - modfile: use new go version string format in WorkFile.add error - sumdb: replace globsMatchPath with module.MatchPrefixPatterns - sumdb/tlog: make NewTiles only generate strictly necessary tiles
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>