Skip to content

fix(charts): rewrite v1.12 multi-doc network renderer for full link coverage#147

Merged
Aleksei Sviridkin (lexfrei) merged 2 commits into
mainfrom
fix/multi-doc-renderer-coverage
May 7, 2026
Merged

fix(charts): rewrite v1.12 multi-doc network renderer for full link coverage#147
Aleksei Sviridkin (lexfrei) merged 2 commits into
mainfrom
fix/multi-doc-renderer-coverage

Conversation

@lexfrei

Copy link
Copy Markdown
Contributor

What changed

The v1.12 multi-doc network renderer in cozystack and generic charts now emits one document per configurable link on the node instead of stopping at the gateway-bearing one. Multi-NIC nodes therefore produce multiple LinkConfig / BondConfig / VLANConfig documents — a node with a routed NIC plus a storage NIC ends up with both NICs configured in the rendered config.

The IPv4 default-route gateway is placed only on the link that carries it; every other link is emitted gateway-less. MTU is surfaced on LinkConfig, BondConfig, and VLANConfig when discovery reports a value, so non-default-MTU links (jumbo frames, GRE) survive a re-render. Both IPv4 and IPv6 global-scope addresses on a link are surfaced.

Guardrails:

  • Bond slaves are filtered out of configurable_link_names so they do not collide with the master's BondConfig.
  • Bridges are deliberately not auto-emitted as BridgeConfig yet — a non-gateway bridge is skipped (declare it via per-node body); a bridge that carries the default route fails the render with a clear migration hint.
  • VLAN with missing parent or vlanID fails the render rather than emitting a document Talos rejects on apply.
  • Bond with partial bondMaster gracefully degrades — unset fields are omitted instead of producing bondMode: <nil>.
  • floatingIP is stripped from per-link addresses so the VIP currently held by a leader does not leak into the static LinkConfig.
  • Legacy machine.network.interfaces[] in the running MachineConfig triggers a hard fail with a migration message, preventing silent loss of user network declarations during upgrade from a chart that emitted the legacy schema.

Tests

12 new MultiDoc subtests pin every guardrail and every emit path: multi-link iteration, MTU on all three doc kinds, VLAN emission, bond slaves filtered, bridge skip + bridge-as-gateway fail, VLAN-without-parent fail, VLAN-without-vlanID fail, partial-bondMaster gracefully degraded, floatingIP stripped from addresses, dual-stack IPv6 addresses surfaced, Layer2VIPConfig.link on bond and VLAN gateway, legacy-interfaces guardrail. Six new fixtures mirror real Talos COSI shape (slaveKind, masterIndex, linkIndex). README updated to describe per-link emission, the bond-slave filter, the bridge contract, the dual-stack contract, the VIP-strip, and the upgrade-from-legacy guardrail.

Closes #140.
Closes #141.
Closes #142.
Closes #143.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates the Talos v1.12+ multi-document network configuration rendering logic in the cozystack and generic charts. The changes ensure that all configurable links—including physical NICs, bonds, and VLANs—are correctly processed with MTU support, while bond slaves are filtered out to prevent configuration conflicts. It also introduces guardrails to fail fast when legacy network schemas are detected or when unsupported bridge configurations carry the default route. Feedback was provided regarding the README.md, where the collapsing of paragraphs into single long lines was identified as a readability and maintainability issue for the source file.

Comment thread README.md
> **Note:** The output format depends on the Talos version configured in `Chart.yaml` (`templateOptions.talosVersion`) or via the `--talos-version` CLI flag.
> For Talos < v1.12, the output is a single YAML document with `machine.network` and `machine.registries` sections (as shown above).
> For Talos >= v1.12, the output uses the multi-document format with separate typed documents instead of the deprecated monolithic fields. `HostnameConfig`, `ResolverConfig` and a network interface document (`LinkConfig`, `BondConfig`, or `VLANConfig` — depending on topology) are always emitted; `Layer2VIPConfig` appears on controlplane nodes when `floatingIP` is set; `RegistryMirrorConfig` is emitted only by the cozystack chart.
> **Note:** The output format depends on the Talos version configured in `Chart.yaml` (`templateOptions.talosVersion`) or via the `--talos-version` CLI flag. For Talos < v1.12, the output is a single YAML document with `machine.network` and `machine.registries` sections (as shown above). For Talos >= v1.12, the output uses the multi-document format with separate typed documents instead of the deprecated monolithic fields. `HostnameConfig` and `ResolverConfig` are always emitted; one network interface document is emitted per configurable link on the node (`LinkConfig` for physical NICs, `BondConfig` for bond masters, `VLANConfig` for VLAN sub-interfaces) — multi-NIC nodes therefore produce one document per NIC, not one document total. The link carrying the IPv4 default route gets the gateway entry on its document; every other link is emitted gateway-less. Both IPv4 and IPv6 global-scope addresses on a link are surfaced in its document. Bond slaves are filtered out so they do not collide with the master's `BondConfig`. Bridges are deliberately not auto-emitted as `BridgeConfig` yet — a non-gateway bridge is skipped (declare it via a per-node body overlay if needed); a bridge that carries the default route fails the render with a clear migration hint. The operator-declared `floatingIP` is stripped from per-link addresses so the VIP currently held by a leader does not leak into the static `LinkConfig`. `Layer2VIPConfig` appears on controlplane nodes when `floatingIP` is set; `RegistryMirrorConfig` is emitted only by the cozystack chart.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This line and several others in this file have been collapsed into very long single lines of text. While this doesn't affect the rendered output, it harms the readability and maintainability of the source Markdown file. Please consider re-wrapping these long paragraphs to a conventional line length (e.g., 80-120 characters) to improve readability for future contributors.

@coderabbitai

coderabbitai Bot commented May 7, 2026

Copy link
Copy Markdown

Review Change Stack

Warning

Rate limit exceeded

@lexfrei has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 50 minutes and 14 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d7b5d1a9-4245-482b-9dc5-e43689320119

📥 Commits

Reviewing files that changed from the base of the PR and between 4b6d47f and e64dfd2.

📒 Files selected for processing (5)
  • README.md
  • charts/cozystack/templates/_helpers.tpl
  • charts/generic/templates/_helpers.tpl
  • charts/talm/templates/_helpers.tpl
  • pkg/engine/render_test.go
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/multi-doc-renderer-coverage

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@lexfrei Aleksei Sviridkin (lexfrei) marked this pull request as ready for review May 7, 2026 21:11
…U, bond, bridge, and legacy guardrail

Add MultiDoc subtests covering the headline coverage gaps in the v1.12
multi-doc renderer:

- TestMultiDocRendersAllConfigurableLinks: fixture with eth0 (routed)
  + eth1 (storage). Renderer must emit a LinkConfig per configurable
  link instead of stopping at the gateway-bearing one.

- TestMultiDocLinkConfigCarriesMTU: same fixture with non-default MTU
  on eth1 (jumbo). LinkConfig must surface mtu so a re-render does
  not silently drop the value.

- TestMultiDocEmitsVLANConfigForDiscoveredVLAN: fixture with
  eth0.4000 carrying the default route via a VLAN over eth0. The
  renderer must emit LinkConfig (parent eth0) + VLANConfig (eth0.4000
  with parent + vlanID) and place the gateway on the VLAN.

- TestMultiDocBondSlavesNotEmittedAsLinkConfig: fixture with two
  physical NICs enrolled into a bond master. Slaves must not appear
  as standalone LinkConfig documents alongside the master's
  BondConfig — Talos rejects the conflicting declarations.

- TestMultiDocBridgeSkipsLinkConfigBranch: fixture with a discovered
  bridge link. Bridges must not fall through to LinkConfig (wrong
  document kind) — they are skipped until BridgeConfig support lands.

- TestMultiDocBondAndVLANCarryMTU: pin that MTU surfaces on
  BondConfig and VLANConfig, not just LinkConfig.

- TestMultiDocLayer2VIPLinkOnVLANDefaultGateway: pin that the
  discovery-derived Layer2VIPConfig.link points at the VLAN
  sub-interface when a VLAN carries the default route.

- TestMultiDocBondConfigOmitsMissingBondMasterFields: pin
  graceful degradation when discovery returns a bond with partial
  bondMaster — the renderer must omit unset fields rather than emit
  bondMode: <nil> and break YAML.

- TestMultiDocFailsOnLegacyInterfacesInRunningConfig: fixture with
  non-empty machine.network.interfaces[] in the running MachineConfig.
  The renderer must fail with a clear migration message rather than
  silently dropping the legacy block on first re-render after upgrade
  from a chart that emitted the legacy schema.

Also update TestCozystackChartRendersIPv4GatewayOnDualStack's two-NIC
subtest: with the multi-link renderer landing in the same change, eth1
must now appear as its own LinkConfig (gateway-less, IPv6-only
addresses), not be silently dropped. The single-NIC subtest contract
is unchanged.

Six new fixtures support the new tests: multiNicLookup,
multiNicWithVLANLookup, bondWithSlavesLookup,
bondWithoutBondMasterLookup, bridgeLookup,
legacyInterfacesInRunningConfigLookup. All fixtures return the
resource map for per-id lookups (mirroring real discovery shape) so
the slave-filter and bridge-skip branches hit production-like inputs.

All new tests fail on the previous chart; the renderer rewrite in the
follow-up commit makes them green.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
…overage

The previous multi-doc renderer resolved a single $defaultLinkName via
talm.discovered.default_link_name_by_gateway and emitted exactly one
network document for it: BondConfig, VLANConfig, or LinkConfig depending
on the gateway link kind. Every other link on the node — secondary
physical NIC, VLAN sub-interface on a non-gateway parent, additional
bond — was silently dropped from the rendered config.

Replace the single-link block with a range over
talm.discovered.configurable_link_names, emitting the appropriate
document per link kind (bond / vlan / physical). The IPv4 default-route
gateway is placed only on the link that carries it; every other link
gets addresses without a default route. MTU is surfaced on each
document when discovery reports a value, so non-default-MTU links
(jumbo frames, GRE) survive a re-render.

Also harden three coverage edges that the simple range exposes:

- Filter bond slaves out of configurable_link_names. Slaves match the
  physical-NIC regex but are members of a bond master via
  spec.slaveKind; emitting a standalone LinkConfig for a slave
  alongside the master's BondConfig produces conflicting
  declarations Talos rejects on controller convergence.

- Skip kind=bridge in the renderer. The chart does not yet emit
  BridgeConfig, so falling through to LinkConfig would land a
  wrong-kind document. The branch is reserved for a future PR that
  adds bridge support.

- Gate every BondMaster sub-field on its presence. A bond returned
  with a partial or absent bondMaster (real Talos shape on freshly-
  created bonds before the master controller fills the spec) used to
  render bondMode: <nil> and break the parse.

Surface a hard-fail in the multi-doc renderer when the running
MachineConfig still carries non-empty machine.network.interfaces[].
The renderer cannot translate those entries today, and silently
dropping them on the first re-render after a chart upgrade from the
legacy schema would erase user-declared interfaces, addresses, and
VLANs. The fail message points the operator at the concrete migration
path: move the interfaces into per-node body overlays as v1.12 typed
documents, or pin templateOptions.talosVersion to v1.11 until the
translator lands.

README updated to describe the per-link emission, the bond-slave
filter, and the upgrade-from-legacy guardrail with the fail message
text and migration path.

Together this closes the multi-doc renderer coverage gaps:
single-link-only emission, dropped non-gateway VLAN sub-interfaces,
silent loss of legacy machine.network.interfaces during upgrade,
and missing MTU on every multi-doc document kind.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
@lexfrei Aleksei Sviridkin (lexfrei) force-pushed the fix/multi-doc-renderer-coverage branch from 79be083 to e64dfd2 Compare May 7, 2026 21:12

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lexfrei Aleksei Sviridkin (lexfrei) merged commit 009104b into main May 7, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

2 participants