feat(commands): wrapper UX hardening — persistent flags, -n drop, crashdump, kubeconfig hint, dmesg cushion, TUI refusal#197
Conversation
…shdump, kubeconfig hint, dmesg cushion, TUI refusal Six open-issue fixes in one batch, all touching the talosctl wrapper layer at pkg/commands/talosctl_wrapper.go and its sibling per-command handlers. # Phase 1 — persistent flag propagation (closes #194) wrapTalosCommand's flag-copy loop visited cmd.Flags() only, returning the command's LOCAL flag set. Persistent flags from an upstream parent (e.g. imageCmd.PersistentFlags().StringVar(..., "namespace", ...), metaCmd.PersistentFlags().BoolVarP(..., "insecure", "i", ...)) live in a separate FlagSet and were silently lost by the wrapped tree — ~25 flags affected across image, image talos-bundle, image kubernetes-bundle, image cache create/serve/certs gen, image integration, meta. propagatePersistentFlags copies upstream's PersistentFlags onto wrappedCmd.PersistentFlags so cobra's mergePersistentFlags walks them to wrapped children at parse time. Names already registered on talm's own rootCmd are skipped via rootShadowedPersistentFlags — without that, upstream's addCommand-installed --nodes / --talosconfig / --context / --cluster / --endpoints / --skip-verify / --siderov1-keys-dir bindings would bind to taloscommands.GlobalArgs.<X> and the wrapper PreRunE's struct sync (taloscommands.GlobalArgs = commands.GlobalArgs) would wipe the parsed value. The shadow map is table-tested; the -f -> -F shorthand rename is pinned as well. # Drop -n shorthand from root --nodes (closes #186) talm's root --nodes claimed shorthand -n. Operators typing `talm get hostnames -n network --nodes $NODE --endpoints $NODE` saw `network` parsed silently as a second node entry, then failed inside the gRPC name resolver with "produced zero addresses". Drop the shorthand so cobra cleanly errors with "flag -n not defined" — loud refusal instead of silent misinterpretation. Long form --nodes and modeline auto-population unchanged. Upstream talosctl does not register -n for --namespace on any wrapped subcommand (image and get both use shorthand-free --namespace), so the drop closes a shadow trap without introducing an inherited-alias gap. registerRootFlags is extracted from Execute so the NodesHasNoShorthand test can exercise the registration without running cobra's executor. # Crashdump per-class node prepopulation (closes #180) Upstream crashdump uses per-class node flags (--init-node / --control-plane-nodes / --worker-nodes) rather than the global --nodes. The upstream WithClient guard at cmd/talosctl/pkg/talos/global/client.go pre-validates GlobalArgs.Nodes regardless, so operators following the documented per-class shape hit "nodes are not set for the command" before crashdump's own deprecation message (it redirects to `talosctl support`) can surface. wrapCrashdumpCommand chains the original PreRunE AFTER populating GlobalArgs.Nodes from the union of the per-class flags. The before-the-chain placement is forced by the upstream-sync ordering: wrapTalosCommand's PreRunE copies commands.GlobalArgs into taloscommands.GlobalArgs near its end, so any population that runs after the chain only updates talm's side. Resulting precedence: - Explicit --nodes on CLI wins (lands in GlobalArgs.Nodes during cobra's flag parse, before any PreRunE) - Per-class flags win over modeline (forced by the chain order required by the upstream guard) - Modeline wins only when nothing else has # Rewrite kubeconfig positional-path rejection (closes #193) The previous error claimed `use --login flag to pass arguments`, conflating two distinct things: --login switches the kubeconfig target between local and system, it does not pass a positional path. A reader could reasonably believe they should re-run with `--login /some/path` — that neither does what they want nor matches what --login is for. Replace with a structured errors.WithHint chain whose body names the actual default (writes to project root, positional path not consumed) and whose hint lists the real alternatives: --login for system kubeconfig, shell redirection or --root for a custom local path. # Cushion dmesg --tail=N ParseBool error (closes #195) Upstream talosctl registers --tail as a BoolVarP toggling tail-mode for --follow (Dmesg(ctx, follow, tail bool) on the wire). Operators' first instinct is tail(1)-style line count; `talm dmesg --tail=3` then surfaces `strconv.ParseBool: parsing "3": invalid syntax` with no guidance. wrapDmesgCommand installs a FlagErrorFunc that detects the ParseBool-on-tail pattern and wraps (not replaces) the original error with operator-facing context. The body names the actual contract; the hint suggests pipe-to-tail(1) for line count or --follow --tail for stream-mode. errors.Wrap keeps the original error reachable via errors.Unwrap so verbose debugging stays possible. # Refuse non-tty stdin on dashboard/edit (closes #183) talosctl's dashboard and edit subcommands rely on gdamore/tcell. When stdin is not attached to a terminal, tcell fails to allocate and the program panics inside tScreen.finish with `close of nil channel` during teardown. Operators running under CI / piped stdin see a Go stack trace with no signal. wrapTUICommand chains the original PreRunE, then consults the stdinIsTTY predicate (function-typed for test injection). On non-tty stdin, returns a structured errors.WithHint: the body names the command and the missing tty; the hint lists what to do instead (real terminal for the TUI, or `talm get` for non-interactive inspection). stdinIsTTY uses golang.org/x/term IsTerminal against os.Stdin.Fd() — already in the indirect dep graph. # Tests 12 new tests in pkg/commands/talosctl_wrapper_test.go pinning every contract layer: synthetic + real-upstream propagation for #194, shadow-map and -f-rename for the propagation pass, crashdump no-shadow + crashdump per-class populate for #180, kubeconfig contract-msg for #193, dmesg ParseBool rewrite + chain preservation for #195, TUI refuse + tty pass for #183. Plus one new test in main_test.go for the -n drop (#186). All red-first; all green after the implementation commits. Lint clean on host and GOOS=windows, go vet clean, race detector clean. # Real-cluster validation Smoked against a 3-node OCI Talos v1.12.6 stand: - talm image list --namespace system --nodes <ip> --endpoints <ip> -> system-namespace images returned (#194 namespace) - talm meta -i write 0x0a "x" --nodes <ip> --endpoints <ip> -> -i shorthand accepted (#194 persistent-shorthand path) - talm get hostnames --namespace network --nodes <ip> --endpoints <ip> -> server-side resource-not-registered (cleanly past flag parsing, no -n shadow) (#186) - talm crashdump --control-plane-nodes <ip> --endpoints <ip> -> upstream deprecation message reaches operator (#180) - talm kubeconfig /tmp/x -> new project-root / --login hint (#193) - talm dmesg --tail=3 --nodes <ip> --endpoints <ip> -> boolean-toggle hint with pipe-to-tail(1) suggestion (#195) - timeout 3 talm dashboard --nodes <ip> --endpoints <ip> < /dev/null -> clean refusal, no tcell panic (#183) Closes #180 Closes #183 Closes #186 Closes #193 Closes #194 Closes #195 Assisted-By: Claude <noreply@anthropic.com> Signed-off-by: Aleksei Sviridkin <f@lex.la>
|
Warning Rate limit exceeded
You’ve run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughThe PR refactors CLI flag contracts and wrapper infrastructure by removing the ChangesCLI Flag Contracts and Wrapper Infrastructure
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related issues
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request refactors flag registration and command wrapping to improve the operator experience and prevent flag shadowing. Key updates include removing the -n shorthand from the global --nodes flag, implementing node pre-population for crashdump and rotate-ca, and adding TTY checks for interactive commands. The PR also improves error messaging for dmesg and kubeconfig. Review feedback points out a potential bug where flags might be registered twice in the wrapper logic and suggests a more complete flag cloning implementation to preserve metadata like ShorthandDeprecated and synchronization of the Changed status.
| cmd.PersistentFlags().VisitAll(func(flag *pflag.Flag) { | ||
| if _, shadowed := rootShadowedPersistentFlags[flag.Name]; shadowed { | ||
| return | ||
| } | ||
|
|
||
| if flag.Shorthand == "f" { | ||
| renamed := &pflag.Flag{ | ||
| Name: flag.Name, | ||
| Usage: flag.Usage, | ||
| Value: flag.Value, | ||
| DefValue: flag.DefValue, | ||
| Changed: flag.Changed, | ||
| NoOptDefVal: flag.NoOptDefVal, | ||
| Deprecated: flag.Deprecated, | ||
| Hidden: flag.Hidden, | ||
| Shorthand: "F", | ||
| Annotations: flag.Annotations, | ||
| } | ||
| wrappedCmd.PersistentFlags().AddFlag(renamed) | ||
|
|
||
| return | ||
| } | ||
|
|
||
| wrappedCmd.PersistentFlags().AddFlag(flag) | ||
| }) |
There was a problem hiding this comment.
This function introduces duplicate flag registration. The existing loop in wrapTalosCommand (lines 154-179) already visits all flags of the upstream command via cmd.Flags().VisitAll and adds them to wrappedCmd.Flags(). Since cmd.Flags() includes persistent flags, calling propagatePersistentFlags afterwards results in the same flags being registered twice: once in the local flag set and once in the persistent flag set of wrappedCmd. This will cause issues (and potentially panics) during flag parsing when Cobra attempts to merge these sets.
Additionally, cmd.Flags().VisitAll in the first loop includes flags inherited from talosctl's root command, which should not be copied as local flags to every subcommand. The flag copying logic should be unified to use cmd.LocalFlags() (or cmd.NonInheritedFlags()) and correctly distribute flags between the local and persistent sets in a single pass.
There was a problem hiding this comment.
Verified the duplicate-registration claim against the actual code and the cobra runtime: it doesn't hold.
cmd.Flags() returns the command's LOCAL flag set only. Persistent flags live in a separate pflag.FlagSet accessed via cmd.PersistentFlags() and are merged into the effective set at parse time by cobra.Command.mergePersistentFlags() (cobra@v1.10.2/command.go:1896-1922). Before that merge runs — and wrapTalosCommand is called at wrap time, before any parse — the two sets are disjoint. So cmd.Flags().VisitAll enumerates exactly the local flags, and cmd.PersistentFlags().VisitAll in propagatePersistentFlags enumerates exactly the persistent flags. No flag is visited twice.
The same reasoning applies to the "inherited from talosctl's root" concern: taloscommands.Commands is a slice of standalone top-level commands, not children of an upstream root. Calling cmd.Flags() on them doesn't reach a root parent because they don't have one in talm's wrapper context. The persistent-flag pass walks cmd.PersistentFlags() and the talm-root-shadowed names (--nodes, --endpoints, etc. that upstream's addCommand registers on every top-level command) are filtered via rootShadowedPersistentFlags to prevent the sync-direction collision.
Three regression tests cover this: TestWrapTalosCommand_InheritsParentPersistentFlags (synthetic), TestWrapTalosCommand_RealImageListPropagatesNamespace (real upstream image tree — imageCmd.PersistentFlags() reaches image list), TestWrapTalosCommand_RealMetaWritePropagatesInsecure (real upstream meta with -i / --insecure persistent shorthand). All green; cobra parses the wrapped child without "flag redefined" panics.
| renamed := &pflag.Flag{ | ||
| Name: flag.Name, | ||
| Usage: flag.Usage, | ||
| Value: flag.Value, | ||
| DefValue: flag.DefValue, | ||
| Changed: flag.Changed, | ||
| NoOptDefVal: flag.NoOptDefVal, | ||
| Deprecated: flag.Deprecated, | ||
| Hidden: flag.Hidden, | ||
| Shorthand: "F", | ||
| Annotations: flag.Annotations, | ||
| } |
There was a problem hiding this comment.
When cloning a flag to rename its shorthand, several fields are missing from the manual copy, such as ShorthandDeprecated. More importantly, the Changed status of the flag is not synchronized. If the upstream command's logic relies on cmd.Flags().Changed("file"), it will incorrectly see it as false even if the user provided the renamed -F shorthand, because only the newFlag object's Changed status is updated by the parser.
Consider using a helper function for cloning and implementing a mechanism in PreRunE to sync the Changed status from the wrapped command's flags back to the original command's flags.
| renamed := &pflag.Flag{ | |
| Name: flag.Name, | |
| Usage: flag.Usage, | |
| Value: flag.Value, | |
| DefValue: flag.DefValue, | |
| Changed: flag.Changed, | |
| NoOptDefVal: flag.NoOptDefVal, | |
| Deprecated: flag.Deprecated, | |
| Hidden: flag.Hidden, | |
| Shorthand: "F", | |
| Annotations: flag.Annotations, | |
| } | |
| renamed := &pflag.Flag{ | |
| Name: flag.Name, | |
| Usage: flag.Usage, | |
| Value: flag.Value, | |
| DefValue: flag.DefValue, | |
| Changed: flag.Changed, | |
| NoOptDefVal: flag.NoOptDefVal, | |
| Deprecated: flag.Deprecated, | |
| Hidden: flag.Hidden, | |
| Shorthand: "F", | |
| ShorthandDeprecated: flag.ShorthandDeprecated, | |
| Annotations: flag.Annotations, | |
| } |
There was a problem hiding this comment.
Addressed in 4ecd2df — added ShorthandDeprecated to the flag-clone path (and the persistent-flag clone in propagatePersistentFlags). Extracted the clone into a renameFlagShorthand helper so both call sites stay in sync as pflag grows new metadata fields.
On the second part (sync Changed from the wrapped flag back to the original via PreRunE): not applied. The wrapped command's RunE is assigned from upstream's cmd.RunE, and cobra passes the WRAPPED command instance to RunE(cmd, args) at runtime. Upstream's RunE consults cmd.Flags().Changed("file") against that same wrapped instance, where the renamed -F flag's Changed status IS correctly set by the parser. The original *pflag.Flag pointer is never queried after wrap because the original *cobra.Command is never executed — only the wrapped one is. Writing back to the original would be dead work.
… via helper Address review feedback from gemini-code-assist on pkg/commands/talosctl_wrapper.go:121: the manual pflag.Flag clone used for the -f -> -F shorthand rename was missing the ShorthandDeprecated field. Without it, an upstream flag that marks its short form as deprecated would have that deprecation lost on the wrapped clone, and pflag's MarkShorthandDeprecated check (flag.go:719) would re-emit the shorthand in help output. Extract a renameFlagShorthand helper that mirrors every pflag.Flag metadata field, and use it from both the local-flag and persistent-flag copy loops. Value is intentionally shared (not deep-copied) — the parser must write to upstream's variable so the wrapped RunE (assigned from upstream) can still read its own state via cmd.Flags().Changed/.Value lookups against the wrapped command instance cobra passes at runtime. Assisted-By: Claude <noreply@anthropic.com> Signed-off-by: Aleksei Sviridkin <f@lex.la>
Summary
Baseline repository scaffolding — a single PR covering the GitHub artefacts, contributor docs, lint configs, CI smoke tests, and Renovate that the repo was missing. After merge the new label scheme will be applied to the existing PR and issue.
What's added
.github/artefactslabels.yml— 38 labels in K8s-style namespaces (kind/,area/,priority/,triage/,lifecycle/,do-not-merge/,size/). Aliases retire the four GitHub defaults currently in use (bug→kind/bug,enhancement→kind/feature,documentation→kind/documentation,question→triage/needs-information) without losing references.workflows/labels.yaml— validate + sync viaEndBug/label-sync@v2, weekly cron.workflows/ci.yml— helm template smoke tests, yamllint, markdownlint.pull_request_template.md— Helm/Make-specific (no Go-only items).ISSUE_TEMPLATE/{bug_report,feature_request,config}— pre-labelled, blank issues disabled.CODEOWNERS— flat global ownership.renovate.json5— image:tag@digest in first-party values.yaml plus GH Actions, weekly.Top-level
CONTRIBUTING.md— branch flow, Conventional Commits, signoff, GPG, vendored-chart rule.SECURITY.md— private vulnerability reporting via GitHub Advisories..gitattributes— vendored chart trees markedlinguist-vendored,_out/linguist-generated..editorconfig— 2-space, LF, UTF-8; tabs for Makefile/.mk; preserve trailing in .md..yamllint.yaml,.markdownlint-cli2.jsonc— lint configs matching the CI workflow.Notes
packages/core/platformchart is intentionally excluded from the helm-template CI step. Itsrepository.yamluseslookup+failand cannot render offline; re-enable once that template is made dry-run-safe.helm lintis not run in CI.packages/system/billingandpackages/system/tenant-locking-controllerdo not declare their subchart dependencies inChart.yaml, whichhelm lintflags as an error. Out of scope here — fix in a follow-up.Verification
Local checks (all green):
.github/andpackages/.*.md.renovate-config-validator .github/renovate.json5valid.After merge
gh workflow run labels.yamlto apply the new label set (also runs automatically on the merge push)./categorizeon PR Nodeworkertemplate hasfloatingIPvipsection #7 (the open stability PR) and issue talm.discovered.default_link_selector_by_gateway might generate duplicated keys for selector #6 (system bundle placeholder) to exercise the new scheme.Summary by CodeRabbit
Release Notes
Bug Fixes
dmesg --tailflag to clarify boolean behavior.kubeconfigcommand error messages with guidance on flag usage.Refactor
-nshorthand from--nodesflag; use--nodes(long form) instead.dashboard,edit) now require terminal input attachment.rotate-cadocumentation to reflect flag changes.