controller: support namespace-scoped cache via WATCH_NAMESPACE#766
Open
zjh7890 wants to merge 2 commits intoalibaba:mainfrom
Open
controller: support namespace-scoped cache via WATCH_NAMESPACE#766zjh7890 wants to merge 2 commits intoalibaba:mainfrom
zjh7890 wants to merge 2 commits intoalibaba:mainfrom
Conversation
Both BatchSandboxReconciler and PoolReconciler declare Owns(&corev1.Pod{}).
Because cmd/controller/main.go does not configure Cache.DefaultNamespaces,
controller-runtime builds a shared Pod informer whose list/watch scope is
the full cluster. On clusters with many unrelated workloads, this causes
the manager to hold the entire cluster's Pod list in its cache even when
only one namespace contains OpenSandbox CRs, which can lead to OOMKills
during cold-start list+decode+index.
Add a new --watch-namespaces flag (comma-separated) that defaults to the
WATCH_NAMESPACE environment variable. When set, the value is wired into
cache.Options.DefaultNamespaces so the shared informer cache only serves
the listed namespaces. When unset, behavior is unchanged (cluster-scoped
cache, watches all namespaces) for backward compatibility.
- Pool and BatchSandbox CRDs are already namespaced.
- Reconcile-time list/get calls already restrict to the CR's own namespace.
- BatchSandbox.Spec.PoolRef has no namespace field, so cross-namespace
allocation is not an existing feature and is not regressed.
|
zjh7890 seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it. |
Address review feedback on the initial commit:
- buildWatchNamespaces now returns (map[string]cache.Config, error).
- Empty / whitespace-only input -> (nil, nil) preserves cluster-wide default.
- Non-empty input that yields no valid tokens returns an error instead of
silently falling back to cluster-wide. This matters because the flag's
purpose is to narrow scope; silent fallback would broaden privileges.
- Empty segments inside a non-empty list (",", "ns-a,,ns-b", "ns-a,")
are now rejected rather than skipped, so operator/templating mistakes
surface instead of being masked.
- Each token is validated with validation.IsDNS1123Label so typos like
"Bad_NS" or overlong names fail fast at startup.
- Duplicates in a valid list continue to be collapsed (set semantics).
- main() exits non-zero with a clear setupLog.Error if parsing fails.
- Startup log sorts the watched namespace list so log output is stable
across restarts (Go map iteration is intentionally unordered).
- Add table-driven test cmd/controller/main_test.go covering empty,
whitespace, single, multiple, duplicate, mid-empty, leading/trailing
empty, DNS-1123 violations (case, underscore), max-length (63) and
overlong (64) cases.
Spground
reviewed
Apr 23, 2026
| flag.BoolVar(&logCompress, "log-compress", true, "Compress determines if the rotated log files should be compressed using gzip") | ||
| flag.Float64Var(&kubeClientQPS, "kube-client-qps", 100, "QPS for Kubernetes client rate limiter.") | ||
| flag.IntVar(&kubeClientBurst, "kube-client-burst", 200, "Burst for Kubernetes client rate limiter.") | ||
| flag.StringVar(&watchNamespaces, "watch-namespaces", os.Getenv("WATCH_NAMESPACE"), |
Collaborator
There was a problem hiding this comment.
The watch-namespace flag name is weird, How about namespace, moreover, In my opinion, there is no need to support value from env.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Motivation
Both
BatchSandboxReconcilerandPoolReconcilerdeclareOwns(&corev1.Pod{})inSetupWithManager. Incmd/controller/main.go, the manager is created withoutCache.DefaultNamespaces, which means the underlying controller-runtime shared informer for Pods has cluster-wide scope.This is controller-runtime's documented default and isn't wrong on its own. However, on a large shared cluster that hosts many unrelated workloads alongside OpenSandbox:
list+watchof every Pod in the cluster on startup.Pod.ownerRefUIDfield index (internal/utils/fieldindex) is built over that entire set.Observed symptom on a shared multi-tenant cluster where only one namespace contained OpenSandbox CRs:
opensandbox-controller-managercrash-looped withexitCode=137/reason=OOMKilledwithin ~5s of startup. Logs only got as far asregister field index/starting manager/starting server name=health probebefore the kernel killed the process on cgroup memory limit.Why this is safe to narrow
The current reconcile code already treats BatchSandbox/Pool/Pod as a same-namespace graph, so restricting the cache to the namespaces that actually contain OpenSandbox CRs does not remove any existing capability:
BatchSandbox.Spec.PoolRefis just a name, with no namespace field(apis/sandbox/v1alpha1/batchsandbox_types.go L34).
PoolReconcilerlistsBatchSandboxonly withinpool.Namespace(internal/controller/pool_controller.go L146-L148).
PoolReconcilermaps back fromBatchSandboxtoPoolusingbatchSandbox.Namespace(internal/controller/pool_controller.go L295).
BatchSandboxReconcilerfetches owned Pods bybatchSbx.Namespace(internal/controller/batchsandbox_controller.go L342).
So the cluster-wide cache is paying a cost that none of the reconcile paths actually consume.
Change
Add an opt-in flag to narrow the manager cache to a specific set of namespaces.
--watch-namespaces(comma-separated list).WATCH_NAMESPACEenvironment variable (common operator convention).cache.Options.DefaultNamespaces, producing a multi-namespace cache.Parser is fail-closed, not fail-open
The flag's purpose is least-privilege / smaller cache, so malformed input must not silently widen scope back to cluster-wide.
buildWatchNamespacesrejects any malformed input with a clear error and the controller exits non-zero at startup:""(empty)"ns-a"/"ns-a,ns-b"" ns-a , ns-b ""ns-a,ns-a"{ns-a}","," , ","ns-a,",",ns-a","ns-a,,ns-b""Default","bad_ns", 64-char namek8s.io/apimachinery/pkg/util/validation.IsDNS1123Label)The startup log also sorts the namespace list before printing so the line is deterministic across restarts.
Example:
or:
Log line on startup:
vs. unset:
Compatibility & risk
ClusterRoletoRole/RoleBindingin a follow-up; that is not required for the cache change itself to take effect.Owns(&corev1.Pod{})is unchanged. Pool / BatchSandbox reconcile logic is unchanged.cache.Options.DefaultNamespacesis compatible with the existingMetrics/WebhookServermanager options and withfieldindex.RegisterFieldIndexes—multiNamespaceCachesupportsIndexField.DefaultNamespacescreates one cache/informer per namespace, so this flag is intended for "a small number of namespaces replaces one cluster-wide cache", not for arbitrarily long lists.Testing
go build ./cmd/controller/...passes.go vet ./cmd/controller/...passes.New table-driven unit test
cmd/controller/main_test.gocovers 15 cases: empty, whitespace-only, single, multi, trim, duplicates, mid-empty / leading-empty / trailing-empty / single-comma / whitespace-between-commas (all rejected), uppercase (rejected), underscore (rejected), 64-char overlong (rejected), 63-char max-length (accepted). All pass undergo test -count=1 -v ./cmd/controller/....--helpshows the new flag:Follow-ups (not in this PR, happy to do separately if accepted)
WATCH_NAMESPACEenv /--watch-namespacesflag inconfig/manager/manager.yamland the Helm chart so the feature is discoverable.Role/RoleBindinginstead ofClusterRolewhen a scope is requested).Happy to iterate on naming (
--watch-namespacesvs--namespacesvs adopting the prometheus-operator style multi-flag split) or on defaults if maintainers prefer a different convention.