feat(dogstatsd): add configurable pools for origin detection extension fields by thieman · Pull Request #1871 · DataDog/lading

thieman · 2026-05-07T16:35:23Z

Summary

Adds three new optional Config fields to the DogStatsD payload generator, each holding a pool of possible values for one of the origin detection protocol extension fields
Per generated metric, a value is randomly selected from the pool or the field is omitted entirely — consistent with the existing behavior of |c: with its internal random pool
Because millstone's Payload::DogStatsD wraps lading_payload::dogstatsd::Config directly via serde, all three fields are immediately usable in millstone's YAML configuration with no changes to millstone required
No breaking changes: all existing Config constructions use ..Default::default() or Config::default(); the metric structs cannot be externally constructed due to pub(crate) fields; internal constructors are all crate-private
Fixes choose_or_not_ref to short-circuit before touching the RNG when the pool is empty, preserving determinism for seeded output when optional fields are unconfigured

New fields

container_ids — pool for the |c: Local Data field. When empty, falls back to the prior behavior of using an internally generated random value. When non-empty, only values from the pool are used, so callers can supply real container IDs.

external_data — pool for the |e: External Data field. Empty by default; field is never emitted when empty.

cardinality — pool for the |card: cardinality override field. Empty by default; field is never emitted when empty.

Fuzz corpus note

Config derives arbitrary::Arbitrary and is used as a fuzz input in two targets. Adding fields changes what the fuzzer generates, so saved fuzz corpora for those targets should be reset.

Test plan

All existing lading-payload tests pass
New tests verify fields are emitted when pools are non-empty, absent when empty, and that emitted values come only from the configured pool
Probabilistic emission tests run 100 seeds (p(false negative) ≈ 10^-30)

🤖 Generated with Claude Code

…n fields Adds three new optional fields to `Config` that control the DogStatsD origin detection extension fields (`|c:`, `|e:`, `|card:`). Each field takes a pool of possible string values; per generated metric, one value is randomly selected from the pool or the field is omitted entirely (matching the existing behavior of `|c:` with its internal random pool). - `container_ids`: pool for the `|c:` Local Data field. When empty, falls back to the existing behavior of using a randomly generated value. When non-empty, only values from the pool are used, enabling callers to supply real container IDs rather than random strings. - `external_data`: pool for the `|e:` External Data field. Format per value: `pu-<pod_uid>,cn-<container_name>,it-<bool>`. Empty by default (field never emitted). - `cardinality`: pool for the `|card:` cardinality override field. Valid values are `low`, `orchestrator`, `high`, `none`. Empty by default (field never emitted). Because millstone's `Payload::DogStatsD` wraps `lading_payload::dogstatsd::Config` directly via serde, all three fields are immediately available in millstone's YAML configuration with no changes required to millstone itself. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

…ield docs

…_ids absence

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9047174329

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-07T18:52:15Z

+        let external_data = choose_or_not_ref(&mut rng, &self.external_data).map(String::as_str);
+        let cardinality = choose_or_not_ref(&mut rng, &self.cardinality).map(String::as_str);


Preserve default fingerprints or update goldens

With the default dogstatsd: {} config used by ci/fingerprints/dogstatsd/lading.yaml, both new pools are empty, but choose_or_not_ref still consumes a random boolean before returning None for an empty slice. That burns two extra RNG draws for every generated metric, changing the default payload stream while ci/fingerprints/dogstatsd/fingerprint.txt was not updated; I checked .github/workflows/ci.yml and the fingerprint job runs ci/fingerprint, so this PR will fail that CI job unless the empty-pool path avoids RNG consumption or the dogstatsd fingerprints are intentionally regenerated.

Useful? React with 👍 / 👎.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0e4ab32f58

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-07T19:09:35Z

+    ///
+    /// When non-empty, each generated metric randomly selects from this pool (or omits the field).
+    /// When empty, the field is never emitted.
+    pub cardinality: Vec<String>,


Validate configured cardinality values

When cardinality contains a typo or any value outside the DogStatsD v1.6 set (none, low, orchestrator, high), Config::valid still accepts the config and the metric serializers emit |card:<value> verbatim. In that scenario lading produces invalid cardinality override datagrams instead of failing fast like it does for other bounded config fields; the Datadog format documents the allowed values at https://docs.datadoghq.com/extend/dogstatsd/datagram_shell/#dogstatsd-protocol-v16.

Useful? React with 👍 / 👎.

thieman · 2026-05-07T19:12:24Z

 {
+    if pool.is_empty() {
+        return None;
+    }


Change to prevent advancing the RNG if the pool is empty. This was failing fingerprint tests because our new external_data and cardinality fields default to empty. This change does not affect any of the other three existing usages of choose_or_not_ref because they were never called with empty pools

thieman · 2026-05-07T19:14:26Z

+    // set of differently-seeded runs. Each metric has a 50% chance of including the field, so
+    // the probability of never seeing it across N independent runs is 0.5^N (~10^-30 for N=100).
+    fn assert_field_emitted_at_least_once(config: &Config, field_prefix: &[u8]) {
+        let seen = (0..100).any(|seed| {


This is technically a flakeable test but if anyone ever sees it fail they should go out and buy a lottery ticket. Can remove it if that's still not okay.

I buy the analysis here, no concerns.

…nd 0.9 → 0.10 (#1609) ## Human Summary Bumps our version of `lading` up to bring in the upgrade to `rand` from DataDog/lading#1865 Subsequent PRs will bump `lading` further to bring in some updates I'm working on (see DataDog/lading#1871), just wanted to get this yak out of the way independently ## Summary - Updates `lading-payload` git rev to `01ef8d70`, picking up the rand 0.9 → 0.10 upgrade in lading (#1865) and other upstream improvements - Updates millstone's pinned `rand = "0.9"` to the workspace `rand = "0.10"` to match lading-payload's expectation - Fixes two `corpus.rs` call sites that broke with the new lading API: `DogStatsD::new` now takes `&Config` instead of `Config`, and the internal generator functions are de-genericized from `R: Rng` to concrete `StdRng` The rand 0.9 → 0.10 upgrade in rand_core changes how `Rng` is implemented: `rand_core 0.10` only blanket-implements `Rng` for types satisfying `TryRng<Error=Infallible>` via `DerefMut`, which concrete RNGs satisfy through direct impls in the rand crate. Mixing a rand 0.9 `StdRng` with a rand 0.10 trait boundary produced opaque `DerefMut` errors. The fix is to use the same rand version throughout. ## Test plan - [ ] `cargo check --workspace` passes - [ ] `cargo clippy -p millstone` passes 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: travis.thieman <travis.thieman@datadoghq.com>

…nd 0.9 → 0.10 (#1609) ## Human Summary Bumps our version of `lading` up to bring in the upgrade to `rand` from DataDog/lading#1865 Subsequent PRs will bump `lading` further to bring in some updates I'm working on (see DataDog/lading#1871), just wanted to get this yak out of the way independently ## Summary - Updates `lading-payload` git rev to `01ef8d70`, picking up the rand 0.9 → 0.10 upgrade in lading (#1865) and other upstream improvements - Updates millstone's pinned `rand = "0.9"` to the workspace `rand = "0.10"` to match lading-payload's expectation - Fixes two `corpus.rs` call sites that broke with the new lading API: `DogStatsD::new` now takes `&Config` instead of `Config`, and the internal generator functions are de-genericized from `R: Rng` to concrete `StdRng` The rand 0.9 → 0.10 upgrade in rand_core changes how `Rng` is implemented: `rand_core 0.10` only blanket-implements `Rng` for types satisfying `TryRng<Error=Infallible>` via `DerefMut`, which concrete RNGs satisfy through direct impls in the rand crate. Mixing a rand 0.9 `StdRng` with a rand 0.10 trait boundary produced opaque `DerefMut` errors. The fix is to use the same rand version throughout. ## Test plan - [ ] `cargo check --workspace` passes - [ ] `cargo clippy -p millstone` passes 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: travis.thieman <travis.thieman@datadoghq.com> f91e8aa

thieman and others added 8 commits May 7, 2026 12:35

docs: remove unsourced format and valid-value claims from extension f…

909a610

…ield docs

test(dogstatsd): add tests for extension field pool behavior

569874c

chore: apply rustfmt

5d59cb8

test(dogstatsd): increase probabilistic test runs to 100 to avoid flakes

c39cd8b

test(dogstatsd): rename test to be specific; add comment on container…

00cd042

…_ids absence

chore: add changelog entry for origin detection extension fields

2c774b3

chore: simplify char split pattern in test

9047174

thieman marked this pull request as ready for review May 7, 2026 18:47

thieman requested a review from a team as a code owner May 7, 2026 18:47

chatgpt-codex-connector Bot reviewed May 7, 2026

View reviewed changes

fix(dogstatsd): skip RNG call in choose_or_not_ref when pool is empty

0e4ab32

chatgpt-codex-connector Bot reviewed May 7, 2026

View reviewed changes

thieman commented May 7, 2026

View reviewed changes

thieman mentioned this pull request May 7, 2026

chore(deps): update lading-payload to latest and upgrade millstone rand 0.9 → 0.10 DataDog/saluki#1609

Merged

2 tasks

blt approved these changes May 7, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(dogstatsd): add configurable pools for origin detection extension fields#1871

feat(dogstatsd): add configurable pools for origin detection extension fields#1871
thieman wants to merge 9 commits intomainfrom
thieman/dogstatsd-origin-detection-extension-fields

thieman commented May 7, 2026 •

edited

Loading

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot May 7, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot May 7, 2026

Uh oh!

thieman May 7, 2026

Uh oh!

thieman May 7, 2026

Uh oh!

blt May 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		let external_data = choose_or_not_ref(&mut rng, &self.external_data).map(String::as_str);
		let cardinality = choose_or_not_ref(&mut rng, &self.cardinality).map(String::as_str);

Conversation

thieman commented May 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

New fields

Fuzz corpus note

Test plan

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 7, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 7, 2026

Choose a reason for hiding this comment

Uh oh!

thieman May 7, 2026

Choose a reason for hiding this comment

Uh oh!

thieman May 7, 2026

Choose a reason for hiding this comment

Uh oh!

blt May 7, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

thieman commented May 7, 2026 •

edited

Loading