Skip to content

feat(dogstatsd): add configurable pools for origin detection extension fields#1871

Open
thieman wants to merge 9 commits intomainfrom
thieman/dogstatsd-origin-detection-extension-fields
Open

feat(dogstatsd): add configurable pools for origin detection extension fields#1871
thieman wants to merge 9 commits intomainfrom
thieman/dogstatsd-origin-detection-extension-fields

Conversation

@thieman
Copy link
Copy Markdown

@thieman thieman commented May 7, 2026

Summary

  • Adds three new optional Config fields to the DogStatsD payload generator, each holding a pool of possible values for one of the origin detection protocol extension fields
  • Per generated metric, a value is randomly selected from the pool or the field is omitted entirely — consistent with the existing behavior of |c: with its internal random pool
  • Because millstone's Payload::DogStatsD wraps lading_payload::dogstatsd::Config directly via serde, all three fields are immediately usable in millstone's YAML configuration with no changes to millstone required
  • No breaking changes: all existing Config constructions use ..Default::default() or Config::default(); the metric structs cannot be externally constructed due to pub(crate) fields; internal constructors are all crate-private
  • Fixes choose_or_not_ref to short-circuit before touching the RNG when the pool is empty, preserving determinism for seeded output when optional fields are unconfigured

New fields

container_ids — pool for the |c: Local Data field. When empty, falls back to the prior behavior of using an internally generated random value. When non-empty, only values from the pool are used, so callers can supply real container IDs.

external_data — pool for the |e: External Data field. Empty by default; field is never emitted when empty.

cardinality — pool for the |card: cardinality override field. Empty by default; field is never emitted when empty.

Fuzz corpus note

Config derives arbitrary::Arbitrary and is used as a fuzz input in two targets. Adding fields changes what the fuzzer generates, so saved fuzz corpora for those targets should be reset.

Test plan

  • All existing lading-payload tests pass
  • New tests verify fields are emitted when pools are non-empty, absent when empty, and that emitted values come only from the configured pool
  • Probabilistic emission tests run 100 seeds (p(false negative) ≈ 10^-30)

🤖 Generated with Claude Code

thieman and others added 8 commits May 7, 2026 12:35
…n fields

Adds three new optional fields to `Config` that control the DogStatsD origin
detection extension fields (`|c:`, `|e:`, `|card:`). Each field takes a pool
of possible string values; per generated metric, one value is randomly selected
from the pool or the field is omitted entirely (matching the existing behavior
of `|c:` with its internal random pool).

- `container_ids`: pool for the `|c:` Local Data field. When empty, falls back
  to the existing behavior of using a randomly generated value. When non-empty,
  only values from the pool are used, enabling callers to supply real container
  IDs rather than random strings.

- `external_data`: pool for the `|e:` External Data field. Format per value:
  `pu-<pod_uid>,cn-<container_name>,it-<bool>`. Empty by default (field never
  emitted).

- `cardinality`: pool for the `|card:` cardinality override field. Valid values
  are `low`, `orchestrator`, `high`, `none`. Empty by default (field never
  emitted).

Because millstone's `Payload::DogStatsD` wraps `lading_payload::dogstatsd::Config`
directly via serde, all three fields are immediately available in millstone's
YAML configuration with no changes required to millstone itself.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@thieman thieman marked this pull request as ready for review May 7, 2026 18:47
@thieman thieman requested a review from a team as a code owner May 7, 2026 18:47
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9047174329

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +121 to +122
let external_data = choose_or_not_ref(&mut rng, &self.external_data).map(String::as_str);
let cardinality = choose_or_not_ref(&mut rng, &self.cardinality).map(String::as_str);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve default fingerprints or update goldens

With the default dogstatsd: {} config used by ci/fingerprints/dogstatsd/lading.yaml, both new pools are empty, but choose_or_not_ref still consumes a random boolean before returning None for an empty slice. That burns two extra RNG draws for every generated metric, changing the default payload stream while ci/fingerprints/dogstatsd/fingerprint.txt was not updated; I checked .github/workflows/ci.yml and the fingerprint job runs ci/fingerprint, so this PR will fail that CI job unless the empty-pool path avoids RNG consumption or the dogstatsd fingerprints are intentionally regenerated.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0e4ab32f58

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

///
/// When non-empty, each generated metric randomly selects from this pool (or omits the field).
/// When empty, the field is never emitted.
pub cardinality: Vec<String>,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Validate configured cardinality values

When cardinality contains a typo or any value outside the DogStatsD v1.6 set (none, low, orchestrator, high), Config::valid still accepts the config and the metric serializers emit |card:<value> verbatim. In that scenario lading produces invalid cardinality override datagrams instead of failing fast like it does for other bounded config fields; the Datadog format documents the allowed values at https://docs.datadoghq.com/extend/dogstatsd/datagram_shell/#dogstatsd-protocol-v16.

Useful? React with 👍 / 👎.

{
if pool.is_empty() {
return None;
}
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change to prevent advancing the RNG if the pool is empty. This was failing fingerprint tests because our new external_data and cardinality fields default to empty. This change does not affect any of the other three existing usages of choose_or_not_ref because they were never called with empty pools

// set of differently-seeded runs. Each metric has a 50% chance of including the field, so
// the probability of never seeing it across N independent runs is 0.5^N (~10^-30 for N=100).
fn assert_field_emitted_at_least_once(config: &Config, field_prefix: &[u8]) {
let seen = (0..100).any(|seed| {
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is technically a flakeable test but if anyone ever sees it fail they should go out and buy a lottery ticket. Can remove it if that's still not okay.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I buy the analysis here, no concerns.

gh-worker-dd-mergequeue-cf854d Bot pushed a commit to DataDog/saluki that referenced this pull request May 7, 2026
…nd 0.9 → 0.10 (#1609)

## Human Summary

Bumps our version of `lading` up to bring in the upgrade to `rand` from DataDog/lading#1865

Subsequent PRs will bump `lading` further to bring in some updates I'm working on (see DataDog/lading#1871), just wanted to get this yak out of the way independently

## Summary

- Updates `lading-payload` git rev to `01ef8d70`, picking up the rand 0.9 → 0.10 upgrade in lading (#1865) and other upstream improvements
- Updates millstone's pinned `rand = "0.9"` to the workspace `rand = "0.10"` to match lading-payload's expectation
- Fixes two `corpus.rs` call sites that broke with the new lading API: `DogStatsD::new` now takes `&Config` instead of `Config`, and the internal generator functions are de-genericized from `R: Rng` to concrete `StdRng`

The rand 0.9 → 0.10 upgrade in rand_core changes how `Rng` is implemented: `rand_core 0.10` only blanket-implements `Rng` for types satisfying `TryRng<Error=Infallible>` via `DerefMut`, which concrete RNGs satisfy through direct impls in the rand crate. Mixing a rand 0.9 `StdRng` with a rand 0.10 trait boundary produced opaque `DerefMut` errors. The fix is to use the same rand version throughout.

## Test plan

- [ ] `cargo check --workspace` passes
- [ ] `cargo clippy -p millstone` passes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: travis.thieman <travis.thieman@datadoghq.com>
dd-octo-sts Bot pushed a commit to DataDog/saluki that referenced this pull request May 7, 2026
…nd 0.9 → 0.10 (#1609)

## Human Summary

Bumps our version of `lading` up to bring in the upgrade to `rand` from DataDog/lading#1865

Subsequent PRs will bump `lading` further to bring in some updates I'm working on (see DataDog/lading#1871), just wanted to get this yak out of the way independently

## Summary

- Updates `lading-payload` git rev to `01ef8d70`, picking up the rand 0.9 → 0.10 upgrade in lading (#1865) and other upstream improvements
- Updates millstone's pinned `rand = "0.9"` to the workspace `rand = "0.10"` to match lading-payload's expectation
- Fixes two `corpus.rs` call sites that broke with the new lading API: `DogStatsD::new` now takes `&Config` instead of `Config`, and the internal generator functions are de-genericized from `R: Rng` to concrete `StdRng`

The rand 0.9 → 0.10 upgrade in rand_core changes how `Rng` is implemented: `rand_core 0.10` only blanket-implements `Rng` for types satisfying `TryRng<Error=Infallible>` via `DerefMut`, which concrete RNGs satisfy through direct impls in the rand crate. Mixing a rand 0.9 `StdRng` with a rand 0.10 trait boundary produced opaque `DerefMut` errors. The fix is to use the same rand version throughout.

## Test plan

- [ ] `cargo check --workspace` passes
- [ ] `cargo clippy -p millstone` passes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: travis.thieman <travis.thieman@datadoghq.com> f91e8aa
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants