Add dda inv github.bump-rshell task for automated rshell version bumps#49556
Add dda inv github.bump-rshell task for automated rshell version bumps#49556
Conversation
Files inventory check summaryFile checks results against ancestor 699f83c7: Results for datadog-agent_7.80.0~devel.git.30.94f2519.pipeline.108343478-1_amd64.deb:No change detected |
fceb75a to
6e9b20f
Compare
Regression DetectorRegression Detector ResultsMetrics dashboard Baseline: 699f83c Optimization Goals: ✅ No significant changes detected
|
| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| ➖ | docker_containers_cpu | % cpu utilization | +1.01 | [-1.99, +4.01] | 1 | Logs |
Fine details of change detection per experiment
| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| ➖ | quality_gate_logs | % cpu utilization | +1.99 | [+0.29, +3.69] | 1 | Logs bounds checks dashboard |
| ➖ | quality_gate_metrics_logs | memory utilization | +1.79 | [+1.55, +2.03] | 1 | Logs bounds checks dashboard |
| ➖ | tcp_syslog_to_blackhole | ingress throughput | +1.09 | [+0.91, +1.27] | 1 | Logs |
| ➖ | docker_containers_cpu | % cpu utilization | +1.01 | [-1.99, +4.01] | 1 | Logs |
| ➖ | ddot_metrics | memory utilization | +0.54 | [+0.36, +0.71] | 1 | Logs |
| ➖ | ddot_logs | memory utilization | +0.53 | [+0.47, +0.59] | 1 | Logs |
| ➖ | file_tree | memory utilization | +0.50 | [+0.44, +0.55] | 1 | Logs |
| ➖ | quality_gate_idle | memory utilization | +0.32 | [+0.26, +0.37] | 1 | Logs bounds checks dashboard |
| ➖ | uds_dogstatsd_20mb_12k_contexts_20_senders | memory utilization | +0.17 | [+0.10, +0.23] | 1 | Logs |
| ➖ | file_to_blackhole_1000ms_latency | egress throughput | +0.10 | [-0.32, +0.53] | 1 | Logs |
| ➖ | file_to_blackhole_500ms_latency | egress throughput | +0.02 | [-0.38, +0.41] | 1 | Logs |
| ➖ | tcp_dd_logs_filter_exclude | ingress throughput | +0.01 | [-0.12, +0.14] | 1 | Logs |
| ➖ | uds_dogstatsd_to_api | ingress throughput | +0.00 | [-0.22, +0.22] | 1 | Logs |
| ➖ | file_to_blackhole_100ms_latency | egress throughput | +0.00 | [-0.12, +0.12] | 1 | Logs |
| ➖ | ddot_metrics_sum_cumulativetodelta_exporter | memory utilization | -0.00 | [-0.22, +0.22] | 1 | Logs |
| ➖ | uds_dogstatsd_to_api_v3 | ingress throughput | -0.01 | [-0.23, +0.22] | 1 | Logs |
| ➖ | otlp_ingest_metrics | memory utilization | -0.02 | [-0.16, +0.13] | 1 | Logs |
| ➖ | file_to_blackhole_0ms_latency | egress throughput | -0.06 | [-0.61, +0.50] | 1 | Logs |
| ➖ | otlp_ingest_logs | memory utilization | -0.15 | [-0.26, -0.04] | 1 | Logs |
| ➖ | ddot_metrics_sum_delta | memory utilization | -0.16 | [-0.34, +0.02] | 1 | Logs |
| ➖ | ddot_metrics_sum_cumulative | memory utilization | -0.28 | [-0.43, -0.13] | 1 | Logs |
| ➖ | quality_gate_idle_all_features | memory utilization | -0.35 | [-0.38, -0.32] | 1 | Logs bounds checks dashboard |
| ➖ | docker_containers_memory | memory utilization | -0.47 | [-0.55, -0.39] | 1 | Logs |
Bounds Checks: ✅ Passed
| perf | experiment | bounds_check_name | replicates_passed | observed_value | links |
|---|---|---|---|---|---|
| ✅ | docker_containers_cpu | simple_check_run | 10/10 | 704 ≥ 26 | |
| ✅ | docker_containers_memory | memory_usage | 10/10 | 274.81MiB ≤ 370MiB | |
| ✅ | docker_containers_memory | simple_check_run | 10/10 | 677 ≥ 26 | |
| ✅ | file_to_blackhole_0ms_latency | memory_usage | 10/10 | 0.19GiB ≤ 1.20GiB | |
| ✅ | file_to_blackhole_0ms_latency | missed_bytes | 10/10 | 0B = 0B | |
| ✅ | file_to_blackhole_1000ms_latency | memory_usage | 10/10 | 0.24GiB ≤ 1.20GiB | |
| ✅ | file_to_blackhole_1000ms_latency | missed_bytes | 10/10 | 0B = 0B | |
| ✅ | file_to_blackhole_100ms_latency | memory_usage | 10/10 | 0.20GiB ≤ 1.20GiB | |
| ✅ | file_to_blackhole_100ms_latency | missed_bytes | 10/10 | 0B = 0B | |
| ✅ | file_to_blackhole_500ms_latency | memory_usage | 10/10 | 0.22GiB ≤ 1.20GiB | |
| ✅ | file_to_blackhole_500ms_latency | missed_bytes | 10/10 | 0B = 0B | |
| ✅ | quality_gate_idle | intake_connections | 10/10 | 4 = 4 | bounds checks dashboard |
| ✅ | quality_gate_idle | memory_usage | 10/10 | 174.77MiB ≤ 181MiB | bounds checks dashboard |
| ✅ | quality_gate_idle_all_features | intake_connections | 10/10 | 4 = 4 | bounds checks dashboard |
| ✅ | quality_gate_idle_all_features | memory_usage | 10/10 | 506.39MiB ≤ 550MiB | bounds checks dashboard |
| ✅ | quality_gate_logs | intake_connections | 10/10 | 4 ≤ 6 | bounds checks dashboard |
| ✅ | quality_gate_logs | memory_usage | 10/10 | 209.47MiB ≤ 220MiB | bounds checks dashboard |
| ✅ | quality_gate_logs | missed_bytes | 10/10 | 0B = 0B | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | cpu_usage | 10/10 | 362.79 ≤ 2000 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | intake_connections | 10/10 | 4 ≤ 6 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | memory_usage | 10/10 | 430.57MiB ≤ 475MiB | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | missed_bytes | 10/10 | 0B = 0B | bounds checks dashboard |
Explanation
Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%
Performance changes are noted in the perf column of each table:
- ✅ = significantly better comparison variant performance
- ❌ = significantly worse comparison variant performance
- ➖ = no significant change in performance
A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".
For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:
-
Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
-
Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
-
Its configuration does not mark it "erratic".
CI Pass/Fail Decision
✅ Passed. All Quality Gates passed.
- quality_gate_metrics_logs, bounds check cpu_usage: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check missed_bytes: 10/10 replicas passed. Gate passed.
- quality_gate_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_logs, bounds check missed_bytes: 10/10 replicas passed. Gate passed.
- quality_gate_idle_all_features, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_idle_all_features, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_idle, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_idle, bounds check memory_usage: 10/10 replicas passed. Gate passed.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 94f2519750
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| ctx.run(f"go get {RSHELL_MODULE}@{version}") | ||
| ctx.run("dda inv tidy") | ||
|
|
||
| ctx.run("git add -A") |
There was a problem hiding this comment.
Exclude untracked files before staging all changes
The preflight guard calls check_uncommitted_changes(ctx), which only checks tracked diffs (git --no-pager diff --name-only HEAD) and does not catch untracked files, but this task later runs git add -A. In any non-fresh workspace, untracked files will be silently staged and committed into the bump PR, so unrelated artifacts (including sensitive files) can be pushed by accident.
Useful? React with 👍 / 👎.
| set_gitconfig_in_ci(ctx) | ||
| # `-C` (capital) creates or resets; handles a stale local branch from a | ||
| # prior partial run without needing a separate precondition check. | ||
| ctx.run(f"git switch -C {branch}", hide=True) |
There was a problem hiding this comment.
Base the bump branch on main before committing
This creates bump-rshell-* from whatever commit is currently checked out (git switch -C {branch}) and then opens a PR targeting main, but there is no guard that the starting point is actually main (or origin/main). If someone runs the task from a feature/release branch, the generated PR will include unrelated commits in addition to the rshell bump.
Useful? React with 👍 / 👎.
What does this PR do?
Adds a new
dda inv github.bump-rshelltask that bumpsgithub.com/DataDog/rshellingo.mod, regeneratesgo.sumviadda inv tidy, writes a reno release note, and opens a draft PR — all in one invocation.Typical usage, from DataDog/rshell's GitLab pipeline (after cloning this repo and running
dda self dep sync -f legacy-tasks):Motivation
Today every rshell release requires a hand-crafted PR to bump the pinned version. Prior iteration (
DataDog/rshell#188) implemented the flow as a standalone Python script inside the rshell repo. This PR moves the logic here, whereGithubAPI,check_clean_branch_state,set_gitconfig_in_ci, and reno tooling already exist — matching the idiomatic Datadog pattern used byupdate_windows_runner_version,update_collector_dependencies, etc. rshell's pipeline becomes a thin wrapper.Design highlights
--version, queries DataDog/rshell's tags via PyGithub (reusingGithubAPI) and picks the highest strict semver tag. Numeric tuple sort, sov0.0.10 > v0.0.9. Pre-releases (v1.0.0-rc1) are excluded.state="all":GithubAPIinternalizes$GITHUB_TOKEN, the task callsos.environ.pop("GITHUB_TOKEN", None)so child processes (go,git,dda inv tidy) can't inherit the write-scoped credential. Forgit push, the token is wired into the remote URL viagit remote set-urlwithhide=True, mirroringtasks/pipeline.py:693.go mod edit -json+go mod edit -dropreplace(Go's own tools) to handle every valid form (single-line, versioned single-line, block-form with/without version) uniformly.bump-rshell-vX.Y.Z(bot-exclusive); the task force-pushes so reruns after a partial failure self-recover.Describe how you validated your changes
dda inv invoke-unit-tests.run --tests github_tasks— 29 tests pass, including 10 new cases covering auto-discovery (numeric sort, pre-release exclusion, no-tags-found), explicit-version validation, open/closed/merged PR classification, already-pinned no-op, token scrubbing, and the happy path end-to-end with mocked subprocess.dda inv linter.python— clean (ruff, vulture, mypy).Possible Drawbacks / Trade-offs
GithubAPIinstance (against rshell) for tag discovery. rshell is public, so the existingDataDog/datadog-agent-scoped token works fine; the extra calls are well within API limits.dda,go,reno, andgitare on PATH — which is true in thedatadog-agent-buildimages/linuximage that rshell's CI uses.Additional Notes
Keep as draft. This lands alongside:
DataDog/datadog-agent#49490— thedd-octo-stspolicy that authorizes rshell's GitLab pipeline to mint a token for this task.DataDog/rshell#188— will be simplified in a follow-up commit to delete the standalone Python implementation and call this task instead.