
Antalya 26.3: Fix rescheduleTasksFromReplica #1747

Merged
zvonand merged 1 commit into antalya-26.3 from feature/antalya-26.3/pr-1568 on May 7, 2026


@zvonand
Member

@zvonand zvonand commented May 6, 2026

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Fix rescheduleTasksFromReplica (#1568 by @ianton-ru).

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • S3 Export (2h)
  • Swarms (30m)
  • Tiered Storage (2h)

Cherry-picked from #1568.


Documentation entry for user-facing changes

Fix an incorrect change from c523f29.
getReplicaForFile uses replica_to_files_to_be_processed to find the best replica for a file. Because the lost replica was removed from it only after the getReplicaForFile call, getReplicaForFile chose the same (lost) replica again. Rescheduling therefore had no effect: the files were picked up only through getAnyUnprocessedFile and executed on random replicas.
This PR fixes the order: the lost replica is removed first, so files are now matched with new best replicas.
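The ordering problem can be illustrated with a small, self-contained C++ model. This is a sketch under stated assumptions, not the ClickHouse code. `TaskDistributorModel`, `queue_for_replica`, `addReplica`, `assignFile` and the `remove_lost_first` flag are hypothetical. `getReplicaForFile` uses rendezvous (highest-random-weight) hashing over the keys of `replica_to_files_to_be_processed` as an assumed stand-in for the real matching logic. The getAnyUnprocessedFile fallback is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified model of a stable task distributor (not ClickHouse source).
struct TaskDistributorModel
{
    // Live replicas -> files handed to them and not yet confirmed as processed.
    std::map<std::size_t, std::vector<std::string>> replica_to_files_to_be_processed;
    // Hypothetical per-replica queues that rescheduled files are pushed into.
    std::map<std::size_t, std::vector<std::string>> queue_for_replica;

    void addReplica(std::size_t replica) { replica_to_files_to_be_processed.try_emplace(replica); }

    // Assumed stand-in: rendezvous hashing over the replicas currently in the map.
    // Returns std::nullopt when no replica is left.
    std::optional<std::size_t> getReplicaForFile(const std::string & file) const
    {
        std::optional<std::size_t> best;
        std::size_t best_score = 0;
        for (const auto & entry : replica_to_files_to_be_processed)
        {
            std::size_t score = std::hash<std::string>{}(file + "#" + std::to_string(entry.first));
            if (!best || score > best_score)
            {
                best = entry.first;
                best_score = score;
            }
        }
        return best;
    }

    // Hand a file to its best replica and record it as in flight.
    void assignFile(const std::string & file)
    {
        if (auto replica = getReplicaForFile(file))
            replica_to_files_to_be_processed[*replica].push_back(file);
    }

    void rescheduleTasksFromReplica(std::size_t lost_replica, bool remove_lost_first)
    {
        auto it = replica_to_files_to_be_processed.find(lost_replica);
        if (it == replica_to_files_to_be_processed.end())
            return;
        const std::vector<std::string> files = it->second;  // copy: the entry is erased below

        if (remove_lost_first)  // fixed order: the lost replica is no longer a candidate
            replica_to_files_to_be_processed.erase(lost_replica);

        for (const auto & file : files)
            if (auto replica = getReplicaForFile(file))
                queue_for_replica[*replica].push_back(file);

        // Old order: with an unchanged replica set, the loop above picked the lost replica
        // again for every file it owned, so the files were queued for a replica that is gone.
        if (!remove_lost_first)
            replica_to_files_to_be_processed.erase(lost_replica);
    }
};
```

With four replicas and twenty files, the old order puts every file owned by the lost replica back into that replica's own queue. The fixed order hands those files to surviving replicas instead.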

…rash

Antalya 26.1: Fix rescheduleTasksFromReplica
@zvonand zvonand added the releasy (Created/managed by RelEasy) label on May 6, 2026
@github-actions

github-actions Bot commented May 6, 2026

Workflow [PR], commit [c2744ab]

@zvonand zvonand merged commit e0e1572 into antalya-26.3 May 7, 2026
293 of 313 checks passed
@zvonand zvonand added the forwardport (This is a frontport of code that existed in previous Antalya versions) and port-antalya (PRs to be ported to all new Antalya releases) labels on May 7, 2026
@alsugiliazova
Member

Verification report: Altinity/ClickHouse PR #1747


Conclusion

PR is merged. The PR test workflow is clean at the test level on head; only the chronic antalya-26.3 regression suites are red, all at baseline failure rates. No PR-caused regression found. The diff is 3 lines in rescheduleTasksFromReplica, so the blast radius is minimal.

Caveat: partial frontport (same as the rest of the 26.3 cohort). Companion antalya-26.1 frontports are still missing on antalya-26.3, so the chronic regression failures here are branch-level symptoms of missing dependencies. Iceberg sort key timezone continues to fail with the same UNRECOGNIZED_ARGUMENTS: '--iceberg_partition_timezone' error, raised by a binary that doesn't expose that option.


CI on head c2744abe — failures

PR test workflow

0 test-level FAILs in gh-data.checks on the head SHA. All PR test jobs that run tests are green at the test level. (The workflow rollup of 45 success / 50 skipped / 5 failure is dominated by Build/Workflow housekeeping jobs; there is no actual test breakage.)

Regression workflow (10 failed checks)

| Check | Top failing tests on PR-1747 builds (30d) | Baseline (antalya-26.3, 30d) | Class |
|---|---|---|---|
| Swarms (Release + Aarch64) | swarm joins / join clause, cluster discovery / multiple paths, node failure / network failure, node failure / cpu overload, swarm join sanity / join with clause (×2 each) | 30–44% on every PR | Pre-existing broken |
| S3Export (partition) (Release + Aarch64) | sanity / no partition by (×2) | 50% | Pre-existing broken |
| Iceberg (1) (Release + Aarch64) | rest catalog / sort key timezone / day transform utc (×2), rest catalog / iceberg iterator race condition (×2) | 41% / 28% | Missing-dep + pre-existing flaky |
| Iceberg (2) (Release + Aarch64) | chronic glue-catalog / race-condition variants | chronic | Pre-existing flaky |
| Parquet (Release + Aarch64) | postgresql/mysql round-trip compression-type variants (×2 each) | ~36% | Pre-existing flaky |

Regression DB on /PRs/1747/ builds (30d): 152 Fail / 5,358 OK ≈ 2.8%. Every top failure matches the all-PR baseline fail rate on antalya-26.3.
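Assuming the rate is failed runs over all recorded runs (the report does not state the denominator), the figure checks out:

$$
\frac{152}{152 + 5358} = \frac{152}{5510} \approx 0.0276 \approx 2.8\%
$$

Dividing by the OK count alone gives $152 / 5358 \approx 0.0284$, which also rounds to 2.8%.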


Related to PR diff?

PR is a 3-line fix in rescheduleTasksFromReplica (1 file, replicated-task scheduling path).

| Failing test | Diff overlap | Related? |
|---|---|---|
| `swarms / *`, `parquet / *`, `s3_export_partition / *`, `iceberg / *` | None: none of these suites exercise rescheduleTasksFromReplica | No |

No failing test intersects the rescheduling code path.


Recommendations

  1. No action on this PR. Merged and effectively clean: the PR test workflow is green at the test level, and the regression failures are 100% chronic baseline.
  2. Re-verify after the companion 26.1 → 26.3 frontports land (same list as in the prior 26.3 verification reports).
  3. Apply the same chronic-baseline cleanup recommendation as in VERIFICATION_PR_1640.md to the swarms / parquet / s3_export_partition / iceberg scenarios.

Local checkout

```shell
cd /Users/alsugilyazova/workspace/altinity-clickhouse/ClickHouse
gh pr checkout 1747 --repo Altinity/ClickHouse
# HEAD: c2744abe886243da0b4e82745ba22dbd7c198c27
```

@alsugiliazova
Member

Audit: PR #1747 — Antalya 26.3: Fix rescheduleTasksFromReplica

AI audit note: This review comment was generated by AI (Cursor agent, audit-review skill).

Audit update for PR #1747 (rescheduleTasksFromReplica ordering)

Confirmed defects

No confirmed defects in reviewed scope.


Labels: antalya, antalya-26.3, antalya-26.3.10.20001, forwardport (This is a frontport of code that existed in previous Antalya versions), port-antalya (PRs to be ported to all new Antalya releases), releasy (Created/managed by RelEasy), verified (Approved for release)
