chore(java): retry failed workflow #2229

Merged
sharmabikram merged 3 commits into main from shbikram/retryFailedDailyCI on Apr 21, 2026

@sharmabikram
Contributor

Issue #, if available:

Description of changes:

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@sharmabikram sharmabikram requested a review from a team as a code owner April 20, 2026 17:14
@sharmabikram sharmabikram changed the title (chore: java) retry failed workflow chore(java): retry failed workflow Apr 20, 2026
Member

Blanket-retrying all failed jobs can silently mask real failures: a test that fails due to a genuine bug but passes on retry due to non-determinism would go unnoticed. I'd prefer we either scope retries to known-transient failure patterns (dependency resolution, docker pulls, credential issues) or evaluate each test to determine whether it is non-deterministic and add retries only for those tests. As-is, this optimizes for green CI at the cost of CI trustworthiness.
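To make the "scope retries to known-transient failure patterns" option concrete, here is a minimal sketch. It is not the PR's implementation: the pattern list, the `{ name, log }` job shape, and both function names are assumptions chosen for illustration, and real CI logs would need their own patterns.

```javascript
// Hypothetical sketch: retry a failed job only when its log matches a
// known-transient failure pattern. These patterns are illustrative
// assumptions, not taken from this PR or from the repository's CI logs.
const TRANSIENT_PATTERNS = [
  /could not resolve dependencies/i, // dependency resolution
  /error pulling image|toomanyrequests/i, // docker pulls
  /could not load credentials|expiredtoken/i, // credential issues
];

// Returns true if the log text matches any transient pattern.
function isTransientFailure(logText) {
  return TRANSIENT_PATTERNS.some((pattern) => pattern.test(logText));
}

// Keeps only the failed jobs whose logs look transient.
// `failedJobs` is assumed to be an array of { name, log } objects.
function selectJobsToRetry(failedJobs) {
  return failedJobs.filter((job) => isTransientFailure(job.log));
}

const failed = [
  { name: "java-tests", log: "Could not resolve dependencies for project X" },
  { name: "java-examples", log: "AssertionError: expected 1 but was 2" },
];
// Logs an array containing only "java-tests".
console.log(selectJobsToRetry(failed).map((job) => job.name));
```

Under this scheme an assertion failure is never retried, so a flaky-but-buggy test still surfaces as red CI.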

Contributor Author

The next revision adds skip-list functionality, which can be used to exclude a given suite from retries. Whenever we have fuzz tests in this library, we will use that skip list.
Except for dedicated fuzz tests, retrying should be fine for general use.

Member

@rishav-karanjit rishav-karanjit left a comment

I approved it.

For the record: I'd prefer an allowlist (opt-in retries) over a denylist (opt-out retries). The YAML does substring matching on the assumption that 'fuzz' will appear in the job name, which is fragile. With the current logic, if a unit test and a fuzz test both fail, neither is retried. I'd prefer retrying the unit test and skipping the fuzz test.

// Jobs that should NOT be retried. These are non-deterministic tests
// (e.g., fuzz tests) where a retry could mask a real failure.
// Use job name prefixes/substrings to match.
const skipPatterns = [
Member

This assumes that the name of every non-deterministic job will always include "fuzz". String matching on job names is too fragile.
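A small sketch of why substring matching can misfire in both directions. The job names below are hypothetical (none come from this repository's workflows), and the substring check is assumed to be a plain case-sensitive `includes`, which may differ from the actual script.

```javascript
// Hypothetical skip list mirroring the substring approach under review.
const skipSubstrings = ["fuzz"];

// Assumed form of the check: skip if the job name contains any substring.
function isSkippedBySubstring(jobName) {
  return skipSubstrings.some((s) => jobName.includes(s));
}

// Alternative sketch: an explicit set of exact (hypothetical) job names.
const nonDeterministicJobs = new Set(["property-tests", "fuzz-encrypt"]);

function isSkippedByExactName(jobName) {
  return nonDeterministicJobs.has(jobName);
}

// A non-deterministic job without "fuzz" in its name slips through.
console.log(isSkippedBySubstring("property-tests")); // false
// A deterministic job that happens to contain "fuzz" is wrongly skipped.
console.log(isSkippedBySubstring("fuzzy-search-unit")); // true
// Exact names avoid both mistakes, at the cost of manual upkeep.
console.log(isSkippedByExactName("property-tests")); // true
console.log(isSkippedByExactName("fuzzy-search-unit")); // false
```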

Member

I'd prefer an allowlist (opt-in retries) over a denylist (opt-out retries).
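The practical difference between the two policies shows up when a job that nobody has classified yet fails. A minimal sketch, with hypothetical job names and helper functions that are not part of the PR:

```javascript
// Denylist (opt-out): every job is retried unless it is explicitly listed.
function shouldRetryWithDenylist(jobName, denylist) {
  return !denylist.has(jobName);
}

// Allowlist (opt-in): a job is retried only if it is explicitly listed.
function shouldRetryWithAllowlist(jobName, allowlist) {
  return allowlist.has(jobName);
}

// Hypothetical lists for illustration.
const denylist = new Set(["fuzz-tests"]);
const allowlist = new Set(["unit-tests"]);

// A newly added, unclassified job:
const newJob = "new-randomized-tests";
console.log(shouldRetryWithDenylist(newJob, denylist)); // true: retried by default
console.log(shouldRetryWithAllowlist(newJob, allowlist)); // false: must opt in first
```

With a denylist, forgetting to list a non-deterministic job means its failures can be masked by a retry; with an allowlist, forgetting to list a job only means it is not retried.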

Contributor Author

If we are in a situation where any random test fails on the first run and passes on the second, masking the bug from the first run, we are doomed. This behavior MUST always be an exception or an explicit choice, as when we add fuzz tests. Skip lists are generally made for exceptional cases, which is what is implemented here.

);
});

if (skipped.length > 0) {
Member

If a unit test and a fuzz test both fail, neither will be retried. I'd prefer retrying the unit test and skipping the fuzz test.
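The per-job behavior requested here could be sketched as a partition step: decide for each failed job separately, instead of abandoning the whole retry when any skipped job is present. The job objects, the predicate, and `partitionFailedJobs` are hypothetical and not taken from the PR's script.

```javascript
// Splits failed jobs into those to retry and those to leave failed.
// `isNonDeterministic` is an assumed predicate supplied by the caller.
function partitionFailedJobs(failedJobs, isNonDeterministic) {
  const retry = [];
  const skipped = [];
  for (const job of failedJobs) {
    (isNonDeterministic(job) ? skipped : retry).push(job);
  }
  return { retry, skipped };
}

// Hypothetical failed jobs from one workflow run.
const failedJobs = [
  { id: 1, name: "unit-tests" },
  { id: 2, name: "fuzz-tests" },
];
const flakyByDesign = new Set(["fuzz-tests"]);

const result = partitionFailedJobs(failedJobs, (job) => flakyByDesign.has(job.name));
console.log(result.retry.map((job) => job.id)); // [ 1 ]
console.log(result.skipped.map((job) => job.id)); // [ 2 ]
```

Acting on the `retry` list would require re-running jobs individually. GitHub's REST API documents a per-job re-run endpoint alongside the run-level "re-run failed jobs" endpoint; which one this workflow calls is not visible in the excerpt above.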

@sharmabikram sharmabikram merged commit 2668d68 into main Apr 21, 2026
128 of 130 checks passed
rishav-karanjit added a commit that referenced this pull request Apr 29, 2026
## [4.0.0](v3.9.0-java...v4.0.0-java) (2026-04-29)

### ⚠ BREAKING CHANGES

* **java:** add DDBEC with SDK v2 and remove DDBEC with SDK V1 (#2048)
  * The AWS Database Encryption SDK for DynamoDB will NOT support AWS SDK for Java 1.x in the embedded 2.x version (which was known as DynamoDB Encryption Client (DDBEC)). The embedded DDBEC will now use AWS SDK for Java 2.x. These changes are limited to the embedded DDBEC; the rest of DB-ESDK has no code changes.
  * If a consumer of DB-ESDK uses APIs from the namespace `com.amazonaws.services.dynamodbv2` through DB-ESDK, they must migrate to the corresponding APIs from the namespace `com.amazonaws.services.dynamodbv2.datamodeling.sdkv2`. If a consumer of DB-ESDK does NOT use the namespace `com.amazonaws.services.dynamodbv2` through DB-ESDK, there is no breaking change when moving to the next major version.

### Features -- Java

* **java:** add DDBEC with SDK v2 and remove DDBEC with SDK V1 ([#2048](#2048)) ([035dbe3](035dbe3))

### Fixes -- All Languages

* use UUIDs and cleanup in beacon styles example to avoid stale item collisions ([#2125](#2125)) ([773c1ff](773c1ff))

### Fixes -- Java

* **java:** drop hkdf offset method ([#2011](#2011)) ([b8f29f9](b8f29f9))

### Maintenance -- All Languages

* **dafny:** bump MPL and update mutable map ([#1974](#1974)) ([e9ea870](e9ea870))
* **dafny:** bump smithy dafny  ([#1971](#1971)) ([85309a0](85309a0))

### Maintenance -- Java

* **java:** Allow local testing ([#1947](#1947)) ([bf5a106](bf5a106))
* **java:** Attempt to reduce flaky CI ([#2220](#2220)) ([987aec6](987aec6))
* **java:** attempt to reduce flaky CI failures ([#2203](#2203)) ([b4d88f1](b4d88f1))
* **java:** bring back test against released MPL version ([#2226](#2226)) ([a340b34](a340b34))
* **java:** fix GetEncryptedDataKeyDescription java Example  ([#1973](#1973)) ([ba8fcb7](ba8fcb7))
* **java:** retry failed workflow ([#2229](#2229)) ([2668d68](2668d68))
* **java:** shut down local DDB in test ([#2176](#2176)) ([fa1e151](fa1e151))