chore(java): retry failed workflow #2229
Blanket-retrying all failed jobs can silently mask real failures: a test that fails because of a genuine bug but passes on retry because of non-determinism would go unnoticed. I'd prefer we either scope retries to known-transient failure patterns (dependency resolution, Docker pulls, credential issues) or evaluate each test for non-determinism and add retries only for the tests that need them. As-is, this optimizes for green CI at the cost of CI trustworthiness.
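A minimal sketch of what scoping retries to known-transient failures could look like. The function name `isTransientFailure` and every pattern in `TRANSIENT_PATTERNS` are hypothetical and would need to be tuned against real CI logs; nothing here is taken from the workflow in this PR.

```javascript
// Hypothetical log patterns for transient infrastructure failures.
// These are illustrative assumptions, not patterns from this repository.
const TRANSIENT_PATTERNS = [
  /could not resolve dependenc/i,   // dependency resolution
  /error response from daemon/i,    // docker pull problems
  /unable to locate credentials/i,  // credential issues
];

// Returns true only when the failure log matches a known-transient pattern.
// Anything else is treated as a real failure and is not retried.
function isTransientFailure(logText) {
  return TRANSIENT_PATTERNS.some((pattern) => pattern.test(logText));
}
```

With this shape, an assertion failure in a test never matches, so it stays red on the first run instead of being retried.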
The next revision adds a skip list that can be used to exclude a given suite from retries. Whenever we add fuzz tests to this library, we will put them on that skip list.
Except for dedicated fuzz tests, retrying should be fine for general use.
rishav-karanjit left a comment
I approved it.
For the record: I'd prefer an allowlist (opt-in retries) over a denylist (opt-out retries). The YAML does substring matching on the job name, assuming 'fuzz' will appear in it, which is fragile. With the current logic, if a unit test and a fuzz test both fail, neither is retried. I'd prefer retrying the unit test and skipping the fuzz test.
// Jobs that should NOT be retried. These are non-deterministic tests
// (e.g., fuzz tests) where a retry could mask a real failure.
// Use job name prefixes/substrings to match.
const skipPatterns = [
This assumes a non-deterministic job's name will always include "fuzz". String matching on the job name is too fragile.
I'd prefer an allowlist (opt-in retries) over a denylist (opt-out retries).
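To make the trade-off concrete, here is a rough sketch contrasting the two approaches. The job names in `RETRY_ALLOWLIST` and the helper names are hypothetical, and the contents of `skipPatterns` are assumed to be `['fuzz']` based on the discussion in this thread, not copied from the diff.

```javascript
// Allowlist: a job is retried only if it was explicitly opted in.
// These job names are hypothetical placeholders.
const RETRY_ALLOWLIST = new Set(['java-unit-tests', 'java-examples']);

function isRetryAllowed(jobName) {
  return RETRY_ALLOWLIST.has(jobName); // exact match, no substring guessing
}

// Denylist: a job is retried unless its name contains a skip pattern.
// The pattern list is an assumption for illustration.
const skipPatterns = ['fuzz'];

function isSkippedBySubstring(jobName) {
  return skipPatterns.some((pattern) => jobName.includes(pattern));
}
```

A non-deterministic job named, say, `java-property-tests` is not caught by the substring denylist and would be retried, whereas under the allowlist it is not retried until someone adds it on purpose.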
If we are in a situation where any random test fails on the first run and passes on the second, masking the bug from the first run, we are doomed. This behavior MUST always be an exception or an explicit choice, as it is when we add fuzz tests. Skip lists are generally made for exceptional cases, which is what is implemented here.
);
});

if (skipped.length > 0) {
If a unit test and a fuzz test both fail, neither of them will be retried. I'd prefer retrying the unit test and skipping the fuzz test.
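One possible shape for the per-job behavior requested here, as a sketch only. The function `partitionFailedJobs` and the `{ id, name }` job objects are hypothetical; only the `skipPatterns` and `skipped` names echo the diff, and how the actual workflow collects failed jobs is not shown in this thread.

```javascript
// Split failed jobs into those to retry and those to leave failed,
// instead of abandoning the whole retry when any skipped job is present.
function partitionFailedJobs(failedJobs, skipPatterns) {
  const retry = [];
  const skipped = [];
  for (const job of failedJobs) {
    const isSkipped = skipPatterns.some((pattern) => job.name.includes(pattern));
    (isSkipped ? skipped : retry).push(job);
  }
  return { retry, skipped };
}
```

Jobs in `retry` could then be re-run individually (GitHub's REST API offers a per-job re-run endpoint), while jobs in `skipped` stay failed so the run is still reported as red.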
## [4.0.0](v3.9.0-java...v4.0.0-java) (2026-04-29)

### ⚠ BREAKING CHANGES

* **java:** add DDBEC with SDK v2 and remove DDBEC with SDK V1 (#2048)
* The AWS Database Encryption SDK for DynamoDB will NOT support AWS SDK for Java 1.x in the embedded 2.x version (which was known as DynamoDB Encryption Client(DDBEC)). The embedded DDBEC will now use AWS SDK for Java 2.x. These changes are limited to embedded DDBEC and rest of DB-ESDK has no code changes.
* If consumer of DB-ESDK is using APIs from namespace `com.amazonaws.services.dynamodbv2` through DB-ESDK then they have to migrate to use corresponding API from namespace `com.amazonaws.services.dynamodbv2.datamodeling.sdkv2`. If consumer of DB-ESDK are NOT using namespace `com.amazonaws.services.dynamodbv2` through DB-ESDK then there will not be any breaking change when moving to next major version.

### Features -- Java

* **java:** add DDBEC with SDK v2 and remove DDBEC with SDK V1 ([#2048](#2048)) ([035dbe3](035dbe3))

### Fixes -- All Languages

* use UUIDs and cleanup in beacon styles example to avoid stale item collisions ([#2125](#2125)) ([773c1ff](773c1ff))

### Fixes -- Java

* **java:** drop hkdf offset method ([#2011](#2011)) ([b8f29f9](b8f29f9))

### Maintenance -- All Languages

* **dafny:** bump MPL and update mutable map ([#1974](#1974)) ([e9ea870](e9ea870))
* **dafny:** bump smithy dafny ([#1971](#1971)) ([85309a0](85309a0))

### Maintenance -- Java

* **java:** Allow local testing ([#1947](#1947)) ([bf5a106](bf5a106))
* **java:** Attempt to reduce flaky CI ([#2220](#2220)) ([987aec6](987aec6))
* **java:** attempt to reduce flaky CI failures ([#2203](#2203)) ([b4d88f1](b4d88f1))
* **java:** bring back test against released MPL version ([#2226](#2226)) ([a340b34](a340b34))
* **java:** fix GetEncryptedDataKeyDescription java Example ([#1973](#1973)) ([ba8fcb7](ba8fcb7))
* **java:** retry failed workflow ([#2229](#2229)) ([2668d68](2668d68))
* **java:** shut down local DDB in test ([#2176](#2176)) ([fa1e151](fa1e151))
Issue #, if available:
Description of changes:
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.