ENG-115918: Use Retry-After header for rate limits, add retry to all AC endpoints #79

Merged: dmeenaarmorcode merged 1 commit into qa from ENG-115918-retry-after-header on Apr 9, 2026

@dmeenaarmorcode
Collaborator

Summary

Fixes ENG-115918

Changes

New retry infrastructure

  • retry_request(func, max_retries=5, max_server_retries=3) — unified retry wrapper
    • 429: reads X-Rate-Limit-Retry-After-Seconds header (capped at 300s), random 0-10s for concurrent limit errors, 2s fallback
    • 5XX: exponential backoff min(5×2ⁿ, 120s) + jitter
    • Network errors: same exponential backoff as 5XX
    • Uses gevent.sleep() throughout
  • get_retry_delay(response) — extracts delay from response header/body
  • is_concurrent_limit_error(response) — detects concurrent limit vs standard rate limit
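
The retry rules listed above can be sketched roughly as follows. This is a minimal illustration, not the merged code. Several details are assumptions: the body-text check in is_concurrent_limit_error(), the order in which get_retry_delay() tries the header, the concurrent-limit case, and the fallback, the 0-1s jitter range, the backoff_delay() helper, and the injectable sleep parameter (standing in for gevent.sleep()). Reading max_retries as the 429 budget and max_server_retries as the 5XX/network budget is also an inference from the max_retries=0 "5XX only" usage.

```python
import random
import time

RATE_LIMIT_HEADER = "X-Rate-Limit-Retry-After-Seconds"
MAX_RATE_LIMIT_DELAY = 300  # seconds; cap stated in the PR
FALLBACK_DELAY = 2          # seconds; fallback stated in the PR
BACKOFF_BASE = 5            # seconds; 5XX backoff base stated in the PR
BACKOFF_CAP = 120           # seconds; 5XX backoff cap stated in the PR


def is_concurrent_limit_error(response):
    # Assumption: the concurrent-limit case is recognisable from the body text.
    body = getattr(response, "text", "") or ""
    return "concurrent" in body.lower()


def get_retry_delay(response):
    # Prefer the server-provided delay, capped at 300s.
    raw = response.headers.get(RATE_LIMIT_HEADER)
    if raw is not None:
        try:
            return min(max(float(raw), 0.0), MAX_RATE_LIMIT_DELAY)
        except ValueError:
            pass  # unparsable header: fall through to the other rules
    if is_concurrent_limit_error(response):
        return random.uniform(0, 10)  # spread out concurrent callers
    return FALLBACK_DELAY


def backoff_delay(attempt, jitter=1.0):
    # min(5 * 2^n, 120s) plus jitter; the jitter range is an assumption.
    return min(BACKOFF_BASE * 2 ** attempt, BACKOFF_CAP) + random.uniform(0, jitter)


def retry_request(func, max_retries=5, max_server_retries=3, sleep=time.sleep):
    # `sleep` is injectable for illustration; the PR uses gevent.sleep().
    rate_limit_attempts = 0
    server_attempts = 0
    while True:
        try:
            response = func()
        except (ConnectionError, TimeoutError):
            # Built-in exceptions keep the sketch dependency-free; a
            # requests-based client would catch requests.exceptions.RequestException.
            if server_attempts >= max_server_retries:
                raise
            sleep(backoff_delay(server_attempts))
            server_attempts += 1
            continue
        if response.status_code == 429 and rate_limit_attempts < max_retries:
            sleep(get_retry_delay(response))
            rate_limit_attempts += 1
        elif response.status_code >= 500 and server_attempts < max_server_retries:
            sleep(backoff_delay(server_attempts))
            server_attempts += 1
        else:
            return response
```

Under this reading, passing max_retries=0 returns a 429 response immediately while 5XX responses are still retried with backoff.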

Endpoint fixes

| Function | Endpoint | Before | After |
|---|---|---|---|
| process() | get-task | No 429 handling, >500 missed status 500 | 429 with header delay, fixed to >=500 |
| update_task() | put-result | Fixed 2s sleep, recursive retry (max 3) | retry_request() with header delay |
| upload_response() | upload-result | No retry | retry_request() |
| check_for_logs_fetch() | upload-logs | No retry | retry_request() |
| get_s3_upload_url() | upload-url | No retry | retry_request() |
| upload_s3() | S3 presigned PUT | No retry | retry_request(max_retries=0) (5XX only) |
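
The process() fix is a boundary condition: a strict `> 500` comparison lets a plain 500 Internal Server Error slip through as if it were not a server error. A minimal sketch of the difference follows; both helper names are hypothetical and do not appear in the PR.

```python
def is_server_error_before(status_code):
    # Old comparison: 500 itself is not greater than 500, so it is missed.
    return status_code > 500


def is_server_error_after(status_code):
    # Fixed comparison: every 5XX status, including 500, is caught.
    return status_code >= 500


for code in (429, 500, 503):
    print(code, is_server_error_before(code), is_server_error_after(code))
```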

Testing

  • Syntax verified: python3 -c "import ast; ast.parse(...)"
  • No changes to metrics logging, RateLimiter, or target request (customer on-prem)
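
For reference, a syntax check of this kind can be sketched as below. The argument to ast.parse is elided in the PR description, so the syntax_ok() helper and its inputs are hypothetical. ast.parse only confirms that the source parses; it does not execute it.

```python
import ast


def syntax_ok(source, filename="<string>"):
    # Returns True when the source parses as valid Python, False otherwise.
    try:
        ast.parse(source, filename=filename)
    except SyntaxError:
        return False
    return True


print(syntax_ok("x = 1\n"))
print(syntax_ok("def broken(:\n"))
```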

…AC endpoints

- Add retry_request() unified retry wrapper: 429 uses X-Rate-Limit-Retry-After-Seconds
  header (cap 300s), 5XX uses exponential backoff (5s base, 120s cap + jitter)
- Add is_concurrent_limit_error() and get_retry_delay() helpers
- Fix process() get-task: add 429 handling, fix '>500' to '>=500'
- Refactor update_task(): replace recursive retry + fixed 2s sleep with retry_request()
- Add retry to upload_response() (upload-result): was zero retry
- Add retry to check_for_logs_fetch() (upload-logs): was zero retry
- Add retry to get_s3_upload_url() (upload-url): was zero retry
- Add retry to upload_s3() (S3 PUT): 5XX backoff only (max_retries=0)

dmeenaarmorcode changed the base branch from main to qa on April 9, 2026 14:49
dmeenaarmorcode merged commit c67c331 into qa on Apr 9, 2026
4 checks passed
dmeenaarmorcode deleted the ENG-115918-retry-after-header branch on April 9, 2026 14:50
dmeenaarmorcode added a commit that referenced this pull request Apr 10, 2026
…AC endpoints (#79) (#80)
