fix(firmware): raise bootloader HID timeouts 10s to 30s (resolves daqifi-desktop#575)#235
fix(firmware): raise bootloader HID timeouts 10s to 30s (resolves daqifi-desktop#575)#235cptkoolbeenz wants to merge 1 commit into
Conversation
…op #575)
In-app PIC32 firmware update intermittently fails mid-Programming with
'IOException: HID write failed'. USBPcap capture: ~9213 records program
fine, then one late HID write goes ~10s without the device accepting it
(twice = the two ExecuteWithRetryAsync attempts) and gives up. The
bootloader is unchanged and flashes fine from desktop v3.0.0, whose old
synchronous flasher used unbounded blocking HID I/O — so a late bootloader
flash op that keeps the endpoint busy >10s was tolerated there but trips
the new transport's 10s caps.
Raise BootloaderResponseTimeout and HidLibraryTransport Default{Read,Write}
Timeout from 10s to 30s so a single attempt waits the op out (the overall
ProgrammingTimeout stays 10min). If this alone doesn't resolve #575, the
remaining lever is matching the old per-record synchronous lock-step
(avoid cancel+retry mid-op).
Refs daqifi/daqifi-desktop#575.
Code Review by Qodo
1. Broken HID timeout test
|
|
Closing — opened this prematurely. Filing the change as a tried-fix suggestion on daqifi/daqifi-desktop#575 instead of a PR, per maintainer preference. |
| // 30 s (was 10 s): a late bootloader flash op can leave the device's HID | ||
| // endpoint busy/NAKing well past 10 s, which surfaced here as | ||
| // "IOException: HID write failed" mid-programming (daqifi-desktop #575). | ||
| // The old synchronous desktop flasher used unbounded blocking HID I/O and | ||
| // tolerated this; restore headroom so a single attempt waits the op out. | ||
| private static readonly TimeSpan DefaultReadTimeout = TimeSpan.FromSeconds(30); | ||
| private static readonly TimeSpan DefaultWriteTimeout = TimeSpan.FromSeconds(30); |
There was a problem hiding this comment.
1. Broken hid timeout test 🐞 Bug ≡ Correctness
Changing HidLibraryTransport defaults to 30s breaks HidLibraryTransportTests, which still asserts 10s defaults, causing CI/unit test failure. This is deterministic and will block merges if tests are enforced.
Agent Prompt
### Issue description
`HidLibraryTransport` default read/write timeouts were changed from 10s to 30s, but the unit test still asserts 10s defaults.
### Issue Context
This will cause the test suite to fail.
### Fix Focus Areas
- src/Daqifi.Core.Tests/Communication/Transport/HidLibraryTransportTests.cs[8-16]
- src/Daqifi.Core/Communication/Transport/HidLibraryTransport.cs[10-26]
### Suggested fix
- Update the assertions to expect 30 seconds for both `ReadTimeout` and `WriteTimeout`.
- (Optional) Rename the test to reflect the new expectation (e.g., `...IsThirtySeconds`).
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
| /// 30 s (was 10 s): a late bootloader flash op can keep the device busy | ||
| /// past 10 s, which timed out mid-programming (daqifi-desktop #575). | ||
| /// </summary> | ||
| public TimeSpan BootloaderResponseTimeout { get; set; } = TimeSpan.FromSeconds(10); | ||
| public TimeSpan BootloaderResponseTimeout { get; set; } = TimeSpan.FromSeconds(30); |
There was a problem hiding this comment.
2. Connecting timeout too short 🐞 Bug ☼ Reliability
BootloaderResponseTimeout is now 30s, but bootloader version probing runs under the Connecting state timeout (default 20s), so the operation can still be canceled by the outer state timeout before the new bootloader read/write timeouts can complete. This makes the new default effectively unreachable in the Connecting phase and can still fail on slow/busy HID endpoints.
Agent Prompt
### Issue description
`BootloaderResponseTimeout` was increased to 30 seconds, but `RequestBootloaderVersionAsync` is executed inside `ExecuteWithStateTimeoutAsync(FirmwareUpdateState.Connecting)` where the default `ConnectingTimeout` is 20 seconds. The outer state timeout can cancel the operation before the configured bootloader timeouts are reached.
### Issue Context
- `RequestBootloaderVersionAsync` performs a HID write and then a read with `_options.BootloaderResponseTimeout`.
- `ExecuteWithStateTimeoutAsync` uses `FirmwareUpdateServiceOptions.GetStateTimeout(state)` to cancel the linked token.
### Fix Focus Areas
- src/Daqifi.Core/Firmware/FirmwareUpdateServiceOptions.cs[35-66]
- src/Daqifi.Core/Firmware/FirmwareUpdateService.cs[313-317]
- src/Daqifi.Core/Firmware/FirmwareUpdateService.cs[913-921]
- src/Daqifi.Core/Firmware/FirmwareUpdateService.cs[1260-1277]
### Suggested fix
Choose one:
1) Increase the default `ConnectingTimeout` to be >= the maximum expected duration of the connecting-phase HID round-trip (at least >= `BootloaderResponseTimeout`, and realistically also accounting for the preceding HID write), or
2) Move the bootloader version request out of the `Connecting` state budget into a state whose timeout budget is compatible with the new I/O timeouts.
Also consider auditing other bootloader-I/O states (e.g., erase) to ensure their state budgets can accommodate the new per-op timeouts.
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
PR Summary by Qodo
Raise PIC32 bootloader HID timeouts to 30s to prevent mid-flash failures
🐞 Bug fix🕐 10-20 MinutesWalkthroughs
User Description
Problem
In-app PIC32 firmware update intermittently fails mid-
ProgrammingwithSystem.IO.IOException: HID write failed(daqifi/daqifi-desktop#575).Root cause (evidence)
ExecuteWithRetryAsyncattempts) — then gives up. No USB STALL; device stays enumerated.Daqifi.Desktop.Bootloader.Pic32Bootloader, pre-#379) used unbounded blocking HID I/O (WriteReport/FastReadReport), so a late bootloader flash op keeping the endpoint busy >10 s was simply waited out. The new async transport caps each op at 10 s and fails.Fix
Raise bootloader HID timeouts 10 s to 30 s so a single attempt waits the op out:
FirmwareUpdateServiceOptions.BootloaderResponseTimeout10 to 30 s (ack read)HidLibraryTransport.Default{Read,Write}Timeout10 to 30 s (the write is what fired)Overall
ProgrammingTimeoutstays 10 min. Builds clean; no test asserts the literal 10 s default.Verification
Hardware-tested: with the 30 s timeouts the in-app PIC32 update completes (no mid-
ProgrammingHID-write timeout).Follow-up
If a future op exceeds 30 s, the more robust fix is matching the old synchronous lock-step (wait per record; avoid cancel+retry mid-op) rather than only widening the window.
Resolves daqifi/daqifi-desktop#575.
🤖 Generated with Claude Code
AI Description
Diagram
graph TD A["FirmwareUpdateService"] --> B["Update options"] A --> C["IHidTransport"] --> D["HidLibraryTransport"] --> E["HidSharp HID I/O"] --> F{{"PIC32 bootloader"}}High-Level Assessment
The following are alternative approaches to this PR:
1. Adaptive per-op timeout/backoff
2. Restore synchronous lock-step flash loop
3. Expose timeouts as app-level configuration
Recommendation: Raising the defaults to 30s is the lowest-risk fix aligned with the reported USBPcap evidence and preserves the existing 10-minute overall ProgrammingTimeout. Consider follow-up work if failures persist: either reintroduce a synchronous/lock-step flashing strategy (to avoid cancel+retry mid-op) or add an adaptive/backoff timeout policy.
File Changes
Bug fix (2)