Commit 7455f91
[Evaluation] Fix red team status tracking, cache key mismatch, and evaluation error handling (#45517)
* Fix red team status tracking, cache key mismatch, and evaluation error handling
Bug 1 - Status tracking: _determine_run_status now treats 'pending' and
'running' entries as 'failed' instead of 'in_progress'. By the time this
method runs the scan is finished, so leftover 'pending' entries (from
skipped risk categories or Foundry execution failures) indicate failure,
not ongoing work.
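
The status rule described above can be sketched as follows. This is a minimal illustration, not the actual _determine_run_status implementation: the function name, the flat list of status strings, and the 'completed' return value are assumptions made for the example.

```python
from typing import Iterable

# Statuses that count as failure once the scan has finished. Treating
# "pending" and "running" this way mirrors the fix described above.
FAILED_AFTER_SCAN = {"failed", "pending", "running"}


def determine_run_status(entry_statuses: Iterable[str]) -> str:
    """Return the overall status of a finished scan (hypothetical helper)."""
    # Any leftover unfinished entry means the run did not fully succeed.
    if any(status in FAILED_AFTER_SCAN for status in entry_statuses):
        return "failed"
    return "completed"
```

With this rule, a finished scan that still holds a 'pending' entry is reported as 'failed' rather than 'in_progress'.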
Bug 2 - Cache key mismatch: _execute_attacks_with_foundry now uses
get_attack_objective_from_risk_category() to build the cache lookup key,
matching the caching logic in _get_attack_objectives. Previously,
ungrounded_attributes objectives were cached under 'isa' but looked up
under 'ungrounded_attributes', causing them to be silently skipped.
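
A small sketch of why the mismatch skipped objectives, and how deriving both keys from one function avoids it. Everything here is illustrative: objective_key is a hypothetical stand-in for get_attack_objective_from_risk_category(), whose real signature is not shown in this commit, and the dict-based cache is an assumption.

```python
from typing import Dict, List, Optional

# Assumed mapping for illustration: only the 'ungrounded_attributes' -> 'isa'
# pair is taken from the commit message above.
OBJECTIVE_KEY_OVERRIDES = {"ungrounded_attributes": "isa"}


def objective_key(risk_category: str) -> str:
    """Hypothetical stand-in for get_attack_objective_from_risk_category()."""
    return OBJECTIVE_KEY_OVERRIDES.get(risk_category, risk_category)


def cache_objectives(
    cache: Dict[str, List[str]], risk_category: str, objectives: List[str]
) -> None:
    # Write side: store under the derived objective key.
    cache[objective_key(risk_category)] = objectives


def lookup_objectives(
    cache: Dict[str, List[str]], risk_category: str
) -> Optional[List[str]]:
    # Read side: derive the key the same way, so reads match writes.
    return cache.get(objective_key(risk_category))
```

Looking up the raw risk category name ('ungrounded_attributes') would miss the entry stored under 'isa'; routing both sides through the same key function keeps them aligned.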
Bug 3 - Evaluation error handling: RAIServiceScorer now detects when the
RAI evaluation service returns an error response (properties.outcome ==
'error', e.g. ServiceInvocationException) and raises RuntimeError. PyRIT
then treats the score as UNDETERMINED. Previously, the error response's
passed=False was read as a successful attack, which inflated the attack
success rate (ASR).
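
The error check can be sketched like this. It is a hedged illustration only: the function name and the dict-shaped result with a "properties" key are assumptions, and the real RAIServiceScorer code may differ.

```python
from typing import Any, Mapping


def raise_on_evaluation_error(result: Mapping[str, Any]) -> None:
    """Raise RuntimeError if the evaluation result reports an error outcome.

    Hypothetical helper; the result shape is assumed for illustration.
    """
    properties = result.get("properties") or {}
    if properties.get("outcome") == "error":
        # Raising keeps the caller from interpreting passed=False
        # on an error response as a real evaluation verdict.
        raise RuntimeError(f"RAI evaluation service returned an error: {properties}")
```

A result without an error outcome passes through unchanged, so normal scoring is unaffected.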
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Add changelog entries for status tracking, cache key, and scoring fixes
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---------
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1 parent 2d0a9aa commit 7455f91
4 files changed
Lines changed: 25 additions & 5 deletions
File tree
- sdk/evaluation/azure-ai-evaluation
- azure/ai/evaluation/red_team
- _foundry
Lines changed: 16 additions & 0 deletions
Lines changed: 1 addition & 1 deletion
Lines changed: 5 additions & 4 deletions