feat: add model authenticity verification #2550
Arlenman wants to merge 4 commits into farion1231:main from
Conversation
Hi maintainers, this PR adds a model authenticity verification page with protocol probes,
Claude finished @farion1231's task in 2m 20s

Review of PR #2550 — Model Authenticity Verification

No blocking issues. A couple of i18n/localization nits where Chinese strings are hardcoded in user-visible UI paths.

🟡 Nit — Hardcoded Chinese fallback in
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a7e3d674b3
```rust
if response_type == Some("message") && !content.is_empty() {
    // "Anthropic Messages response structure is parseable"
    (ProbeStatus::Passed, "Anthropic Messages 响应结构可解析".to_string(), excerpt)
} else {
    // "Anthropic response structure is incomplete"
    (ProbeStatus::Warning, "Anthropic 响应结构不完整".to_string(), excerpt)
}
```
Validate Anthropic probes against expected outputs
This probe marks knowledgeQa, modelFeatures, and protocolConsistency as passed whenever the response has type message and non-empty text, regardless of whether the model actually returned 579, valid JSON, or OK for the prompt. That means incorrect Anthropic backends can still receive high confidence scores, which undermines the feature’s core authenticity signal.
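One way to act on this suggestion is to compare each probe's returned text against the expected answer for that probe, not just the response structure. The sketch below is illustrative only: `judge_probe` and the `expected` values ("579", "OK") are hypothetical names mirroring the probes described above, not the PR's actual API.

```rust
// Hypothetical sketch: a probe passes only when the model's answer
// contains the expected output for that probe's prompt, in addition
// to the existing structural check.

#[derive(Debug, PartialEq)]
enum ProbeStatus {
    Passed,
    Warning,
}

fn judge_probe(content: &str, expected: &str) -> ProbeStatus {
    // Structural parseability alone is not enough: an incorrect
    // backend can still return `type: "message"` with non-empty text.
    if !content.is_empty() && content.contains(expected) {
        ProbeStatus::Passed
    } else {
        ProbeStatus::Warning
    }
}

fn main() {
    // Arithmetic probe: expect "579" somewhere in the answer.
    assert_eq!(judge_probe("The answer is 579.", "579"), ProbeStatus::Passed);
    // Wrong answer or empty content should downgrade the probe.
    assert_eq!(judge_probe("The answer is 580.", "579"), ProbeStatus::Warning);
    assert_eq!(judge_probe("", "OK"), ProbeStatus::Warning);
}
```

A substring match is a deliberately loose check; a stricter variant could parse the answer (e.g. JSON probes via `serde_json`) before comparing.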
```rust
if has_candidates && !content.is_empty() {
    // "Gemini generateContent response structure is parseable"
    (ProbeStatus::Passed, "Gemini generateContent 响应结构可解析".to_string(), excerpt)
} else {
    // "Gemini response structure is incomplete"
    (ProbeStatus::Warning, "Gemini 响应结构不完整".to_string(), excerpt)
}
```
Validate Gemini probes against expected outputs
The Gemini path uses the same pass condition for all main probes (has_candidates and non-empty content), so arithmetic/JSON/protocol probes can pass even when the returned content is wrong. In practice this can overstate confidence and hide protocol/model mismatches for Gemini endpoints.
```rust
let latency_ms = probes.iter().filter_map(|probe| probe.latency_ms).max();
let latency_seconds = latency_ms.map(|value| value as f64 / 1000.0);
```
Compute total latency instead of max single-probe latency
Metrics currently set latency from the maximum probe duration, but the UI labels this as overall latency and uses it to derive tokens/sec. Using max rather than end-to-end (or summed) time underreports total runtime and inflates throughput numbers, producing misleading diagnostics.
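For sequentially executed probes, summing per-probe durations approximates end-to-end time and avoids the underreporting described above. A minimal sketch, assuming a probe result type with the `latency_ms: Option<u64>` field shown in the diff (the `Probe` struct and helper name here are stand-ins, not the PR's actual types):

```rust
// Hypothetical sketch: overall latency as the sum of known probe
// durations, rather than the single slowest probe.

struct Probe {
    latency_ms: Option<u64>,
}

fn total_latency_seconds(probes: &[Probe]) -> Option<f64> {
    let known: Vec<u64> = probes.iter().filter_map(|p| p.latency_ms).collect();
    if known.is_empty() {
        // No probe reported a duration, so total latency is unknown.
        None
    } else {
        Some(known.iter().sum::<u64>() as f64 / 1000.0)
    }
}

fn main() {
    let probes = vec![
        Probe { latency_ms: Some(1200) },
        Probe { latency_ms: Some(800) },
        Probe { latency_ms: None },
    ];
    // 1200 ms + 800 ms = 2.0 s total, vs. 1.2 s under the max() approach.
    assert_eq!(total_latency_seconds(&probes), Some(2.0));
}
```

If probes ever run concurrently, a wall-clock measurement around the whole batch (e.g. `std::time::Instant`) would be more faithful than either sum or max.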