fix(http-toolset): reject path-traversal segments in endpoint whitelist (CWE-918/CWE-22)#2007
Conversation
|
❌ Deploy Preview for holmes-docs failed. Why did it fail? →
|
WalkthroughStricter URL validation added to HttpToolset.match_endpoint: only HTTP/HTTPS with netloc allowed; path traversal rejected for '.' and '..' segments (raw and percent-decoded); paths normalized with trailing-slash preservation. New tests cover traversal and non-HTTP scheme rejection. ChangesEnhanced URL Validation and Path Traversal Protection
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
…2/CWE-918)
The HTTP toolset whitelist matched the raw URL path with fnmatch, but
requests/urllib3 normalise '..' segments before issuing the request.
An LLM-supplied URL such as
https://confluence.example.com/wiki/rest/api/../../admin/secret
would match the configured '/wiki/rest/api/*' allow-list yet actually
hit '/admin/secret' on the wire. Combined with prompt-injection via
untrusted issue/alert data, this lets an attacker pivot the agent to
arbitrary endpoints on a whitelisted host.
This change:
- Rejects URL paths whose raw or percent-decoded form contains '..'
or '.' segments.
- Normalises the path before glob matching so duplicate slashes and
trailing slashes still match as before.
- Restricts accepted URL schemes to http/https (was previously any
scheme with a netloc).
Added regression tests covering literal '..', percent-encoded '%2e%2e',
single-dot segments, and non-http schemes.
Signed-off-by: Sebastion <sebastion@sebastion.dev>
6bf7acf to
8dc3138
Compare
|
Thanks for the automated checks. Quick status update:
@coderabbitai marked no actionable comments. Ready for human review whenever convenient. |
|
Tip For best results, initiate chat on the files or code changes.
The security fix itself (path-traversal/SSRF hardening in |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tests/plugins/toolsets/http/test_http_toolset.py (1)
184-189: ⚡ Quick winMake the non-HTTP scheme assertion specific to the expected failure path.
Right now this can pass on any error; asserting the scheme-rejection message makes the test more robust.
Suggested test tightening
endpoint, error = toolset.match_endpoint("file:///etc/passwd") assert endpoint is None assert error is not None + assert "Unsupported URL scheme" in error🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/plugins/toolsets/http/test_http_toolset.py` around lines 184 - 189, The test test_non_http_scheme_rejected currently only checks that an error occurred; tighten it to assert the specific scheme-rejection path by inspecting the returned error from toolset.match_endpoint("file:///etc/passwd") and verifying it contains the expected scheme-related message (e.g. includes "file" or an "unsupported scheme"/"non-http" substring). Update the assertions in test_non_http_scheme_rejected to assert endpoint is None and that error is not None and that str(error) (or error.message) contains the scheme-rejection text produced by match_endpoint to ensure the failure is due to scheme rejection.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@tests/plugins/toolsets/http/test_http_toolset.py`:
- Around line 184-189: The test test_non_http_scheme_rejected currently only
checks that an error occurred; tighten it to assert the specific
scheme-rejection path by inspecting the returned error from
toolset.match_endpoint("file:///etc/passwd") and verifying it contains the
expected scheme-related message (e.g. includes "file" or an "unsupported
scheme"/"non-http" substring). Update the assertions in
test_non_http_scheme_rejected to assert endpoint is None and that error is not
None and that str(error) (or error.message) contains the scheme-rejection text
produced by match_endpoint to ensure the failure is due to scheme rejection.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 11a8ea01-db42-4928-8f53-cbf071e37545
📒 Files selected for processing (2)
holmes/plugins/toolsets/http/http_toolset.pytests/plugins/toolsets/http/test_http_toolset.py
Summary
The HTTP toolset's endpoint whitelist (
HttpToolset.match_endpointinholmes/plugins/toolsets/http/http_toolset.py) matches the URL path withfnmatchagainst the configuredpathspatterns before the URL is normalised. However,urllib3/requestsresolve..and.segments while preparing the request, so a URL that passes the whitelist can hit a completely different path on the wire.This means an attacker who can influence the LLM (e.g. via prompt-injection content in alerts, issue descriptions, or logs that HolmesGPT investigates) can cause the toolset to issue requests against paths on the configured host that the operator never intended to expose — for example reaching
/admin/...on a host that was only whitelisted for/wiki/rest/api/*.holmes/plugins/toolsets/http/http_toolset.py—HttpToolset.match_endpointData flow
confluence.internal, paths["/wiki/rest/api/*"].https://confluence.internal/wiki/rest/api/../../admin/secret.match_endpointrunsfnmatch("/wiki/rest/api/../../admin/secret", "/wiki/rest/api/*")→ True, so the request is approved.requests.Request(...).prepare().urlnormalises the path → the actual request goes tohttps://confluence.internal/admin/secret.We verified both behaviours locally:
Fix
Minimal, defensive changes in
match_endpoint:http/https(rejectsfile://,gopher://, etc.)..or..segments — checked on both the raw path and its percent-decoded form, so%2e%2ecannot bypass the check.posixpath.normpathbefore passing it to_match_path, so duplicate slashes and other benign noise behave consistently.The fix is intentionally narrow: it only tightens the validator and does not change endpoint configuration semantics or the request pipeline. Requests to legitimate paths continue to work unchanged.
Tests
Added four focused tests in
tests/plugins/toolsets/http/test_http_toolset.py:test_path_traversal_dotdot_rejected— literal..segment is rejected.test_path_traversal_encoded_rejected—%2e%2eis rejected.test_path_single_dot_segment_rejected—/./segment is rejected.test_non_http_scheme_rejected—file:///etc/passwdis rejected.Full test file passes:
Security analysis
The exploit preconditions are realistic for HolmesGPT's deployment model:
urlto the HTTP tool. Prompt-injection through investigated content is part of HolmesGPT's documented threat surface.After the fix, an attacker controlling the LLM-supplied URL can no longer escape the configured prefix on a whitelisted host, because any
../.segment (raw or percent-encoded) is rejected up front and the path is normalised before fnmatch.Adversarial review
Before submitting we tried to disprove this finding. We considered whether
requests/urllib3might preserve..on the wire (it does not — it normalises), whether the existing_match_pathalready handled traversal (it uses plainfnmatch, which treats..as ordinary characters), and whether the preconditions already grant equivalent access (they don't — the attacker's prompt-injection capability does not, on its own, let them reach off-prefix paths on whitelisted internal hosts; that ability is exactly what this bug provides). We also checked that the fix does not over-block — paths without traversal segments continue to match as before, and the existing host/path tests still pass.cc @lewiswigmore
Summary by CodeRabbit
Bug Fixes
Tests