Skip to content

fix(http-toolset): reject path-traversal segments in endpoint whitelist (CWE-918/CWE-22)#2007

Open
sebastiondev wants to merge 1 commit intoHolmesGPT:masterfrom
sebastiondev:fix/cwe918-confluence-ssrf-e06a
Open

fix(http-toolset): reject path-traversal segments in endpoint whitelist (CWE-918/CWE-22)#2007
sebastiondev wants to merge 1 commit intoHolmesGPT:masterfrom
sebastiondev:fix/cwe918-confluence-ssrf-e06a

Conversation

@sebastiondev
Copy link
Copy Markdown

@sebastiondev sebastiondev commented May 6, 2026

Summary

The HTTP toolset's endpoint whitelist (HttpToolset.match_endpoint in holmes/plugins/toolsets/http/http_toolset.py) matches the URL path with fnmatch against the configured paths patterns before the URL is normalised. However, urllib3/requests resolve .. and . segments while preparing the request, so a URL that passes the whitelist can hit a completely different path on the wire.

This means an attacker who can influence the LLM (e.g. via prompt-injection content in alerts, issue descriptions, or logs that HolmesGPT investigates) can cause the toolset to issue requests against paths on the configured host that the operator never intended to expose — for example reaching /admin/... on a host that was only whitelisted for /wiki/rest/api/*.

  • CWE: CWE-918 (SSRF) / CWE-22 (Path Traversal in validator)
  • File: holmes/plugins/toolsets/http/http_toolset.pyHttpToolset.match_endpoint
  • Severity: Medium — requires the HTTP toolset to be configured and a prompt-injection vector, but bypasses an explicit security control (the path whitelist).

Data flow

  1. Operator configures an HTTP toolset endpoint, e.g. host confluence.internal, paths ["/wiki/rest/api/*"].
  2. LLM (influenced by attacker-controlled investigated content) calls the HTTP tool with https://confluence.internal/wiki/rest/api/../../admin/secret.
  3. match_endpoint runs fnmatch("/wiki/rest/api/../../admin/secret", "/wiki/rest/api/*")True, so the request is approved.
  4. requests.Request(...).prepare().url normalises the path → the actual request goes to https://confluence.internal/admin/secret.
  5. The whitelist is bypassed and an unintended path is fetched (and its response returned to the LLM/user).

We verified both behaviours locally:

>>> import fnmatch
>>> fnmatch.fnmatch("/wiki/rest/api/../../admin/secret", "/wiki/rest/api/*")
True
>>> import requests
>>> requests.Request("GET", "http://example.com/api/foo/../../admin").prepare().url
'http://example.com/admin'

Fix

Minimal, defensive changes in match_endpoint:

  1. Restrict accepted schemes to http / https (rejects file://, gopher://, etc.).
  2. Reject any path containing . or .. segments — checked on both the raw path and its percent-decoded form, so %2e%2e cannot bypass the check.
  3. Normalise the path with posixpath.normpath before passing it to _match_path, so duplicate slashes and other benign noise behave consistently.

The fix is intentionally narrow: it only tightens the validator and does not change endpoint configuration semantics or the request pipeline. Requests to legitimate paths continue to work unchanged.

Tests

Added four focused tests in tests/plugins/toolsets/http/test_http_toolset.py:

  • test_path_traversal_dotdot_rejected — literal .. segment is rejected.
  • test_path_traversal_encoded_rejected%2e%2e is rejected.
  • test_path_single_dot_segment_rejected/./ segment is rejected.
  • test_non_http_scheme_rejectedfile:///etc/passwd is rejected.

Full test file passes:

tests/plugins/toolsets/http/test_http_toolset.py ... 86 passed

Security analysis

The exploit preconditions are realistic for HolmesGPT's deployment model:

  • The HTTP toolset is configured to talk to an internal service whose host also serves more sensitive paths than the whitelisted prefix (Confluence/Jira/Grafana style apps with admin endpoints under the same host are common).
  • The investigated data (alert annotations, issue text, log lines) can influence what the LLM passes as url to the HTTP tool. Prompt-injection through investigated content is part of HolmesGPT's documented threat surface.

After the fix, an attacker controlling the LLM-supplied URL can no longer escape the configured prefix on a whitelisted host, because any ../. segment (raw or percent-encoded) is rejected up front and the path is normalised before fnmatch.

Adversarial review

Before submitting we tried to disprove this finding. We considered whether requests/urllib3 might preserve .. on the wire (it does not — it normalises), whether the existing _match_path already handled traversal (it uses plain fnmatch, which treats .. as ordinary characters), and whether the preconditions already grant equivalent access (they don't — the attacker's prompt-injection capability does not, on its own, let them reach off-prefix paths on whitelisted internal hosts; that ability is exactly what this bug provides). We also checked that the fix does not over-block — paths without traversal segments continue to match as before, and the existing host/path tests still pass.

cc @lewiswigmore

Summary by CodeRabbit

  • Bug Fixes

    • Strengthened URL validation and security for HTTP endpoints: restricts to HTTP/HTTPS, ensures host presence, prevents path traversal (including encoded variants), and normalizes paths while preserving trailing slashes for accurate matching.
  • Tests

    • Added tests covering path traversal cases, encoded traversal, single-dot segments, and rejection of non-HTTP schemes to validate endpoint matching.

Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@linux-foundation-easycla
Copy link
Copy Markdown

linux-foundation-easycla Bot commented May 6, 2026

CLA Not Signed

@netlify
Copy link
Copy Markdown

netlify Bot commented May 6, 2026

Deploy Preview for holmes-docs failed. Why did it fail? →

Name Link
🔨 Latest commit 8dc3138
🔍 Latest deploy log https://app.netlify.com/projects/holmes-docs/deploys/69fb66eadb155d0007df55ca

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 6, 2026

Walkthrough

Stricter URL validation added to HttpToolset.match_endpoint: only HTTP/HTTPS with netloc allowed; path traversal rejected for '.' and '..' segments (raw and percent-decoded); paths normalized with trailing-slash preservation. New tests cover traversal and non-HTTP scheme rejection.

Changes

Enhanced URL Validation and Path Traversal Protection

Layer / File(s) Summary
Imports
holmes/plugins/toolsets/http/http_toolset.py
Added posixpath and unquote to support path normalization and percent-decoding.
URL Scheme & Netloc Validation
holmes/plugins/toolsets/http/http_toolset.py
match_endpoint enforces http/https schemes and requires a non-empty netloc.
Path Traversal Checks
holmes/plugins/toolsets/http/http_toolset.py
Rejects paths containing . or .. segments, checking both raw and percent-decoded path components and returning a rejection message when detected.
Path Normalization
holmes/plugins/toolsets/http/http_toolset.py
Normalizes request paths with posixpath.normpath, preserves trailing slashes when appropriate, and uses the normalized path for endpoint matching.
Tests
tests/plugins/toolsets/http/test_http_toolset.py
Added tests for dot-dot traversal, encoded traversal, single-dot traversal, and non-HTTP scheme rejection in host/endpoint matching.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately and specifically describes the main security fix: rejecting path-traversal segments in the HTTP endpoint whitelist with direct reference to the relevant CWE vulnerabilities (CWE-918/CWE-22).
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

…2/CWE-918)

The HTTP toolset whitelist matched the raw URL path with fnmatch, but
requests/urllib3 normalise '..' segments before issuing the request.
An LLM-supplied URL such as

    https://confluence.example.com/wiki/rest/api/../../admin/secret

would match the configured '/wiki/rest/api/*' allow-list yet actually
hit '/admin/secret' on the wire. Combined with prompt-injection via
untrusted issue/alert data, this lets an attacker pivot the agent to
arbitrary endpoints on a whitelisted host.

This change:

- Rejects URL paths whose raw or percent-decoded form contains '..'
  or '.' segments.
- Normalises the path before glob matching so duplicate slashes and
  trailing slashes still match as before.
- Restricts accepted URL schemes to http/https (was previously any
  scheme with a netloc).

Added regression tests covering literal '..', percent-encoded '%2e%2e',
single-dot segments, and non-http schemes.

Signed-off-by: Sebastion <sebastion@sebastion.dev>
@sebastiondev sebastiondev force-pushed the fix/cwe918-confluence-ssrf-e06a branch from 6bf7acf to 8dc3138 Compare May 6, 2026 16:06
@sebastiondev
Copy link
Copy Markdown
Author

Thanks for the automated checks. Quick status update:

  • DCO: Force-pushed an amended commit with Signed-off-by trailer to fix the DCO check.
  • EasyCLA: I'll complete the EasyCLA flow for @sebastiondev via the linked sign-up link so the CLA bot can re-evaluate. Will follow up here once that's done.
  • Netlify docs preview: The failing checks (holmes-docs deploy preview / Header / Redirect / Pages rules) are for the docs site build. This PR only touches holmes/plugins/toolsets/http/http_toolset.py and its test; no markdown/docs files. The deploy preview failure does not appear to be caused by this change. Happy to look closer if a maintainer can point me at the deploy log error — the Netlify log is gated behind the team's account.
  • License Compliance: 18 issues are also reported, but these aren't visibly tied to the diff in this PR (no dependency changes). Same offer — if a maintainer can share the report, I'll address anything attributable to this change.

@coderabbitai marked no actionable comments. Ready for human review whenever convenient.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 6, 2026

Tip

For best results, initiate chat on the files or code changes.

@sebastiondev Thanks for the detailed status update — that all sounds reasonable:

  • DCO: The amended commit with the Signed-off-by trailer should satisfy the check once the bot re-evaluates.
  • EasyCLA: Completing the CLA flow for your account should unblock that check.
  • Netlify docs preview: Your assessment looks correct — no documentation or markdown files were modified in this PR, so the deploy preview failure is almost certainly a pre-existing or unrelated issue. Maintainers can verify by checking the Netlify build log against the main branch baseline.
  • License Compliance: With no dependency changes in the diff, it's very likely those 18 issues are pre-existing and not introduced by this PR. A maintainer can confirm by running the compliance scan against main to compare.

The security fix itself (path-traversal/SSRF hardening in match_endpoint) looks well-scoped and the accompanying tests cover the key attack vectors (raw .., percent-encoded %2e%2e, single ., and non-HTTP schemes). The PR is in good shape for a maintainer pass. 🐰

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
tests/plugins/toolsets/http/test_http_toolset.py (1)

184-189: ⚡ Quick win

Make the non-HTTP scheme assertion specific to the expected failure path.

Right now this can pass on any error; asserting the scheme-rejection message makes the test more robust.

Suggested test tightening
     endpoint, error = toolset.match_endpoint("file:///etc/passwd")
     assert endpoint is None
     assert error is not None
+    assert "Unsupported URL scheme" in error
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/plugins/toolsets/http/test_http_toolset.py` around lines 184 - 189, The
test test_non_http_scheme_rejected currently only checks that an error occurred;
tighten it to assert the specific scheme-rejection path by inspecting the
returned error from toolset.match_endpoint("file:///etc/passwd") and verifying
it contains the expected scheme-related message (e.g. includes "file" or an
"unsupported scheme"/"non-http" substring). Update the assertions in
test_non_http_scheme_rejected to assert endpoint is None and that error is not
None and that str(error) (or error.message) contains the scheme-rejection text
produced by match_endpoint to ensure the failure is due to scheme rejection.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tests/plugins/toolsets/http/test_http_toolset.py`:
- Around line 184-189: The test test_non_http_scheme_rejected currently only
checks that an error occurred; tighten it to assert the specific
scheme-rejection path by inspecting the returned error from
toolset.match_endpoint("file:///etc/passwd") and verifying it contains the
expected scheme-related message (e.g. includes "file" or an "unsupported
scheme"/"non-http" substring). Update the assertions in
test_non_http_scheme_rejected to assert endpoint is None and that error is not
None and that str(error) (or error.message) contains the scheme-rejection text
produced by match_endpoint to ensure the failure is due to scheme rejection.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 11a8ea01-db42-4928-8f53-cbf071e37545

📥 Commits

Reviewing files that changed from the base of the PR and between 6bf7acf and 8dc3138.

📒 Files selected for processing (2)
  • holmes/plugins/toolsets/http/http_toolset.py
  • tests/plugins/toolsets/http/test_http_toolset.py

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant