Fix prow pipeline: in-cluster image build, RAG config, port-forward fix #2
Closed
are-ces wants to merge 17 commits into
…re#1403)

* Add tekton file

Co-authored-by: Radovan Fuchs <rfuchs@rfuchs-thinkpadp1gen7.tpb.csb>
* LCORE-1422: Inline RAG (BYOK) e2e tests
- Add 6 e2e scenarios: query, streaming query, responses API, streaming responses API, referenced documents, RAG source registration
- Add inline RAG configs for server, library, and prow modes
- Add reusable "The service uses the {config} configuration" Background step
- Fix BYOK RAG enrichment duplicate detection with env var references
- Add after_feature safety net to restore config backups
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Add comprehensive end-to-end tests verifying that Llama Stack's NetworkConfig (proxy, TLS) works correctly through the Lightspeed Stack pipeline.

Test infrastructure:
- TunnelProxy: Async HTTP CONNECT tunnel proxy that creates TCP tunnels for HTTPS traffic. Tracks CONNECT count and target hosts.
- InterceptionProxy: Async TLS-intercepting (MITM) proxy using a trustme CA to generate per-target server certificates. Simulates corporate SSL-inspection proxies.

Behave scenarios (tests/e2e/features/proxy.feature):
- Tunnel proxy: Configures run.yaml with a NetworkConfig proxy pointing to a local tunnel proxy. Verifies that a CONNECT to api.openai.com:443 is observed and that the LLM query succeeds through the proxy.
- Interception proxy: Configures run.yaml with a proxy and a custom CA cert (trustme). Verifies TLS interception of api.openai.com traffic and a successful LLM query through the MITM proxy.
- TLS version: Configures run.yaml with min_version TLSv1.2 and verifies the LLM query succeeds under the TLS constraint.

Each scenario dynamically generates a modified run-ci.yaml with the appropriate NetworkConfig, restarts Llama Stack with the new config, restarts the Lightspeed Stack, and sends a query to verify the full pipeline.

Added trustme>=1.2.1 to dev dependencies.
Expand proxy e2e test coverage to fully address all acceptance criteria.

AC1 (tunnel proxy):
- Add negative test: the LLM query fails gracefully when the proxy is unreachable.

AC2 (interception proxy with CA):
- Add negative test: the LLM query fails when the interception proxy CA is not provided (verifies "only successful when correct CA is provided").

AC3 (TLS version and ciphers):
- Add a TLSv1.3 minimum-version scenario.
- Add a custom cipher-suite configuration scenario (ECDHE+AESGCM:DHE+AESGCM).

Test infrastructure:
- Add an after_scenario cleanup hook in environment.py that restores the original Llama Stack and Lightspeed Stack configs after @Proxy scenarios, preventing config leaks between scenarios.
- Use a different port for each interception proxy instance to avoid address-already-in-use errors in sequential scenarios.

Documentation:
- Update docs/e2e_scenarios.md with all 7 proxy test scenarios.
- Update docs/e2e_testing.md with the proxy-related Behave tags (@Proxy, @tunnelproxy, @InterceptionProxy, @TLSVersion, @tlscipher).
Changes requested by reviewer (tisnik) and CodeRabbit:
- Detect Docker mode once in before_all and store it as context.is_docker_mode. All proxy step functions now use the context attribute instead of calling _is_docker_mode() repeatedly.
- Log the exception in _restore_original_services instead of silently swallowing it.
- Only clear context.services_modified on successful restoration, not when restoration fails (prevents leaking modified state).
- Add a 10-second timeout to the tunnel proxy's open_connection to prevent stalls on unreachable targets.
- Handle a malformed CONNECT port with a ValueError catch and a 400 response.
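Put together, the tunnel-proxy behavior described above (HTTP CONNECT handling, the 10-second connect timeout, and the 400 response on a malformed port) can be sketched with asyncio. This is an illustrative sketch, not the actual TunnelProxy implementation; names and structure are assumptions:

```python
import asyncio


async def _pipe(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Copy bytes one way until EOF, then close the destination."""
    try:
        while chunk := await reader.read(65536):
            writer.write(chunk)
            await writer.drain()
    finally:
        writer.close()


async def handle_connect(client_reader, client_writer) -> None:
    """Serve one HTTP CONNECT request and tunnel bytes in both directions."""
    request_line = (await client_reader.readline()).decode("latin-1")
    # Drain the remaining request headers up to the blank line.
    while (await client_reader.readline()) not in (b"\r\n", b"\n", b""):
        pass
    try:
        _method, target, _version = request_line.split(" ", 2)
        host, _, port_str = target.rpartition(":")
        port = int(port_str)  # malformed port -> ValueError -> 400 below
        remote_reader, remote_writer = await asyncio.wait_for(
            asyncio.open_connection(host, port),
            timeout=10,  # avoid stalls on unreachable targets
        )
    except (ValueError, OSError, asyncio.TimeoutError):
        client_writer.write(b"HTTP/1.1 400 Bad Request\r\n\r\n")
        await client_writer.drain()
        client_writer.close()
        return
    client_writer.write(b"HTTP/1.1 200 Connection established\r\n\r\n")
    await client_writer.drain()
    # Relay in both directions until either side hangs up.
    await asyncio.gather(
        _pipe(client_reader, remote_writer),
        _pipe(remote_reader, client_writer),
    )
```

The real TunnelProxy additionally records the CONNECT count and target hosts so scenarios can assert that traffic actually went through the proxy.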
Move config restoration from the @Proxy after_scenario hook to an explicit Background Given step. This follows the team convention that tags are used only for test selection (filtering), not for triggering behavior.

The Background step "The original Llama Stack config is restored if modified" runs before every scenario. If a previous scenario left a modified run.yaml (detected by backup-file existence), it restores the original and restarts services. This handles cleanup even when the previous scenario failed mid-way.

Removed:
- @Proxy tag from the feature file (was triggering the after_scenario hook)
- after_scenario hook for @Proxy in environment.py
- _restore_original_services function (replaced by the Background step)
- context.services_modified tracking (no hook reads it)

Updated docs/e2e_testing.md: tags are documented as selection-only, not behavior-triggering.
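The core of such a Background step can be sketched as below. The backup-file naming, paths, and restart callback are illustrative assumptions; the real step lives in the e2e step definitions and calls the suite's own restart helpers:

```python
import os
import shutil
from typing import Callable


def restore_if_modified(config_path: str, restart: Callable[[], None]) -> bool:
    """Background-step body: a leftover backup file means the previous
    scenario modified the config (possibly failing mid-way), so restore
    the original and restart services before this scenario runs."""
    backup_path = config_path + ".backup"  # assumed backup naming
    if not os.path.exists(backup_path):
        return False  # nothing to restore, previous scenario was clean
    shutil.move(backup_path, config_path)
    restart()  # e.g. restart_container(...) in the real suite
    return True
```

In behave this body would be registered with `@given("The original Llama Stack config is restored if modified")` so it runs as part of every scenario's Background.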
Rewrite proxy e2e tests to follow project conventions:
- Reuse existing step definitions: use "I use query to ask question" from llm_query_response.py and "The status code of the response is" from common_http.py instead of custom query/response steps.
- Split the service restart into two explicit Given steps, "Llama Stack is restarted" and "Lightspeed Stack is restarted", so restart ordering is visible in the feature file.
- Remove the local (non-Docker) mode code path. Proxy tests use restart_container() exclusively, consistent with the rest of the e2e test suite.
- Check the specific status code 500 for error scenarios instead of the broad >= 400 range.
- Remove the custom send_query, verify_llm_response, and verify_error_response steps that duplicated existing functionality.

Net reduction: -183 lines from step definitions.
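Following these conventions, a scenario in the feature file might read roughly as below. This is an illustrative sketch: the step names are the ones cited in the commit message, but their exact parameter shapes and the scenario wording are assumptions:

```gherkin
  Scenario: LLM query succeeds through the tunnel proxy
    Given The original Llama Stack config is restored if modified
      And Llama Stack is restarted
      And Lightspeed Stack is restarted
     When I use query to ask question
     Then The status code of the response is 200
```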
Stop proxy servers and their event loops explicitly in the Background restore step. Previously, proxy daemon threads were left running after each scenario, causing asyncio "Task was destroyed but it is pending" warnings at process exit. The _stop_proxy helper schedules an async stop on the proxy's event loop, waits for it to complete, then stops the loop. Context references are cleared so the next scenario starts clean.
Add proxy cleanup in after_feature to stop proxy servers left running from the last scenario. The Background restore step handles cleanup between scenarios, but the last scenario's proxies persist until process exit, causing asyncio "Task was destroyed" warnings. The cleanup checks for proxy objects on context (no tag check needed) and calls _stop_proxy to gracefully shut down the event loops.
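The shutdown sequence described in these two commits (schedule an async stop on the proxy's own event loop, wait for it, then stop the loop so the daemon thread can exit) can be sketched as follows; the name and exact structure of the real _stop_proxy helper are assumptions:

```python
import asyncio
import concurrent.futures


def stop_proxy(loop: asyncio.AbstractEventLoop,
               server: asyncio.AbstractServer) -> None:
    """Gracefully shut down a proxy server that runs on a background-thread
    event loop, then stop the loop so its daemon thread can exit cleanly
    (avoids "Task was destroyed but it is pending" warnings at exit)."""
    async def _shutdown() -> None:
        server.close()
        await server.wait_closed()

    # Schedule the coroutine on the proxy's loop and block until it finishes.
    future = asyncio.run_coroutine_threadsafe(_shutdown(), loop)
    try:
        future.result(timeout=5)
    except concurrent.futures.TimeoutError:
        pass  # best effort; still stop the loop below
    loop.call_soon_threadsafe(loop.stop)
```

After this returns, the thread running `loop.run_forever()` unwinds, so context references to the proxy, loop, and thread can be cleared for the next scenario.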
Fixes ruff I001 linter error by reordering imports in `src/app/endpoints/responses.py`. Signed-off-by: Anik Bhattacharjee <anbhatta@redhat.com>
Signed-off-by: red-hat-konflux-kflux-prd-rh02 <190377777+red-hat-konflux-kflux-prd-rh02[bot]@users.noreply.github.com>
- Add -n $NAMESPACE to all oc commands in pipeline-services.sh and pipeline-test-pod.sh
- Export NAMESPACE from pipeline.sh so child scripts inherit it
- Add MCP token secrets and empty OpenAI secret creation to pipeline.sh
- Replace the pre-built llama-stack image with a build-from-source manifest (llama-stack.yaml)
- Use envsubst with an explicit variable list for the lightspeed-stack image substitution
- Fix e2e-ops.sh restart-llama-stack: delete and recreate the pod instead of oc apply (immutable spec)
- Separate the kv_default and kv_rag db paths in the prow run.yaml config

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
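The namespace plumbing in the first two bullets amounts to exporting the variable once and scoping every oc call explicitly. A sketch (script and variable names from the commit; the default value and manifest names are placeholders):

```shell
# pipeline.sh: export so pipeline-services.sh / pipeline-test-pod.sh inherit it
export NAMESPACE="${NAMESPACE:-lightspeed-e2e}"   # placeholder default

# pipeline-services.sh: every oc command is explicitly namespace-scoped
oc apply -n "$NAMESPACE" -f llama-stack.yaml
oc wait -n "$NAMESPACE" --for=condition=Ready pod/llama-stack --timeout=300s
```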
…, fix port-forward kill

- Build the llama-stack image in the OpenShift internal registry instead of from source
- Add an image-puller role for the default SA to pull from the internal registry
- Add FAISS_VECTOR_STORE_ID and KV_RAG_PATH env vars to the lightspeed-stack pod
- Add inference, byok_rag, and rag sections to the prow lightspeed-stack configs
- Use envsubst with specific variable scoping in pipeline-services.sh
- Fix free_local_tcp_port to only kill LISTEN sockets (was killing the behave process)
- Add MCP token secrets and an empty OpenAI secret to pipeline.sh
- Add the rlsapi_v1_infer action to the prow RBAC config
- Simplify llama-stack.yaml to use a pre-built image with an init container for RAG data

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
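The envsubst scoping and the LISTEN-only kill can be sketched as below. The image value is a placeholder and the function body is an illustrative assumption of the fix, not the merged script:

```shell
# Substitute ONLY ${LLAMA_STACK_IMAGE}; any other ${...} in the manifest
# (e.g. runtime env references) is left untouched by envsubst's variable list.
export LLAMA_STACK_IMAGE="example.registry/llama-stack:latest"  # placeholder
printf 'image: ${LLAMA_STACK_IMAGE}\nenv: ${KV_RAG_PATH}\n' \
  | envsubst '${LLAMA_STACK_IMAGE}'

# Kill only the process LISTENing on the port, not clients connected to it.
# A bare `lsof -ti:8080` matches both ends of the connection, so it was
# killing the behave test process itself.
free_local_tcp_port() {
  lsof -ti "tcp:$1" -sTCP:LISTEN | xargs kill -9 2>/dev/null || true
}
```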
radofuchs pushed a commit that referenced this pull request on May 1, 2026:
Add the spike doc (decisions up front, background below, 4 proposed
JIRAs) and the spec doc (R1..R6 requirements, architecture, key files
and insertion points, known limitations) under docs/design/byok-pdf/.
The spike is lightweight by design: HTML support shipped under
LCORE-1035 (commit 7f688b0, 2026-01-15), so the architectural pattern,
docling dependency, BaseReader plumbing, CLI shape, and test layout
are all already established. PDF support is a scaffold-and-mirror job
plus a one-line addition to document_processor.py's doc_type branches.
Decisions captured for confirmation (each with options table and
recommendation in the spike doc):
D1: Library -- docling (already a dependency)
D2: OCR for scanned PDFs -- out of scope; track as follow-up
D3: Repo placement -- rag-content (impl) + lightspeed-stack
(BYOK guide update only)
D4: Pipeline knobs -- hard-coded sensible defaults; no CLI
flags in v1 (mirrors HTMLReader)
D5: Chunking strategy -- reuse MarkdownNodeParser; add "pdf" to
document_processor.py:75 and :87
D6: Code organization -- new pdf/ package mirroring html/
D7: Test coverage -- unit/integration in JIRA #2, e2e in #3
Four sub-JIRAs proposed under LCORE-1471 (parseable by
dev-tools/file-jiras.sh):
1. rag-content: Implement PDF support
2. rag-content: Unit and integration tests
3. rag-content: End-to-end test (PDF -> vector store -> stack query)
4. lightspeed-stack: Update BYOK guide for native PDF support
PoC evidence under poc-results/:
01-poc-report.txt Methodology, findings, implications
02-conversion-log.txt Exact commands and timings
03-sample-jira-1311.md Clean conversion (Atlassian Cloud PDF)
04-sample-jira-836.md Body clean, headings degraded
(Confluence PDF, letter-spaced display font)
Honest PoC findings worth surfacing:
- No new dependencies are needed (docling is already in pyproject.toml).
- Body text and tables convert cleanly to Markdown.
- MarkdownNodeParser handles the output -- no parallel chunking pipeline.
- Letter-spaced display fonts (typical of Confluence "Export to PDF")
produce noisy heading text; documented as a v1 known limitation.
- Cold model load is ~5 minutes on CPU; warm conversions ~30-90 s for
small/medium PDFs. Acceptable for offline indexing.
Per howto-run-a-spike.md step 10, poc/ and poc-results/ will be
removed before merge; spike doc and spec doc remain in the repo.
radofuchs pushed a commit that referenced this pull request on May 1, 2026:
Apply CodeRabbit's actionable comments and the per-comment nits:
1. PoC results section in spike doc previously listed paths under
poc-results/ that are deleted before merge per howto-run-a-spike.md
step 10, leaving broken links in the merged document. Replace the
file list with a self-contained summary of what the PoC proved
plus the heading-degradation finding, and a note pointing future
readers at the PR diff if the raw artifacts are ever needed.
2. Drop the reference to docs/local-stack-testing.md (a local-only
file, never committed to the repo).
3. Replace fragile line-numbered references (document_processor.py:75,
:87, byok_guide.md ~106-118) with stable symbol anchors:
_BaseDB.__init__, _LlamaStackDB.__init__, "Knowledge Sources"
subsection, "Step 1" subsection. Line numbers rot; section names
and symbol names rot less.
4. Spec doc now instructs the implementation ticket to extract the
("markdown", "html", "pdf") predicate to a single
MARKDOWN_COMPATIBLE_DOC_TYPES: Final[tuple[str, ...]] constant in
document_processor.py and reference it from both call sites,
instead of duplicating the tuple. JIRA #1 scope updated to match.
5. Add R7: PDFReader.load_data emits a logger.warning when its docling
output is empty / under a small threshold (a likely indicator of a
scanned PDF given R5's no-OCR scope). Threshold is a module-level
Final[int] constant. JIRA #1 scope and JIRA #2 test patterns
updated to require coverage via caplog. Surfacing the
silent-degradation case in custom_processor.py logs costs nothing and
makes the OCR-needed signal visible.
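Items 4 and 5 together suggest roughly the following shape for document_processor.py. This is a sketch of what the spec asks the implementation ticket to produce; the constant names follow the spec's description, while the threshold value and function name are illustrative assumptions:

```python
import logging
from typing import Final

logger = logging.getLogger(__name__)

# Single source of truth for doc types that flow through MarkdownNodeParser
# (replaces the duplicated ("markdown", "html") tuple at both call sites).
MARKDOWN_COMPATIBLE_DOC_TYPES: Final[tuple[str, ...]] = ("markdown", "html", "pdf")

# R7: docling output shorter than this likely means a scanned PDF
# (OCR is out of scope in v1). Value here is illustrative.
MIN_EXPECTED_OUTPUT_CHARS: Final[int] = 20


def warn_if_output_suspicious(markdown: str, source: str) -> None:
    """Emit the R7 warning when converted output is empty or near-empty."""
    stripped = markdown.strip()
    if len(stripped) < MIN_EXPECTED_OUTPUT_CHARS:
        logger.warning(
            "Converted output for %s is only %d chars; "
            "possible scanned PDF (OCR is out of scope in v1)",
            source,
            len(stripped),
        )
```

Tests can then assert on the warning via pytest's caplog fixture, as JIRA #2 requires.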
Plus the two reviewer nits worth carrying into JIRA #1:
- Use docling's TableFormerMode.ACCURATE enum, not the string literal
"accurate"; both work via Pydantic coercion but the enum is
type-checked.
- Mirror HTMLReader's choice on whether to call super().__init__();
llama-index's BaseReader does not require it but symmetry between
the two readers is preferred.
The spec doc changelog records this revision and its trigger (the
PR lightspeed-core#1598 CodeRabbit review).
Summary
- Build the llama-stack image in the OpenShift internal registry (`oc new-build`/`oc start-build`) instead of building from source in init containers
- Add `FAISS_VECTOR_STORE_ID` and `KV_RAG_PATH` env vars to the lightspeed-stack pod manifest
- Add `inference`, `byok_rag`, and `rag` sections to all prow lightspeed-stack configs (using the vllm provider)
- Fix `free_local_tcp_port` to only kill LISTEN sockets; it was killing the behave test process via `lsof -ti:8080 | kill -9` (Error 137)
- Use `envsubst '${LLAMA_STACK_IMAGE}'` in pipeline-services.sh to avoid clobbering other `${}` patterns

Test plan
🤖 Generated with Claude Code