
Fix prow pipeline: in-cluster image build, RAG config, port-forward fix #2

Closed
are-ces wants to merge 17 commits into radofuchs:LCORE-1496-rhoai-version-bump from are-ces:LCORE-1496-rhoai-version-bump


Conversation


@are-ces are-ces commented Mar 29, 2026

Summary

  • Build llama-stack image in OpenShift internal registry (via oc new-build/oc start-build) instead of building from source in init containers
  • Add FAISS_VECTOR_STORE_ID and KV_RAG_PATH env vars to lightspeed-stack pod manifest
  • Add inference, byok_rag, and rag sections to all prow lightspeed-stack configs (using vllm provider)
  • Fix free_local_tcp_port to kill only LISTEN sockets — it was killing the behave test process via lsof -ti:8080 | kill -9 (Error 137)
  • Use scoped envsubst '${LLAMA_STACK_IMAGE}' in pipeline-services.sh to avoid clobbering other ${} patterns
  • Add image-puller role grant for default SA to pull from internal registry
  • Add MCP token secrets and empty OpenAI secret creation to pipeline.sh
  • Simplify llama-stack.yaml manifest to use pre-built image
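The scoped `envsubst '${LLAMA_STACK_IMAGE}'` behavior can be illustrated in Python (a sketch with a hypothetical manifest fragment, not the actual pipeline script; `string.Template.safe_substitute` replaces only the keys it is given and leaves every other `${…}` placeholder intact, which is the property the scoped envsubst call relies on):

```python
from string import Template

# Hypothetical manifest fragment: only LLAMA_STACK_IMAGE should be substituted,
# the ${CONFIG_PATH} placeholder must survive untouched.
manifest = 'image: ${LLAMA_STACK_IMAGE}\nargs: ["--config", "${CONFIG_PATH}"]'

rendered = Template(manifest).safe_substitute(
    LLAMA_STACK_IMAGE="image-registry.openshift-image-registry.svc:5000/my-ns/llama-stack:latest"
)

print(rendered)  # ${CONFIG_PATH} is left as-is for the pod to resolve later
```

An unscoped `envsubst` would have expanded every `${…}` it recognized, clobbering placeholders meant for later stages — the same failure mode the scoped variable list avoids.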

Test plan

  • Run full prow pipeline end-to-end on OpenShift cluster
  • Verify faiss disruption test completes without killing behave (no more Error 137)
  • Verify llama-stack pod pulls from internal registry successfully
  • Verify lightspeed-stack starts with RAG config (no FAISS_VECTOR_STORE_ID error)
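The port-freeing fix above can be sketched as a filter over `lsof`-style output lines (illustrative only — the real fix lives in a shell helper, and the sample lines below are made up):

```python
def pids_to_kill(lsof_lines: list[str]) -> list[int]:
    """Return PIDs only for sockets in LISTEN state, sparing clients of the port."""
    return [
        int(line.split()[1])  # PID is the second lsof column
        for line in lsof_lines
        if line.endswith("(LISTEN)")
    ]

sample = [
    "python3  4242 user  5u IPv4 ... TCP *:8080 (LISTEN)",
    "behave   1337 user  9u IPv4 ... TCP 127.0.0.1:51000->127.0.0.1:8080 (ESTABLISHED)",
]
print(pids_to_kill(sample))  # → [4242]; the behave client connection is left alone
```

The broken variant matched every PID touching port 8080, so the behave process holding a client connection was killed too, surfacing as exit code 137 (SIGKILL).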

🤖 Generated with Claude Code

radofuchs and others added 17 commits March 29, 2026 19:45
…re#1403)

* Add tekton file

---------

Co-authored-by: Radovan Fuchs <rfuchs@rfuchs-thinkpadp1gen7.tpb.csb>
)

* LCORE-1270: Added e2e tests for responses endpoint

* fix confusing variable names

---------

Co-authored-by: Radovan Fuchs <rfuchs@rfuchs-thinkpadp1gen7.tpb.csb>
* LCORE-1422: Inline RAG (BYOK) e2e tests

- Add 6 e2e scenarios: query, streaming query, responses API, streaming responses API, referenced documents, RAG source registration
- Add inline RAG configs for server, library, and prow modes
- Add reusable "The service uses the {config} configuration" Background step
- Fix BYOK RAG enrichment duplicate detection with env var references
- Add after_feature safety net to restore config backups

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Add comprehensive end-to-end tests verifying that Llama Stack's
NetworkConfig (proxy, TLS) works correctly through the Lightspeed
Stack pipeline.

Test infrastructure:
- TunnelProxy: Async HTTP CONNECT tunnel proxy that creates TCP
  tunnels for HTTPS traffic. Tracks CONNECT count and target hosts.
- InterceptionProxy: Async TLS-intercepting (MITM) proxy using
  trustme CA to generate per-target server certificates. Simulates
  corporate SSL inspection proxies.

Behave scenarios (tests/e2e/features/proxy.feature):
- Tunnel proxy: Configures run.yaml with NetworkConfig proxy pointing
  to a local tunnel proxy. Verifies CONNECT to api.openai.com:443
  is observed and the LLM query succeeds through the proxy.
- Interception proxy: Configures run.yaml with proxy and custom CA
  cert (trustme). Verifies TLS interception of api.openai.com traffic
  and successful LLM query through the MITM proxy.
- TLS version: Configures run.yaml with min_version TLSv1.2 and
  verifies the LLM query succeeds with the TLS constraint.

Each scenario dynamically generates a modified run-ci.yaml with the
appropriate NetworkConfig, restarts Llama Stack with the new config,
restarts the Lightspeed Stack, and sends a query to verify the full
pipeline.

Added trustme>=1.2.1 to dev dependencies.
Expand proxy e2e test coverage to fully address all acceptance criteria:

AC1 (tunnel proxy):
- Add negative test: LLM query fails gracefully when proxy is unreachable

AC2 (interception proxy with CA):
- Add negative test: LLM query fails when interception proxy CA is not
  provided (verifies "only successful when correct CA is provided")

AC3 (TLS version and ciphers):
- Add TLSv1.3 minimum version scenario
- Add custom cipher suite configuration scenario (ECDHE+AESGCM:DHE+AESGCM)

Test infrastructure:
- Add after_scenario cleanup hook in environment.py that restores
  original Llama Stack and Lightspeed Stack configs after @Proxy
  scenarios. Prevents config leaks between scenarios.
- Use different ports for each interception proxy instance to avoid
  address-already-in-use errors in sequential scenarios.

Documentation:
- Update docs/e2e_scenarios.md with all 7 proxy test scenarios.
- Update docs/e2e_testing.md with proxy-related Behave tags
  (@Proxy, @tunnelproxy, @InterceptionProxy, @TLSVersion, @tlscipher).
Changes requested by reviewer (tisnik) and CodeRabbit:

- Detect Docker mode once in before_all and store as
  context.is_docker_mode. All proxy step functions now use the
  context attribute instead of calling _is_docker_mode() repeatedly.
- Log exception in _restore_original_services instead of silently
  swallowing it.
- Only clear context.services_modified on successful restoration,
  not when restoration fails (prevents leaking modified state).
- Add 10-second timeout to tunnel proxy open_connection to prevent
  stalls on unreachable targets.
- Handle malformed CONNECT port with ValueError catch and 400
  response.
Move config restoration from @Proxy after_scenario hook to an
explicit Background Given step. This follows the team convention
that tags are used only for test selection (filtering), not for
triggering behavior.

The Background step "The original Llama Stack config is restored
if modified" runs before every scenario. If a previous scenario
left a modified run.yaml (detected by backup file existence), it
restores the original and restarts services. This handles cleanup
even when the previous scenario failed mid-way.

Removed:
- @Proxy tag from feature file (was triggering after_scenario hook)
- after_scenario hook for @Proxy in environment.py
- _restore_original_services function (replaced by Background step)
- context.services_modified tracking (no hook reads it)

Updated docs/e2e_testing.md: tags documented as selection-only,
not behavior-triggering.
Rewrite proxy e2e tests to follow project conventions:

- Reuse existing step definitions: use "I use query to ask question"
  from llm_query_response.py and "The status code of the response is"
  from common_http.py instead of custom query/response steps.
- Split service restart into two explicit Given steps: "Llama Stack
  is restarted" and "Lightspeed Stack is restarted" so restart
  ordering is visible in the feature file.
- Remove local (non-Docker) mode code path. Proxy tests use
  restart_container() exclusively, consistent with the rest of the
  e2e test suite.
- Check specific status code 500 for error scenarios instead of the
  broad >= 400 range.
- Remove custom send_query, verify_llm_response, and
  verify_error_response steps that duplicated existing functionality.

Net reduction: -183 lines from step definitions.
Stop proxy servers and their event loops explicitly in the Background
restore step. Previously, proxy daemon threads were left running after
each scenario, causing asyncio "Task was destroyed but it is pending"
warnings at process exit.

The _stop_proxy helper schedules an async stop on the proxy's event
loop, waits for it to complete, then stops the loop. Context
references are cleared so the next scenario starts clean.
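The stop sequence — schedule an async stop on the proxy's loop, wait for it to complete, then stop the loop — can be sketched with a bare event loop in a daemon thread (a minimal standalone sketch; the real `_stop_proxy` helper also closes the proxy server inside the coroutine):

```python
import asyncio
import threading

loop = asyncio.new_event_loop()
thread = threading.Thread(target=loop.run_forever, daemon=True)
thread.start()

async def async_stop() -> None:
    # In the real helper this closes the proxy server and awaits its shutdown.
    await asyncio.sleep(0)

# Schedule the async stop on the proxy's loop and block until it finishes.
future = asyncio.run_coroutine_threadsafe(async_stop(), loop)
future.result(timeout=5)

# Only then stop the loop itself, letting the daemon thread exit cleanly.
loop.call_soon_threadsafe(loop.stop)
thread.join(timeout=5)
```

Stopping the loop before the async shutdown completes is what produces the "Task was destroyed but it is pending" warnings; the `future.result()` barrier guarantees the ordering.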
Add proxy cleanup in after_feature to stop proxy servers left running
from the last scenario. The Background restore step handles cleanup
between scenarios, but the last scenario's proxies persist until
process exit, causing asyncio "Task was destroyed" warnings.

The cleanup checks for proxy objects on context (no tag check needed)
and calls _stop_proxy to gracefully shut down the event loops.
Fixes ruff I001 linter error by reordering imports in `src/app/endpoints/responses.py`.

Signed-off-by: Anik Bhattacharjee <anbhatta@redhat.com>
Signed-off-by: red-hat-konflux-kflux-prd-rh02 <190377777+red-hat-konflux-kflux-prd-rh02[bot]@users.noreply.github.com>
- Add -n $NAMESPACE to all oc commands in pipeline-services.sh, pipeline-test-pod.sh
- Export NAMESPACE from pipeline.sh so child scripts inherit it
- Add MCP token secrets and empty OpenAI secret creation to pipeline.sh
- Replace pre-built llama-stack image with build-from-source manifest (llama-stack.yaml)
- Use envsubst with explicit variable list for lightspeed-stack image substitution
- Fix e2e-ops.sh restart-llama-stack: delete+recreate pod instead of oc apply (immutable spec)
- Separate kv_default and kv_rag db paths in prow run.yaml config

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…, fix port-forward kill

- Build llama-stack image in OpenShift internal registry instead of from source
- Add image-puller role for default SA to pull from internal registry
- Add FAISS_VECTOR_STORE_ID and KV_RAG_PATH env vars to lightspeed-stack pod
- Add inference, byok_rag, and rag sections to prow lightspeed-stack configs
- Use envsubst with specific variable scoping in pipeline-services.sh
- Fix free_local_tcp_port to only kill LISTEN sockets (was killing behave process)
- Add MCP token secrets and empty OpenAI secret to pipeline.sh
- Add rlsapi_v1_infer action to prow RBAC config
- Simplify llama-stack.yaml to use pre-built image with init container for RAG data

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@are-ces are-ces force-pushed the LCORE-1496-rhoai-version-bump branch from 06ce083 to 538071c on March 29, 2026 17:45
@are-ces are-ces closed this Mar 29, 2026
radofuchs pushed a commit that referenced this pull request May 1, 2026
Add the spike doc (decisions up front, background below, 4 proposed
JIRAs) and the spec doc (R1..R6 requirements, architecture, key files
and insertion points, known limitations) under docs/design/byok-pdf/.

The spike is lightweight by design: HTML support shipped under
LCORE-1035 (commit 7f688b0, 2026-01-15), so the architectural pattern,
docling dependency, BaseReader plumbing, CLI shape, and test layout
are all already established. PDF support is a scaffold-and-mirror job
plus a one-line addition to document_processor.py's doc_type branches.

Decisions captured for confirmation (each with options table and
recommendation in the spike doc):

  D1: Library                -- docling (already a dependency)
  D2: OCR for scanned PDFs   -- out of scope; track as follow-up
  D3: Repo placement          -- rag-content (impl) + lightspeed-stack
                                  (BYOK guide update only)
  D4: Pipeline knobs          -- hard-coded sensible defaults; no CLI
                                  flags in v1 (mirrors HTMLReader)
  D5: Chunking strategy       -- reuse MarkdownNodeParser; add "pdf" to
                                  document_processor.py:75 and :87
  D6: Code organization       -- new pdf/ package mirroring html/
  D7: Test coverage           -- unit/integration in JIRA #2, e2e in #3

Four sub-JIRAs proposed under LCORE-1471 (parseable by
dev-tools/file-jiras.sh):

  1. rag-content: Implement PDF support
  2. rag-content: Unit and integration tests
  3. rag-content: End-to-end test (PDF -> vector store -> stack query)
  4. lightspeed-stack: Update BYOK guide for native PDF support

PoC evidence under poc-results/:

  01-poc-report.txt    Methodology, findings, implications
  02-conversion-log.txt  Exact commands and timings
  03-sample-jira-1311.md  Clean conversion (Atlassian Cloud PDF)
  04-sample-jira-836.md   Body clean, headings degraded
                          (Confluence PDF, letter-spaced display font)

Honest PoC findings worth surfacing:

- No new dependencies are needed (docling is already in pyproject.toml).
- Body text and tables convert cleanly to Markdown.
- MarkdownNodeParser handles the output -- no parallel chunking pipeline.
- Letter-spaced display fonts (typical of Confluence "Export to PDF")
  produce noisy heading text; documented as a v1 known limitation.
- Cold model load is ~5 minutes on CPU; warm conversions ~30-90 s for
  small/medium PDFs. Acceptable for offline indexing.

Per howto-run-a-spike.md step 10, poc/ and poc-results/ will be
removed before merge; spike doc and spec doc remain in the repo.
radofuchs pushed a commit that referenced this pull request May 1, 2026
Apply CodeRabbit's actionable comments and the per-comment nits:

1. PoC results section in spike doc previously listed paths under
   poc-results/ that are deleted before merge per howto-run-a-spike.md
   step 10, leaving broken links in the merged document. Replace the
   file list with a self-contained summary of what the PoC proved
   plus the heading-degradation finding, and a note pointing future
   readers at the PR diff if the raw artifacts are ever needed.

2. Drop the reference to docs/local-stack-testing.md (a local-only
   file, never committed to the repo).

3. Replace fragile line-numbered references (document_processor.py:75,
   :87, byok_guide.md ~106-118) with stable symbol anchors:
   _BaseDB.__init__, _LlamaStackDB.__init__, "Knowledge Sources"
   subsection, "Step 1" subsection. Line numbers rot; section names
   and symbol names rot less.

4. Spec doc now instructs the implementation ticket to extract the
   ("markdown", "html", "pdf") predicate to a single
   MARKDOWN_COMPATIBLE_DOC_TYPES: Final[tuple[str, ...]] constant in
   document_processor.py and reference it from both call sites,
   instead of duplicating the tuple. JIRA #1 scope updated to match.

5. Add R7: PDFReader.load_data emits a logger.warning when its docling
   output is empty / under a small threshold (a likely indicator of a
   scanned PDF given R5's no-OCR scope). Threshold is a module-level
   Final[int] constant. JIRA #1 scope and JIRA #2 test patterns
   updated to require coverage via caplog. Surfacing the silent-
   degradation case in custom_processor.py logs costs nothing and
   makes the OCR-needed signal visible.
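The constant extraction in item 4 could look like this (a sketch of the spec's suggestion, not merged code; `is_markdown_compatible` is a hypothetical helper name, the constant name follows the commit message):

```python
from typing import Final

# Single source of truth for doc types routed through MarkdownNodeParser,
# replacing the tuple previously duplicated at two call sites.
MARKDOWN_COMPATIBLE_DOC_TYPES: Final[tuple[str, ...]] = ("markdown", "html", "pdf")

def is_markdown_compatible(doc_type: str) -> bool:
    """Hypothetical predicate referenced from both call sites."""
    return doc_type in MARKDOWN_COMPATIBLE_DOC_TYPES
```

With one `Final` constant, adding a future doc type is a one-line change instead of two tuples drifting apart.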

Plus the two reviewer nits worth carrying into JIRA #1:

- Use docling's TableFormerMode.ACCURATE enum, not the string literal
  "accurate"; both work via Pydantic coercion but the enum is
  type-checked.
- Mirror HTMLReader's choice on whether to call super().__init__();
  llama-index's BaseReader does not require it but symmetry between
  the two readers is preferred.

The spec doc changelog records this revision and its trigger (the
PR lightspeed-core#1598 CodeRabbit review).
radofuchs pushed a commit that referenced this pull request May 13, 2026
radofuchs pushed a commit that referenced this pull request May 13, 2026


5 participants