Reference architecture and CI for Llama Stack on Red Hat OpenShift AI (RHOAI).
Work in Progress - This repository is actively evolving toward a production-ready reference architecture for Llama Stack on RHOAI. While core functionality is operational (deployment, authentication, RAG demos), we're continuously expanding components, refining kustomize overlays, and adding demo scripts to showcase Llama Stack capabilities in action.
- Reference Architecture: Production-ready deployment of Llama Stack using RHOAI components (VLLM, PostgreSQL, Milvus, Keycloak)
- Automated Testing: CI that validates deployments with example client scripts
- Integration Testing: Test RHOAI/ODH/upstream Llama Stack images through GitHub Actions
- Demo Scripts: Reusable examples (RAG, authentication) for downstream projects
Note: Documentation is intentionally kept minimal during early development to avoid rapid obsolescence. Use LLMs to explore the codebase and understand usage patterns.
┌─────────────────────────────────────────────────────┐
│ Llama Stack Distribution (CRD) │
│ ├─ Inference: VLLM (llama-3-2-3b) │
│ ├─ Embeddings: VLLM (nomic-embed-text-v1.5) │
│ ├─ Auth: Keycloak OAuth2 (RBAC + Team-based) │
│ ├─ Vector Store: Milvus (50Gi) │
│ └─ Storage: PostgreSQL (20Gi) │
└─────────────────────────────────────────────────────┘
CI/CD: GitHub Actions workflow tests full deployment lifecycle on ROSA with configurable image overrides for testing ODH/upstream builds.
- OpenShift CLI (oc)
- Container tool (podman or docker)
- Python 3.12+
- uv
Create the environment file (see config.sh.example for details):
cp config.sh.example ~/.lls_showroom
# Edit ~/.lls_showroom and set required values./setup.sh # Install RHOAI operator and dependencies
./provision.sh # Deploy Llama Stack distributionAfter provisioning, URLs and credentials are automatically saved to ~/.lls_showroom_generated:
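As a sketch, the generated file exports shell variables you can source directly; the key names and hostnames below are illustrative assumptions, not the real contents:

```shell
# Illustrative ~/.lls_showroom_generated (keys and hostnames are assumptions;
# inspect the real file for the actual names)
export LLAMASTACK_URL="https://llamastack-distribution.apps.example.com"
export KEYCLOAK_URL="https://keycloak.apps.example.com"
export SHOWROOM_USERNAME="demo-user"
export SHOWROOM_PASSWORD="changeme"
```

Run `source ~/.lls_showroom_generated` to load these into your shell before invoking demos by hand.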
# Run demos by tags (see demos/manifest.yaml for available tags)
./test.sh # Run all demos
./test.sh simple # Run simple demos only
./test.sh complex # Run complex demos (requires OpenAI API key)
./test.sh rag,api # Run demos tagged with 'rag' OR 'api'
# Available tags: simple, complex, rag, api, agents, storage, embeddings, openai-required

Or run individual demos directly:
uv run demos/rag/demo.py # RAG with S3 file storage and vector search
uv run demos/responses/demo.py # Multi-turn conversations with response tracking
uv run demos/responses/demo.py --prompt "What is RAG?" # Single-turn with custom question
./demos/tests/restarttest/restarttest.sh # Test response persistence across server restarts (requires `oc` cluster access)
uv run demos/multi_agent/demo.py # Multi-agent research assistant

With explicit parameters:
uv run demos/rag/demo.py <LLAMASTACK_URL> <KEYCLOAK_URL> <USERNAME> <PASSWORD>
uv run demos/responses/demo.py <LLAMASTACK_URL> <KEYCLOAK_URL> <USERNAME> <PASSWORD>

Note: The multi-agent demo requires SHOWROOM_OPENAI_API_KEY to be set in ~/.lls_showroom.
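The Keycloak parameters map onto a standard OAuth2 password grant. A minimal sketch of extracting the bearer token from the token-endpoint response (the realm and client_id in the comment are hypothetical, not taken from this repo):

```shell
# Parse access_token out of a Keycloak token-endpoint JSON response
get_access_token() {
  python3 -c 'import json, sys; print(json.load(sys.stdin)["access_token"])'
}

# In practice the JSON comes from something like (realm/client_id assumed):
#   curl -s "$KEYCLOAK_URL/realms/llama-stack/protocol/openid-connect/token" \
#     -d grant_type=password -d client_id=llama-stack \
#     -d "username=$USERNAME" -d "password=$PASSWORD"
echo '{"access_token":"abc123","token_type":"Bearer"}' | get_access_token  # → abc123
```

The resulting token is then sent as an `Authorization: Bearer <token>` header on requests to the Llama Stack URL.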
Test local LlamaStack code changes on the cluster for rapid iteration.
# 1. Clone llama-stack locally
git clone https://github.com/meta-llama/llama-stack ~/llama-stack
# 2. Configure
echo "export LLAMA_STACK_SOURCE_PATH=~/llama-stack" >> ~/.lls_showroom
# 3. Deploy your changes
./deploy-local.sh
# → Builds image, pushes to in-cluster registry, restarts pod, shows logs
# 4. Test your changes
curl https://$(oc get route llamastack-distribution -o jsonpath='{.spec.host}')/v1/health
# 5. Revert to official image when done
./provision.sh

Features: Uses in-cluster registry (no external accounts needed), auto-detects base image and handles authentication.
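A rough sketch of the loop deploy-local.sh automates, under the assumption that it builds from LLAMA_STACK_SOURCE_PATH, pushes to the in-cluster registry, and restarts the pod; image, namespace, and label names below are illustrative:

```shell
# Illustrative only; the real script also handles auth and base-image detection
REGISTRY=$(oc get route default-route -n openshift-image-registry \
  -o jsonpath='{.spec.host}')
IMG="$REGISTRY/redhat-ods-applications/llama-stack-dev:dev-$(date +%Y%m%d-%H%M%S)"

podman build -t "$IMG" "$LLAMA_STACK_SOURCE_PATH"
podman push --tls-verify=false "$IMG"

# Restart so the dev image is picked up (pod label/namespace as in troubleshooting below)
oc delete pod -l app=llama-stack -n redhat-ods-applications
```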
Add to ~/.lls_showroom:
| Variable | Default | Description |
|---|---|---|
| LLAMA_STACK_SOURCE_PATH | (required) | Path to local llama-stack repository |
| DEV_IMAGE_NAMESPACE | redhat-ods-applications | Namespace for images |
| DEV_IMAGE_NAME | llama-stack-dev | Image name |
| DEV_IMAGE_TAG | dev-YYYYMMDD-HHMMSS | Image tag (auto-generated) |
| DEV_BASE_IMAGE | (auto-detected) | Base image to use |
| CONTAINER_TOOL | podman | Container tool (podman/docker) |
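Only LLAMA_STACK_SOURCE_PATH is mandatory; the rest have defaults. A typical override block might look like this (values illustrative):

```shell
# Required
export LLAMA_STACK_SOURCE_PATH=~/llama-stack
# Optional overrides
export CONTAINER_TOOL=docker               # default: podman
export DEV_IMAGE_NAME=my-llama-stack-dev   # default: llama-stack-dev
```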
Registry authentication fails:
REGISTRY=$(oc get route default-route -n openshift-image-registry -o jsonpath='{.spec.host}')
podman login -u $(oc whoami) -p $(oc whoami -t) --tls-verify=false $REGISTRY

Registry route not available (requires cluster-admin):
oc patch configs.imageregistry.operator.openshift.io/cluster \
--type=merge -p '{"spec":{"defaultRoute":true}}'

Pod not using dev image:
# Check Kyverno policy exists
oc get clusterpolicy replace-rhoai-llama-stack-images
# Check pod image
oc get pod -l app=llama-stack -n redhat-ods-applications \
-o jsonpath='{.items[0].spec.containers[0].image}'

./unprovision.sh # Remove Llama Stack distribution
./cleanup.sh # Remove RHOAI operator and dependencies

CI workflow (.github/workflows/provision.yml) runs on PRs and supports image overrides:
- catalog_image: Custom RHOAI catalog source
- llama_stack_image: Custom Llama Stack distro image
- llama_stack_operator_image: Custom operator image
This enables testing ODH/upstream builds before they're released.
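If the workflow exposes these overrides as workflow_dispatch inputs (an assumption; check provision.yml), a manual run with custom images might look like:

```shell
# Trigger the CI workflow with image overrides via the GitHub CLI
# (input names assumed to match the override names above; requires repo access)
gh workflow run provision.yml \
  -f llama_stack_image=quay.io/example/llama-stack:pr-test \
  -f catalog_image=quay.io/example/rhoai-catalog:dev
```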
Contributions welcome in:
- Additional demo scripts (reuse from llama-stack-demos)
- Kustomize overlays to work towards a single refarch
- CI/CD improvements and test coverage