feat(transcribe): Whisper transcription module with Groq→OpenAI fallback by quietbuildlab · Pull Request #277 · Panniantong/Agent-Reach

quietbuildlab · 2026-04-25T12:39:51Z

Summary

Adds an optional Whisper audio-transcription capability that fits the repo's existing channel/config/doctor architecture (per CONTRIBUTING.md).

- New module agent_reach/transcribe.py: downloads audio with yt-dlp, compresses and chunks it with ffmpeg, and posts each chunk to a Whisper-compatible API. It defaults to Groq's free whisper-large-v3 and falls back to OpenAI's whisper-1 on an HTTP error.
- YouTubeChannel.transcribe(url) delegates to the module, so the YouTube channel is the natural caller.
- Reuses Config for key loading (groq_api_key, openai_api_key); there is no parallel YAML parser.
- Registers openai_whisper in Config.FEATURE_REQUIREMENTS next to the existing groq_whisper.
- YouTubeChannel.check() reports transcription readiness in doctor output and warns if ffmpeg is missing.
- .env.example now documents OPENAI_API_KEY alongside GROQ_API_KEY.
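The two providers differ only in endpoint, model name, and API key. A minimal sketch of that routing, assuming both are called through their OpenAI-compatible /audio/transcriptions routes with a bearer token. `Provider`, `PROVIDERS`, and `build_request` are hypothetical names for illustration, not the module's actual API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    name: str
    url: str
    model: str


# Hypothetical endpoint table. Both services expose an OpenAI-compatible
# audio transcription route; the model names come from the PR text.
PROVIDERS = {
    "groq": Provider(
        "groq",
        "https://api.groq.com/openai/v1/audio/transcriptions",
        "whisper-large-v3",
    ),
    "openai": Provider(
        "openai",
        "https://api.openai.com/v1/audio/transcriptions",
        "whisper-1",
    ),
}


def build_request(provider: str, api_key: str) -> tuple[str, dict[str, str], dict[str, str]]:
    """Return (url, headers, form_fields) for a multipart transcription POST.

    The audio itself would be sent as the multipart "file" field, e.g.
    requests.post(url, headers=headers, data=data, files={"file": fh}).
    """
    p = PROVIDERS[provider]
    headers = {"Authorization": f"Bearer {api_key}"}
    data = {"model": p.model}
    return p.url, headers, data
```

Keeping the per-provider differences in one table means the fallback logic never needs to branch on the provider name.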

Why

Groq serves Whisper for free on its LPU hardware and is fast enough that a paid OpenAI key is rarely needed. However, Groq's free tier occasionally rate-limits requests (HTTP 429), so an automatic fallback keeps long pipelines reliable. Today the YouTube channel only checks yt-dlp availability; this PR turns YouTube into a full read-and-transcribe surface for agents.
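The fallback policy described above (try Groq, move on to OpenAI after an HTTP error, skip providers without a key, re-raise the last error if everything fails) can be sketched as follows. This is an illustration under stated assumptions: `ProviderHTTPError`, `ORDER`, and `transcribe_with_fallback` are hypothetical, and `NoProviderConfigured` here is a local stand-in for the PR's exception of the same name.

```python
from typing import Callable, Optional


class NoProviderConfigured(RuntimeError):
    """Hypothetical stand-in for the PR's exception of the same name."""


class ProviderHTTPError(RuntimeError):
    """Hypothetical error a provider call raises on a non-2xx response."""

    def __init__(self, provider: str, status: int) -> None:
        super().__init__(f"{provider} returned HTTP {status}")
        self.status = status


# "auto" tries the free Groq tier first, then the paid OpenAI key.
ORDER = {"auto": ["groq", "openai"], "groq": ["groq"], "openai": ["openai"]}


def transcribe_with_fallback(
    audio: bytes,
    provider: str,
    keys: dict[str, Optional[str]],
    call: Callable[[str, str, bytes], str],
) -> str:
    if provider not in ORDER:
        raise ValueError(f"unknown provider: {provider!r}")
    candidates: list[tuple[str, str]] = []
    for name in ORDER[provider]:
        key = keys.get(name)
        if key:  # skip silently when this provider has no key configured
            candidates.append((name, key))
    if not candidates:
        # Fail before any download or ffmpeg work happens.
        raise NoProviderConfigured(f"no API key for any of {ORDER[provider]}")
    last_error: Optional[Exception] = None
    for name, key in candidates:
        try:
            return call(name, key, audio)
        except ProviderHTTPError as exc:
            last_error = exc  # e.g. Groq 429: remember it, try the next provider
    assert last_error is not None  # every iteration either returned or set it
    raise last_error
```

Only HTTP errors trigger the fallback in this sketch; anything else (a bug, a missing file) propagates immediately rather than being masked by a second provider.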

API

```python
from agent_reach.config import Config
from agent_reach.channels.youtube import YouTubeChannel

text = YouTubeChannel().transcribe(
    "https://youtu.be/abc",
    provider="auto",          # or "groq" / "openai"
    config=Config(),
)
```

Or directly:

```python
from agent_reach.config import Config
from agent_reach.transcribe import transcribe

text = transcribe("./recording.m4a", provider="openai", config=Config())
```
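Since `transcribe` accepts either a URL or a local file, it has to decide whether yt-dlp runs at all. A minimal sketch, assuming http(s) URLs are downloaded and anything else is treated as an existing local path; `needs_download` is a hypothetical helper, not part of the PR's stated API.

```python
from pathlib import Path
from urllib.parse import urlparse


def needs_download(source: str) -> bool:
    """Return True if `source` should be fetched with yt-dlp first.

    Assumption: only http(s) URLs are downloaded; every other string is
    treated as a local file path, which must already exist.
    """
    scheme = urlparse(source).scheme.lower()
    if scheme in ("http", "https"):
        return True
    if not Path(source).is_file():
        raise FileNotFoundError(source)
    return False
```

Checking the scheme, rather than testing whether the string "looks like" a path, keeps Windows paths such as `C:\audio.m4a` on the local-file branch, because their parsed scheme is `c`.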

Tests

tests/test_transcribe.py contains 16 cases, all passing locally:

- Provider routing to the groq and openai endpoints (asserts URL, model, and auth header)
- Fallback: groq HTTP 429 → openai 200 (asserts call order)
- A provider is skipped silently when only the other provider's key is configured
- When all providers fail, the last error is raised
- Multi-chunk results are concatenated, joined by newlines
- A local file path skips yt-dlp (asserted via a "boom" stub that raises if yt-dlp is invoked)
- NoProviderConfigured is raised before any expensive work when no keys are present
- openai_whisper is registered in Config.FEATURE_REQUIREMENTS
- YouTubeChannel.transcribe delegates with provider/config preserved
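Most of these cases work by injecting a stub in place of the network call. A sketch of that pattern for the multi-chunk case, assuming the per-chunk transcription function is passed in as a parameter; `transcribe_chunks` and the test below are hypothetical and not taken from tests/test_transcribe.py.

```python
from pathlib import Path
from typing import Callable, Iterable


def transcribe_chunks(chunks: Iterable[Path], transcribe_one: Callable[[Path], str]) -> str:
    """Transcribe chunks in order and join the pieces with newlines."""
    return "\n".join(transcribe_one(chunk) for chunk in chunks)


def test_multi_chunk_concatenation() -> None:
    # A stub replaces the HTTP call, so the test needs no network or API keys.
    seen: list[str] = []

    def fake(chunk: Path) -> str:
        seen.append(chunk.name)
        return f"text-{chunk.stem}"

    chunks = [Path("chunk_000.mp3"), Path("chunk_001.mp3")]
    assert transcribe_chunks(chunks, fake) == "text-chunk_000\ntext-chunk_001"
    assert seen == ["chunk_000.mp3", "chunk_001.mp3"]  # order is preserved
```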

```
$ pytest -q
............................................................................. [ 76%]
.........................                                                       [100%]
101 passed
```

mypy agent_reach/transcribe.py, ruff check, and ruff format --check all report no issues on the touched files.

Notes for reviewer

- Subprocess calls to yt-dlp and ffmpeg use list-argument form (no shell), and files are listed with Path.glob instead of parsing ls output. This addresses the shell-script fragility of the earlier PR feat(scripts): add Whisper audio transcription with Groq→OpenAI fallback #276.
- Chunking re-encodes each segment instead of using -c copy, so every cut starts on a fresh keyframe and each chunk is a standalone file that Whisper accepts.
- No new runtime dependencies: requests and yt-dlp are already in pyproject.toml. ffmpeg is a system binary, installed per the existing convention (brew install / apt install).
- Supersedes feat(scripts): add Whisper audio transcription with Groq→OpenAI fallback #276, which proposed a standalone shell script. I'll close that one once this opens.
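The chunking notes can be sketched together. This assumes mono 16 kHz MP3 at 32 kbit/s and 10-minute segments, which the PR does not specify, and an ffmpeg build with libmp3lame; `ffmpeg_chunk_cmd` and `split_audio` are hypothetical names.

```python
import subprocess
from pathlib import Path


def ffmpeg_chunk_cmd(src: Path, out_dir: Path, segment_seconds: int = 600) -> list[str]:
    """Build an ffmpeg argv that re-encodes audio and splits it into segments.

    Assumed settings: mono, 16 kHz, MP3 at 32 kbit/s, small enough per chunk
    for typical transcription upload limits while keeping speech intelligible.
    """
    return [
        "ffmpeg", "-y", "-i", str(src),
        "-vn",                                # drop any video stream
        "-ac", "1", "-ar", "16000",           # mono, 16 kHz
        "-c:a", "libmp3lame", "-b:a", "32k",  # re-encode instead of -c copy
        "-f", "segment", "-segment_time", str(segment_seconds),
        "-reset_timestamps", "1",             # each chunk starts at t=0
        str(out_dir / "chunk_%03d.mp3"),
    ]


def split_audio(src: Path, out_dir: Path) -> list[Path]:
    out_dir.mkdir(parents=True, exist_ok=True)
    # List-form argv: no shell, so paths with spaces or quotes need no escaping.
    subprocess.run(ffmpeg_chunk_cmd(src, out_dir), check=True, capture_output=True)
    # Path.glob instead of parsing `ls`; zero-padded names sort in playback order.
    return sorted(out_dir.glob("chunk_*.mp3"))
```

Because each argument is a separate list element, a filename such as `in put.m4a` reaches ffmpeg unchanged, with no quoting.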

…lback

Adds `agent_reach/transcribe.py`, a Python module that downloads audio (yt-dlp), compresses and chunks it with ffmpeg, and posts to a Whisper-compatible API. Defaults to Groq's free `whisper-large-v3` and falls back to OpenAI's `whisper-1` on HTTP error.

Public surface:
- `transcribe(source, *, provider="auto", out_dir=None, config=None) -> str`
- `YouTubeChannel.transcribe(url, ...)` delegates to the module
- Custom exceptions: `TranscribeError`, `MissingDependency`, `NoProviderConfigured`

Integration:
- Reuses `agent_reach.config.Config` for API keys (no parallel YAML parser)
- Adds `openai_whisper` feature requirement to `Config.FEATURE_REQUIREMENTS`
- `YouTubeChannel.check()` surfaces transcription readiness in `doctor` output when keys are configured (and warns if ffmpeg is missing)
- `.env.example` documents `OPENAI_API_KEY` alongside the existing `GROQ_API_KEY`

Tests (`tests/test_transcribe.py`, 16 cases):
- Provider routing for groq and openai endpoints
- Auto fallback: groq 429 → openai succeeds
- Skip silently when a provider has no key configured
- Multi-chunk concatenation
- Local file path skips yt-dlp
- Clear errors for missing keys and unknown providers

Reshapes a previously proposed standalone shell helper into a Python module that fits the repo's channel/test/doctor architecture per CONTRIBUTING.md.
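The commit message names three exceptions but not how they relate. A minimal sketch, assuming `MissingDependency` and `NoProviderConfigured` both subclass `TranscribeError` so callers can catch a single base class; `require_binary` is a hypothetical helper showing how a missing ffmpeg could be detected before any work starts (the same check a `doctor` warning could reuse).

```python
import shutil


class TranscribeError(RuntimeError):
    """Base class for transcription failures (assumed hierarchy)."""


class MissingDependency(TranscribeError):
    """A required system binary (e.g. ffmpeg or yt-dlp) is not on PATH."""


class NoProviderConfigured(TranscribeError):
    """No API key is configured for any provider that could be tried."""


def require_binary(name: str) -> str:
    """Return the absolute path of `name` on PATH, or raise MissingDependency."""
    path = shutil.which(name)
    if path is None:
        raise MissingDependency(f"{name} not found on PATH; install it (e.g. via brew or apt)")
    return path
```

With a shared base class, an agent pipeline can write `except TranscribeError:` once and still distinguish the cases when it needs to.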


quietbuildlab wants to merge 1 commit into Panniantong:main from quietbuildlab:feat/transcribe-python-module
