
feat(transcribe): Whisper transcription module with Groq→OpenAI fallback #277

Open
quietbuildlab wants to merge 1 commit into Panniantong:main from quietbuildlab:feat/transcribe-python-module

@quietbuildlab

Summary

Adds an optional Whisper audio-transcription capability that fits the repo's existing channel/config/doctor architecture (per CONTRIBUTING.md).

  • New module agent_reach/transcribe.py — downloads audio with yt-dlp, compresses + chunks with ffmpeg, and posts to a Whisper-compatible API. Defaults to Groq's free whisper-large-v3 and falls back to OpenAI's whisper-1 on HTTP error.
  • YouTubeChannel.transcribe(url) delegates to the module, so the YouTube channel is the natural entry point for callers.
  • Reuses Config for key loading (groq_api_key, openai_api_key) — no parallel YAML parser.
  • Registers openai_whisper in Config.FEATURE_REQUIREMENTS next to the existing groq_whisper.
  • YouTubeChannel.check() reports transcription readiness in doctor output, and warns if ffmpeg is missing.
  • .env.example now documents OPENAI_API_KEY alongside GROQ_API_KEY.
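
The provider fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `transcribe_with_fallback`, `provider_order`, `ProviderHTTPError`, and the injected `post` callable are hypothetical names, the endpoint URLs are assumed, and falling back for the whole job (rather than per chunk) is also an assumption. Only the model names, the groq-then-openai order, the silent skip of unconfigured providers, and the newline join come from the description.

```python
from typing import Callable, Dict, List, Optional

# Endpoint URLs are assumptions for illustration; the PR text only names the models.
PROVIDERS: Dict[str, Dict[str, str]] = {
    "groq": {
        "url": "https://api.groq.com/openai/v1/audio/transcriptions",
        "model": "whisper-large-v3",
    },
    "openai": {
        "url": "https://api.openai.com/v1/audio/transcriptions",
        "model": "whisper-1",
    },
}


class ProviderHTTPError(Exception):
    """Hypothetical error a `post` callable raises on a non-2xx response."""

    def __init__(self, provider: str, status: int) -> None:
        super().__init__(f"{provider} returned HTTP {status}")
        self.provider = provider
        self.status = status


class NoProviderConfigured(Exception):
    """Raised when no requested provider has an API key."""


def provider_order(provider: str, keys: Dict[str, Optional[str]]) -> List[str]:
    """Return the providers to try, silently skipping those without a key."""
    wanted = ["groq", "openai"] if provider == "auto" else [provider]
    unknown = [name for name in wanted if name not in PROVIDERS]
    if unknown:
        raise ValueError(f"unknown provider: {unknown[0]}")
    usable = [name for name in wanted if keys.get(name)]
    if not usable:
        raise NoProviderConfigured("no API key configured for " + ", ".join(wanted))
    return usable


# post(url, model, api_key, chunk) -> transcript text for one chunk
PostFn = Callable[[str, str, str, str], str]


def transcribe_with_fallback(
    chunks: List[str],
    keys: Dict[str, Optional[str]],
    post: PostFn,
    provider: str = "auto",
) -> str:
    # Resolve providers first so a missing key fails before any upload.
    order = provider_order(provider, keys)
    last_error: Optional[ProviderHTTPError] = None
    for name in order:
        spec = PROVIDERS[name]
        api_key = keys[name] or ""
        try:
            texts = [post(spec["url"], spec["model"], api_key, c) for c in chunks]
            return "\n".join(texts)  # multi-chunk output joined by newlines
        except ProviderHTTPError as exc:
            last_error = exc  # remember the failure and try the next provider
    assert last_error is not None  # order is non-empty, so a failure was recorded
    raise last_error
```

Passing `post` in as a parameter keeps the routing logic testable without network access, which matches how the test list describes asserting on URL, model, and call order.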

Why

Groq runs Whisper for free on its LPU and is fast enough that a paid OpenAI key is rarely needed — but Groq's free tier does occasionally rate-limit, so an automatic fallback keeps long pipelines reliable. Today the YouTube channel only checks yt-dlp availability; this PR turns YouTube into a full read-and-transcribe surface for agents.

API

from agent_reach.config import Config
from agent_reach.channels.youtube import YouTubeChannel

text = YouTubeChannel().transcribe(
    "https://youtu.be/abc",
    provider="auto",          # or "groq" / "openai"
    config=Config(),
)

Or directly:

from agent_reach.config import Config
from agent_reach.transcribe import transcribe

text = transcribe("./recording.m4a", provider="openai", config=Config())

Tests

tests/test_transcribe.py — 16 cases, all passing locally:

  • Provider routing to groq and openai endpoints (asserts URL, model, auth header)
  • Fallback: groq HTTP 429 → openai 200 (asserts call order)
  • Auto mode silently skips a provider whose key is not configured (the case where only one provider's key is set)
  • All providers fail → raise with last error
  • Multi-chunk concatenation joined by newlines
  • Local file path skips yt-dlp (asserted via boom-stub)
  • NoProviderConfigured raised before expensive work when no keys present
  • openai_whisper registered in Config.FEATURE_REQUIREMENTS
  • YouTubeChannel.transcribe delegates with provider/config preserved
$ pytest -q
............................................................................. [ 76%]
.........................                                                       [100%]
101 passed

mypy agent_reach/transcribe.py and ruff check/ruff format --check on the touched files are clean.
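
For readers unfamiliar with the "boom-stub" pattern mentioned in the test list, here is a minimal, self-contained sketch. `resolve_audio`, `is_remote`, and `boom` are hypothetical names used only for illustration; the real module's internals may be organized differently, and treating only `http`/`https` sources as remote is an assumption.

```python
from pathlib import Path
from typing import Callable, List
from urllib.parse import urlparse


def is_remote(source: str) -> bool:
    """Treat http(s) URLs as remote; anything else is assumed to be a local path."""
    return urlparse(source).scheme in ("http", "https")


def resolve_audio(source: str, download: Callable[[str], Path]) -> Path:
    """Download remote sources; return local paths untouched."""
    if is_remote(source):
        return download(source)
    return Path(source)


def boom(url: str) -> Path:
    """Stub that fails loudly if the downloader is ever invoked."""
    raise AssertionError(f"downloader must not be called for {url!r}")


def test_local_path_skips_download() -> None:
    # If resolve_audio tried to download, boom would raise and fail the test.
    assert resolve_audio("./recording.m4a", boom) == Path("./recording.m4a")


def test_url_goes_through_download() -> None:
    seen: List[str] = []

    def fake_download(url: str) -> Path:
        seen.append(url)
        return Path("audio.m4a")

    assert resolve_audio("https://youtu.be/abc", fake_download) == Path("audio.m4a")
    assert seen == ["https://youtu.be/abc"]
```

The stub turns "this code path was not taken" into an explicit failure instead of relying on the absence of side effects.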

Notes for reviewer

…lback

Adds `agent_reach/transcribe.py` — a Python module that downloads audio
(yt-dlp), compresses + chunks with ffmpeg, and posts to a Whisper-compatible
API. Defaults to Groq's free `whisper-large-v3` and falls back to OpenAI's
`whisper-1` on HTTP error.
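
As a rough sketch of the download and compress/chunk steps, the two external commands might be assembled like this. The helper names are hypothetical, and the specific settings (m4a extraction, mono, 16 kHz, 32 kbit/s, 600-second segments, output file naming) are assumptions chosen for illustration, not values taken from this commit.

```python
from pathlib import Path
from typing import List


def ytdlp_audio_cmd(url: str, out_dir: Path) -> List[str]:
    """Build a yt-dlp invocation that extracts audio only."""
    return [
        "yt-dlp",
        "-x",                       # extract audio
        "--audio-format", "m4a",    # assumed container for this sketch
        "-o", str(out_dir / "%(id)s.%(ext)s"),
        url,
    ]


def ffmpeg_chunk_cmd(src: Path, out_dir: Path, seconds: int = 600) -> List[str]:
    """Build an ffmpeg invocation that downmixes, re-encodes, and splits audio."""
    return [
        "ffmpeg",
        "-y",                       # overwrite existing chunk files
        "-i", str(src),
        "-ac", "1",                 # mono
        "-ar", "16000",             # 16 kHz sample rate
        "-b:a", "32k",              # low bitrate keeps uploads small
        "-f", "segment",            # segment muxer splits the output
        "-segment_time", str(seconds),
        str(out_dir / "chunk_%03d.mp3"),
    ]
```

Each list can be handed to `subprocess.run(cmd, check=True)`. Building the argument list separately from running it keeps the command shape easy to assert on in tests.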

Public surface:
- `transcribe(source, *, provider="auto", out_dir=None, config=None) -> str`
- `YouTubeChannel.transcribe(url, ...)` delegates to the module
- Custom exceptions: `TranscribeError`, `MissingDependency`, `NoProviderConfigured`

Integration:
- Reuses `agent_reach.config.Config` for API keys (no parallel YAML parser)
- Adds `openai_whisper` feature requirement to `Config.FEATURE_REQUIREMENTS`
- `YouTubeChannel.check()` surfaces transcription readiness in `doctor` output
  when keys are configured (and warns if ffmpeg is missing)
- `.env.example` documents `OPENAI_API_KEY` alongside the existing `GROQ_API_KEY`
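
A readiness check along the lines of the `doctor` integration could look like the sketch below. `transcription_readiness`, its status strings, and its messages are hypothetical; only the key names (`groq_api_key`, `openai_api_key`) and the ffmpeg warning come from this description, and the actual `check()` return shape in the repo may differ.

```python
import shutil
from typing import Callable, Mapping, Optional, Tuple

WhichFn = Callable[[str], Optional[str]]


def transcription_readiness(
    keys: Mapping[str, Optional[str]],
    which: WhichFn = shutil.which,
) -> Tuple[str, str]:
    """Return a (status, message) pair describing transcription readiness."""
    providers = [
        name for name in ("groq", "openai") if keys.get(f"{name}_api_key")
    ]
    if not providers:
        # No keys configured: report nothing alarming, transcription is optional.
        return "off", "transcription: no groq_api_key or openai_api_key configured"
    if which("ffmpeg") is None:
        return "warn", "transcription: keys found, but ffmpeg is not on PATH"
    return "ok", "transcription: ready via " + ", ".join(providers)
```

Accepting `which` as a parameter lets tests simulate a missing ffmpeg binary without touching the real PATH.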

Tests (`tests/test_transcribe.py`, 16 cases):
- Provider routing for groq and openai endpoints
- Auto fallback: groq 429 → openai succeeds
- Skip silently when a provider has no key configured
- Multi-chunk concatenation
- Local file path skips yt-dlp
- Clear errors for missing keys and unknown providers

Reshapes a previously-proposed standalone shell helper into a Python module
that fits the repo's channel/test/doctor architecture per CONTRIBUTING.md.