feat(transcribe): Whisper transcription module with Groq→OpenAI fallback #277
Open
quietbuildlab wants to merge 1 commit into Panniantong:main
…lback

Adds `agent_reach/transcribe.py`, a Python module that downloads audio (yt-dlp), compresses and chunks it with ffmpeg, and posts to a Whisper-compatible API. Defaults to Groq's free `whisper-large-v3` and falls back to OpenAI's `whisper-1` on HTTP error.

Public surface:
- `transcribe(source, *, provider="auto", out_dir=None, config=None) -> str`
- `YouTubeChannel.transcribe(url, ...)` delegates to the module
- Custom exceptions: `TranscribeError`, `MissingDependency`, `NoProviderConfigured`

Integration:
- Reuses `agent_reach.config.Config` for API keys (no parallel YAML parser)
- Adds `openai_whisper` feature requirement to `Config.FEATURE_REQUIREMENTS`
- `YouTubeChannel.check()` surfaces transcription readiness in `doctor` output when keys are configured (and warns if ffmpeg is missing)
- `.env.example` documents `OPENAI_API_KEY` alongside the existing `GROQ_API_KEY`

Tests (`tests/test_transcribe.py`, 16 cases):
- Provider routing for groq and openai endpoints
- Auto fallback: groq 429 → openai succeeds
- Skip silently when a provider has no key configured
- Multi-chunk concatenation
- Local file path skips yt-dlp
- Clear errors for missing keys and unknown providers

Reshapes a previously-proposed standalone shell helper into a Python module that fits the repo's channel/test/doctor architecture per CONTRIBUTING.md.
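The provider routing described above (try Groq first, skip providers without a key, fall back to OpenAI on an HTTP error) can be sketched roughly as follows. This is an illustration under stated assumptions, not the module's actual code: `ProviderHTTPError`, `PROVIDER_ORDER`, `transcribe_chunk`, and the injected `send` callable are hypothetical names, and the exception classes here only mirror the names listed in the commit message.

```python
from typing import Callable, List, Mapping, Optional


class TranscribeError(Exception):
    """Base error for this sketch (name mirrors the PR's exception)."""


class NoProviderConfigured(TranscribeError):
    """Raised when no candidate provider has an API key."""


class ProviderHTTPError(TranscribeError):
    """Hypothetical: one provider answered with an HTTP error status."""

    def __init__(self, provider: str, status: int) -> None:
        super().__init__(f"{provider} returned HTTP {status}")
        self.provider = provider
        self.status = status


# Assumed preference order for provider="auto": Groq first, then OpenAI.
PROVIDER_ORDER: List[str] = ["groq", "openai"]

# (provider, api_key, chunk_path) -> transcript text; injected so the
# routing logic can be exercised without any network access.
Sender = Callable[[str, str, str], str]


def transcribe_chunk(
    chunk_path: str,
    keys: Mapping[str, Optional[str]],
    send: Sender,
    provider: str = "auto",
) -> str:
    """Send one audio chunk, falling back to the next provider on HTTP error."""
    if provider == "auto":
        candidates = PROVIDER_ORDER
    elif provider in PROVIDER_ORDER:
        candidates = [provider]
    else:
        raise TranscribeError(f"unknown provider: {provider!r}")

    # Providers without a configured key are skipped silently.
    usable = [name for name in candidates if keys.get(name)]
    if not usable:
        raise NoProviderConfigured(f"no API key configured for: {candidates}")

    last_error: Optional[ProviderHTTPError] = None
    for name in usable:
        try:
            return send(name, keys[name] or "", chunk_path)
        except ProviderHTTPError as exc:
            last_error = exc  # e.g. a 429 from Groq: try the next provider
    raise TranscribeError(f"all providers failed: {last_error}") from last_error
```

Checking for usable keys before any request is made matches the stated goal of raising `NoProviderConfigured` before expensive work. Whether the real module falls back per chunk or per call is not stated in the PR; the per-chunk granularity here is a guess.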
Summary
Adds an optional Whisper audio-transcription capability that fits the repo's existing channel/config/doctor architecture (per CONTRIBUTING.md).

- `agent_reach/transcribe.py`: downloads audio with yt-dlp, compresses and chunks it with ffmpeg, and posts to a Whisper-compatible API. Defaults to Groq's free `whisper-large-v3` and falls back to OpenAI's `whisper-1` on HTTP error.
- `YouTubeChannel.transcribe(url)` delegates to the module, so `youtube` is the natural caller.
- Reuses `Config` for key loading (`groq_api_key`, `openai_api_key`); no parallel YAML parser.
- Adds `openai_whisper` in `Config.FEATURE_REQUIREMENTS` next to the existing `groq_whisper`.
- `YouTubeChannel.check()` reports transcription readiness in `doctor` output, and warns if ffmpeg is missing.
- `.env.example` now documents `OPENAI_API_KEY` alongside `GROQ_API_KEY`.

Why
Groq runs Whisper for free on its LPU and is fast enough that a paid OpenAI key is rarely needed, but Groq's free tier does occasionally rate-limit, so an automatic fallback keeps long pipelines reliable. Today the YouTube channel only checks `yt-dlp` availability; this PR turns YouTube into a full read-and-transcribe surface for agents.

API
Or directly:
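A minimal sketch of the direct call, assuming the `transcribe(source, *, provider="auto", out_dir=None, config=None) -> str` signature from the commit message and that the function is importable from `agent_reach.transcribe`. `save_transcript` is a hypothetical helper used only for illustration; it is not part of the PR.

```python
from pathlib import Path


def save_transcript(source: str, out_path: Path, provider: str = "auto") -> Path:
    """Transcribe `source` (a URL or local file path) and write the text to disk.

    Hypothetical wrapper: only the `transcribe` call reflects the PR's API.
    """
    # Imported lazily so this sketch can be loaded even where the
    # agent_reach package is not installed.
    from agent_reach.transcribe import transcribe

    # Per the PR description, provider="auto" tries Groq first and falls
    # back to OpenAI on an HTTP error.
    text = transcribe(source, provider=provider)
    out_path.write_text(text, encoding="utf-8")
    return out_path
```

For example, `save_transcript("talk.mp3", Path("talk.txt"))` would transcribe a local file; the PR's tests state that a local path skips yt-dlp.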
Tests
`tests/test_transcribe.py`: 16 cases, all passing locally:

- `NoProviderConfigured` raised before expensive work when no keys present
- `openai_whisper` registered in `Config.FEATURE_REQUIREMENTS`
- `YouTubeChannel.transcribe` delegates with provider/config preserved

`mypy agent_reach/transcribe.py` and `ruff check` / `ruff format --check` on the touched files are clean.

Notes for reviewer

- `yt-dlp` and `ffmpeg` use list-arg form (no shell), and `Path.glob` (no `ls` parsing), addressing the shell-script fragility from the earlier PR feat(scripts): add Whisper audio transcription with Groq→OpenAI fallback #276.
- … `-c copy`, so cuts align to keyframes and Whisper accepts the chunks.
- `requests` and `yt-dlp` are already in `pyproject.toml`. `ffmpeg` is a system binary (existing convention via `brew install` / `apt install`).
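The list-arg and `Path.glob` points can be illustrated with a short sketch. The function names, the `chunk_%03d` naming pattern, the 600-second segment length, and the exact ffmpeg options are assumptions for illustration; the PR text does not show the module's real command line.

```python
import subprocess
from pathlib import Path
from typing import List


def build_segment_cmd(src: Path, out_dir: Path, seconds: int = 600) -> List[str]:
    """Build an ffmpeg argv list that splits `src` into fixed-length chunks.

    Each argument is its own list element, so a path containing spaces or
    shell metacharacters is passed through untouched (no shell is involved).
    """
    return [
        "ffmpeg", "-hide_banner", "-loglevel", "error", "-y",
        "-i", str(src),
        "-f", "segment",               # ffmpeg's segment muxer
        "-segment_time", str(seconds),
        "-c", "copy",                  # stream copy, no re-encode (assumed)
        str(out_dir / f"chunk_%03d{src.suffix}"),
    ]


def list_chunks(out_dir: Path, suffix: str) -> List[Path]:
    """Return the chunk files in order, using Path.glob instead of parsing `ls`."""
    return sorted(out_dir.glob(f"chunk_*{suffix}"))


def split_audio(src: Path, out_dir: Path, seconds: int = 600) -> List[Path]:
    """Run ffmpeg (must be on PATH) and return the resulting chunk paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if ffmpeg exits non-zero.
    subprocess.run(build_segment_cmd(src, out_dir, seconds), check=True)
    return list_chunks(out_dir, src.suffix)
```

Keeping the command builder separate from the `subprocess.run` call lets the argument list be unit-tested without ffmpeg installed.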