A desktop application that captures live audio from video calls or system audio, converts speech to text using OpenAI Whisper, and translates it in real time to your desired language.
- Live Audio Capture: Capture from microphone or system audio (video calls, music, etc.)
- Offline Speech Recognition: Uses OpenAI Whisper for accurate transcription
- Real-Time Translation: Instant translation using Google Translate
- Multi-Language Support: 18+ languages including English, Spanish, French, German, Persian, Arabic, Chinese, Japanese, and more
- User-Friendly Interface: Clean Tkinter-based GUI with real-time text display
- macOS Optimized: Compatible with both Intel and Apple Silicon (M1/M2) Macs
- macOS 10.15 (Catalina) or later
- Python 3.8 or later
- At least 4GB RAM (8GB recommended for better performance)
- Internet connection (for translation service)
- Python 3.8+: Download from python.org
- Homebrew: Install from brew.sh
- Xcode Command Line Tools: Run
xcode-select --install
# If using git
git clone <repository-url>
cd voice-translator
# Or download and extract the files to a folder

# Install Homebrew (if not already installed)
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
# Install PortAudio for audio processing
brew install portaudio
# Install FFmpeg (required by Whisper)
brew install ffmpeg

# Create a virtual environment
python3 -m venv venv
# Activate the virtual environment
source venv/bin/activate
# Upgrade pip
pip install --upgrade pip

# Install all required packages
pip install -r requirements.txt
# If you encounter issues with torch on Apple Silicon:
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cpu

To capture audio from video calls (Zoom, Skype, FaceTime, etc.):
- Download BlackHole from existential.audio/blackhole
- Install the .pkg file
- Configure Audio MIDI Setup:
  - Open "Audio MIDI Setup" (Applications > Utilities)
  - Create a "Multi-Output Device"
  - Select both your speakers and BlackHole
  - Set this as your default output device
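
Once BlackHole is installed, it can help to confirm that it appears as an input-capable device. The sketch below is one way to look it up by name. `find_input_device` and `find_blackhole_live` are hypothetical helpers, not functions from `voice_translator.py`; the only library call used is `sounddevice.query_devices()`, which lists devices as dictionaries with `name` and `max_input_channels` keys.

```python
from typing import Iterable, Mapping, Optional


def find_input_device(devices: Iterable[Mapping], keyword: str = "BlackHole") -> Optional[int]:
    """Return the index of the first input-capable device whose name contains keyword.

    Hypothetical helper for illustration; the application may select devices differently.
    """
    for index, device in enumerate(devices):
        name = str(device.get("name", ""))
        has_input = device.get("max_input_channels", 0) > 0
        if keyword.lower() in name.lower() and has_input:
            return index
    return None


def find_blackhole_live() -> Optional[int]:
    """Look up BlackHole among the real devices (requires the sounddevice package)."""
    import sounddevice as sd  # imported here so the helper above works without it

    return find_input_device(sd.query_devices())
```

If `find_blackhole_live()` returns `None`, BlackHole is either not installed or not visible as an input device.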
# Make sure you're in the project directory and virtual environment is activated
source venv/bin/activate
# Run the application
python voice_translator.py

- Language Selection:
  - Choose source language (or "Auto Detect")
  - Select target language for translation
- Audio Input:
  - Microphone: Captures your voice directly
  - System Audio: Captures all system audio (requires BlackHole setup)
- Controls:
  - Start Listening: Begin audio capture and processing
  - Stop Listening: Stop the process
  - Clear Text: Clear both text areas
- Real-Time Display:
  - Left panel shows original transcribed text
  - Right panel shows translated text
  - Timestamps are included for each entry
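
As an illustration of the timestamped entries shown in both panels, here is a minimal sketch. The `[HH:MM:SS]` format and the `format_entry` name are assumptions; the application's actual format may differ.

```python
from datetime import datetime
from typing import Optional


def format_entry(text: str, when: Optional[datetime] = None) -> str:
    """Prefix a line of text with an [HH:MM:SS] timestamp (assumed format)."""
    when = when or datetime.now()
    return f"[{when.strftime('%H:%M:%S')}] {text.strip()}"
```

The same timestamp can be passed for the original and the translated line so that the two panels stay aligned.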
- Audio Quality: Ensure clear audio input with minimal background noise
- Speaking Pace: Speak clearly and at a moderate pace
- Language Detection: Use "Auto Detect" for mixed-language conversations
- System Audio: Make sure BlackHole is properly configured for video call capture
- Source Languages: Auto-detect, English, Spanish, French, German, Italian, Portuguese, Russian, Chinese, Japanese, Korean, Arabic, Persian, Turkish, Dutch, Swedish, Norwegian, Danish, Finnish
- Target Languages: All of the above except Auto-detect
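
The two lists above can be expressed as a single table, with the target list derived from the source list. This is a sketch only: the variable names are hypothetical, and the codes are ISO 639-1 codes. Individual libraries may expect variants (for example a region-specific code for Chinese), so check the codes against whichever speech and translation packages you use.

```python
# Display name -> ISO 639-1 code; None means "let the recognizer detect the language".
SOURCE_LANGUAGES = {
    "Auto-detect": None,
    "English": "en", "Spanish": "es", "French": "fr", "German": "de",
    "Italian": "it", "Portuguese": "pt", "Russian": "ru", "Chinese": "zh",
    "Japanese": "ja", "Korean": "ko", "Arabic": "ar", "Persian": "fa",
    "Turkish": "tr", "Dutch": "nl", "Swedish": "sv", "Norwegian": "no",
    "Danish": "da", "Finnish": "fi",
}

# Targets are every source language except auto-detect.
TARGET_LANGUAGES = {
    name: code for name, code in SOURCE_LANGUAGES.items() if code is not None
}
```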
- Sample Rate: 16kHz (optimized for speech)
- Chunk Duration: 3 seconds (adjustable in code)
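- Processing: Near real-time; each chunk is transcribed after it has been captured, so text appears at least one chunk duration behind the speaker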
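
To make the numbers above concrete: at 16 kHz, a 3-second chunk holds 48,000 samples. The sketch below splits a buffer of samples into full chunks and keeps the leftover for the next round. The constant and function names are hypothetical, and the application's own buffering may work differently.

```python
from typing import List, Sequence, Tuple

SAMPLE_RATE = 16_000   # samples per second (16 kHz)
CHUNK_SECONDS = 3      # adjustable chunk duration
CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS  # 48,000 samples per chunk


def split_into_chunks(
    samples: Sequence[float], chunk_samples: int = CHUNK_SAMPLES
) -> Tuple[List[Sequence[float]], Sequence[float]]:
    """Return (full chunks, remainder) for a buffer of mono samples."""
    full = len(samples) // chunk_samples
    chunks = [samples[i * chunk_samples:(i + 1) * chunk_samples] for i in range(full)]
    remainder = samples[full * chunk_samples:]
    return chunks, remainder
```

Shorter chunks reduce the delay before text appears but give the recognizer less context per call; longer chunks do the opposite.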
1. "No module named 'whisper'" Error
pip install openai-whisper
2. Audio Device Not Found
- Check that your microphone/audio device is connected
- Grant microphone permissions in System Preferences > Security & Privacy (System Settings > Privacy & Security on macOS 13 and later)
3. BlackHole Not Working
- Restart the application after installing BlackHole
- Check Audio MIDI Setup configuration
- Ensure BlackHole is set as the input device
4. Translation Errors
- Check internet connection
- Try different source/target language combinations
- Restart the application if translation service becomes unresponsive
5. Performance Issues
- Close other resource-intensive applications
- Use "base" Whisper model (default) for better performance
- Consider upgrading RAM if processing is slow
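
For the translation errors in item 4, a short retry around the translation call can smooth over brief network drops. This is a generic sketch: `translate_with_retry` is a hypothetical helper, and `translate` stands for whatever callable your code uses to translate one string.

```python
import time
from typing import Callable


def translate_with_retry(
    translate: Callable[[str], str], text: str, attempts: int = 3, delay: float = 1.0
) -> str:
    """Call translate(text), retrying with a growing pause if it raises."""
    last_error = None
    for attempt in range(attempts):
        try:
            return translate(text)
        except Exception as error:  # network and service errors vary by library
            last_error = error
            if attempt < attempts - 1:
                time.sleep(delay * (attempt + 1))  # wait a little longer each time
    raise RuntimeError(f"translation failed after {attempts} attempts") from last_error
```

If every attempt fails, the original error is attached to the `RuntimeError`, which makes it easier to tell a connectivity problem from a service-side one.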
The app may request permissions for:
- Microphone Access: Required for audio capture
- Accessibility: May be needed for system audio capture
Grant these permissions in System Preferences > Security & Privacy (System Settings > Privacy & Security on macOS 13 and later).
- Offline Translation: Add support for offline translation models
- Audio Recording: Save audio clips with translations
- Custom Models: Support for specialized Whisper models
- Hotkeys: Global shortcuts for start/stop
- Themes: Dark mode and custom UI themes
- Use the "tiny" Whisper model for faster processing; "small" is larger than the default "base" and trades speed for accuracy
- Adjust chunk duration based on your needs
- Consider using GPU acceleration if available
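
The model-size tip can be sketched as follows. The helper names (`smaller_model`, `load_model`, `transcribe_chunk`) are hypothetical and not taken from `voice_translator.py`; the Whisper calls used are `whisper.load_model(name)` and `model.transcribe(...)` from the `openai-whisper` package. The model is cached so it is loaded only once per name.

```python
from functools import lru_cache

# Whisper model names, smallest and fastest first.
MODEL_SIZES = ("tiny", "base", "small", "medium", "large")


def smaller_model(name: str) -> str:
    """Return the next smaller model name, or the same name if already the smallest."""
    index = MODEL_SIZES.index(name)
    return MODEL_SIZES[max(index - 1, 0)]


@lru_cache(maxsize=None)
def load_model(name: str = "base"):
    """Load a Whisper model once and reuse it (requires openai-whisper)."""
    import whisper  # imported lazily so the helpers above work without it

    return whisper.load_model(name)


def transcribe_chunk(audio, model_name: str = "base", language=None):
    """Transcribe one chunk; audio is a file path or a 16 kHz mono float32 array.

    language=None lets Whisper detect the language itself.
    """
    result = load_model(model_name).transcribe(audio, language=language, fp16=False)
    return result["text"].strip(), result["language"]
```

If processing falls behind the incoming audio, `smaller_model("base")` suggests `"tiny"` as the next step down.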
- Audio Capture: sounddevice library with real-time streaming
- Speech Recognition: OpenAI Whisper (local processing)
- Translation: Google Translate via the googletrans library (an unofficial client, so the service may occasionally throttle or reject requests)
- GUI: Tkinter (cross-platform, included with Python)
- Threading: Separate threads for audio processing and UI updates
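
The threading arrangement above can be sketched with a worker thread and a queue. This is an assumed structure for illustration, not the application's actual code: `run_pipeline`, `process`, and `on_result` are hypothetical names.

```python
import queue
import threading
from typing import Any, Callable, Iterable


def run_pipeline(
    chunks: Iterable[Any], process: Callable[[Any], Any], on_result: Callable[[Any], None]
) -> None:
    """Process chunks on a worker thread and deliver results on the calling thread."""
    results: queue.Queue = queue.Queue()
    done = object()  # sentinel marking the end of the stream

    def worker() -> None:
        try:
            for chunk in chunks:
                results.put(process(chunk))  # slow work stays off the UI thread
        finally:
            results.put(done)  # always signal completion, even after an error

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    while True:
        item = results.get()
        if item is done:
            break
        on_result(item)  # e.g. append text to a widget
    thread.join()
```

In a Tkinter application the blocking loop would typically be replaced by a periodic `root.after(...)` callback that drains the queue with `get_nowait()`, so the window stays responsive and widgets are only touched from the main thread.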
voice-translator/
├── voice_translator.py   # Main application
├── requirements.txt      # Python dependencies
├── README.md             # This file
└── venv/                 # Virtual environment (created during setup)
If you encounter issues:
- Check the troubleshooting section above
- Ensure all dependencies are properly installed
- Verify macOS permissions are granted
- Try running with different language combinations
This project is for educational and personal use. Please respect the terms of service for the translation services used.
Note: This application requires an internet connection for translation services. Speech recognition (Whisper) works offline once the model is downloaded.