omidshz100/voice

Real-Time Voice Translation Application for macOS

A desktop application that captures live audio from the microphone or from system audio (for example, a video call), transcribes the speech with OpenAI Whisper, and translates the text in real time into the language you choose.

🎯 Features

  • Live Audio Capture: Capture from microphone or system audio (video calls, music, etc.)
  • Offline Speech Recognition: Uses OpenAI Whisper for accurate transcription
  • Real-Time Translation: Instant translation using Google Translate
  • Multi-Language Support: 18+ languages including English, Spanish, French, German, Persian, Arabic, Chinese, Japanese, and more
  • User-Friendly Interface: Clean Tkinter-based GUI with real-time text display
  • macOS Optimized: Compatible with both Intel and Apple Silicon (M1/M2) Macs

📋 Prerequisites

System Requirements

  • macOS 10.15 (Catalina) or later
  • Python 3.8 or later
  • At least 4GB RAM (8GB recommended for better performance)
  • Internet connection (for translation service)

Required Software

  1. Python 3.8+: Download from python.org
  2. Homebrew: Install from brew.sh
  3. Xcode Command Line Tools: Run xcode-select --install

🛠️ Installation Guide

Step 1: Clone or Download the Project

# If using git
git clone <repository-url>
cd voice-translator

# Or download and extract the files to a folder

Step 2: Install System Dependencies

# Install Homebrew (if not already installed)
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

# Install PortAudio for audio processing
brew install portaudio

# Install FFmpeg (required by Whisper)
brew install ffmpeg

Step 3: Set Up Python Environment

# Create a virtual environment
python3 -m venv venv

# Activate the virtual environment
source venv/bin/activate

# Upgrade pip
pip install --upgrade pip

Step 4: Install Python Dependencies

# Install all required packages
pip install -r requirements.txt

# If you encounter issues with torch on Apple Silicon:
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cpu
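
After installation, a quick check can confirm that the packages are importable. The sketch below is an illustration, not part of the project. The module-to-package mapping is an assumption based on the libraries this README names; requirements.txt remains the authoritative list.

```python
import importlib.util

# Import name -> pip package name. Assumed from the libraries this README
# mentions; the project's requirements.txt is the authoritative list.
REQUIRED = {
    "whisper": "openai-whisper",
    "sounddevice": "sounddevice",
    "googletrans": "googletrans",
    "torch": "torch",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose modules cannot be found."""
    return [
        pip_name
        for module_name, pip_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
        print("Try: pip install " + " ".join(missing))
    else:
        print("All listed packages are importable.")
```

Run it inside the activated virtual environment so that it checks the same interpreter the application will use.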

Step 5: Install BlackHole (Optional - for System Audio Capture)

To capture audio from video calls (Zoom, Skype, FaceTime, etc.):

  1. Download BlackHole from existential.audio/blackhole
  2. Install the .pkg file
  3. Configure Audio MIDI Setup:
    • Open "Audio MIDI Setup" (Applications > Utilities)
    • Create a "Multi-Output Device"
    • Select both your speakers and BlackHole
    • Set this as your default output device

🚀 Usage

Running the Application

# Make sure you're in the project directory and virtual environment is activated
source venv/bin/activate

# Run the application
python voice_translator.py

Using the Interface

  1. Language Selection:

    • Choose source language (or "Auto Detect")
    • Select target language for translation
  2. Audio Input:

    • Microphone: Captures your voice directly
    • System Audio: Captures all system audio (requires BlackHole setup)
  3. Controls:

    • Start Listening: Begin audio capture and processing
    • Stop Listening: Stop the process
    • Clear Text: Clear both text areas
  4. Real-Time Display:

    • Left panel shows original transcribed text
    • Right panel shows translated text
    • Timestamps are included for each entry
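
As an illustration of the timestamped entries described above, a display line could be built roughly as follows. The `[HH:MM:SS]` format and the `format_entry` name are assumptions; the application may format its entries differently.

```python
from datetime import datetime


def format_entry(text, when=None):
    """Prefix a line of text with an HH:MM:SS timestamp (assumed format)."""
    when = when or datetime.now()
    return f"[{when.strftime('%H:%M:%S')}] {text.strip()}"
```

Formatting the original and the translated text with the same `when` value would keep the two panels aligned entry by entry.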

Tips for Best Results

  • Audio Quality: Ensure clear audio input with minimal background noise
  • Speaking Pace: Speak clearly and at a moderate pace
  • Language Detection: Use "Auto Detect" for mixed-language conversations
  • System Audio: Make sure BlackHole is properly configured for video call capture

🔧 Configuration Options

Supported Languages

  • Source Languages: Auto-detect, English, Spanish, French, German, Italian, Portuguese, Russian, Chinese, Japanese, Korean, Arabic, Persian, Turkish, Dutch, Swedish, Norwegian, Danish, Finnish
  • Target Languages: All of the above except Auto-detect
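
One way to represent these lists in code is a single mapping from display name to ISO 639-1 code, with "Auto Detect" offered only on the source side. This is a sketch under that assumption: the names `LANGUAGES`, `source_choices`, `target_choices`, and `source_code` are hypothetical, and the codes the application passes to Whisper or to the translation service may differ (some translation backends expect a region-qualified code for Chinese).

```python
# Display name -> ISO 639-1 code for the 18 languages listed above.
LANGUAGES = {
    "English": "en", "Spanish": "es", "French": "fr", "German": "de",
    "Italian": "it", "Portuguese": "pt", "Russian": "ru", "Chinese": "zh",
    "Japanese": "ja", "Korean": "ko", "Arabic": "ar", "Persian": "fa",
    "Turkish": "tr", "Dutch": "nl", "Swedish": "sv", "Norwegian": "no",
    "Danish": "da", "Finnish": "fi",
}
AUTO_DETECT = "Auto Detect"


def source_choices():
    """Source menu: auto-detect first, then every language."""
    return [AUTO_DETECT] + list(LANGUAGES)


def target_choices():
    """Target menu: every language, without auto-detect."""
    return list(LANGUAGES)


def source_code(choice):
    """Return the language code, or None to request auto-detection."""
    return None if choice == AUTO_DETECT else LANGUAGES[choice]
```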

Audio Settings

  • Sample Rate: 16kHz (optimized for speech)
  • Chunk Duration: 3 seconds (adjustable in code)
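  • Processing: Near real-time; each chunk is transcribed and translated once it has been captured, so latency is at least one chunk duration plus processing time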
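
To make the numbers concrete: at 16 kHz, a 3-second chunk is 48,000 samples. The sketch below shows one way to regroup incoming audio blocks of arbitrary size into fixed-size chunks. It uses plain Python lists to stay dependency-free; `ChunkBuffer` is a hypothetical name, and the application itself may buffer audio differently (for example, with NumPy arrays).

```python
SAMPLE_RATE = 16_000   # Hz, from the settings above
CHUNK_SECONDS = 3      # seconds, from the settings above
CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS  # samples per chunk


class ChunkBuffer:
    """Accumulate incoming audio blocks and emit fixed-size chunks."""

    def __init__(self, chunk_samples=CHUNK_SAMPLES):
        self.chunk_samples = chunk_samples
        self._pending = []

    def push(self, block):
        """Add a block of samples; return a list of completed chunks."""
        self._pending.extend(block)
        chunks = []
        while len(self._pending) >= self.chunk_samples:
            chunks.append(self._pending[:self.chunk_samples])
            self._pending = self._pending[self.chunk_samples:]
        return chunks
```

A shorter chunk lowers latency but gives the recognizer less context per call, so the value is a trade-off rather than a fixed optimum.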

🐛 Troubleshooting

Common Issues

1. "No module named 'whisper'" Error

pip install openai-whisper

2. Audio Device Not Found

  • Check that your microphone/audio device is connected
  • Grant microphone permission in System Preferences > Security & Privacy > Privacy > Microphone (on macOS 13 and later: System Settings > Privacy & Security > Microphone)

3. BlackHole Not Working

  • Restart the application after installing BlackHole
  • Check Audio MIDI Setup configuration
  • Ensure BlackHole is set as input device
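
To check whether BlackHole is visible as an input device, the device list can be searched by name. The helper below is a sketch: `find_input_device` and `blackhole_index` are hypothetical names, and the helper works on the list of device dictionaries that `sounddevice.query_devices()` returns (each entry has `name` and `max_input_channels` keys).

```python
def find_input_device(devices, name_fragment):
    """Return the index of the first input-capable device whose name matches.

    `devices` is a sequence of dicts with "name" and "max_input_channels"
    keys, the shape returned by sounddevice.query_devices().
    """
    wanted = name_fragment.lower()
    for index, device in enumerate(devices):
        if wanted in device["name"].lower() and device["max_input_channels"] > 0:
            return index
    return None


def blackhole_index():
    """Look up BlackHole among the real devices (requires sounddevice)."""
    import sounddevice as sd  # third-party; imported lazily on purpose

    return find_input_device(sd.query_devices(), "BlackHole")
```

If `blackhole_index()` returns `None`, BlackHole is either not installed or not exposed as an input, which points back to the Audio MIDI Setup steps.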

4. Translation Errors

  • Check internet connection
  • Try different source/target language combinations
  • Restart the application if translation service becomes unresponsive
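
Transient network failures can often be absorbed by retrying before giving up. The sketch below wraps any translation callable with exponential backoff. `translate_with_retry` is a hypothetical helper, not something the project is known to include, and the `translate` argument stands in for whatever call the application makes to the translation service.

```python
import time


def translate_with_retry(translate, text, attempts=3, base_delay=0.5,
                         sleep=time.sleep):
    """Call translate(text), retrying with exponential backoff on failure."""
    last_error = None
    for attempt in range(attempts):
        try:
            return translate(text)
        except Exception as exc:  # broad on purpose: backends raise varied errors
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)  # 0.5 s, 1 s, 2 s, ...
    raise RuntimeError(
        f"translation failed after {attempts} attempts"
    ) from last_error
```

Keeping the retry count small matters here, because each retry adds delay to text that is supposed to appear in near real time.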

5. Performance Issues

  • Close other resource-intensive applications
  • Use the "base" Whisper model (the default); larger models are more accurate but slower
  • Consider upgrading RAM if processing is slow

macOS Permissions

The app may request permissions for:

  • Microphone Access: Required for audio capture
  • Accessibility: May be needed for system audio capture

Grant these permissions in System Preferences > Security & Privacy (on macOS 13 and later: System Settings > Privacy & Security).

🔄 Updates and Improvements

Potential Enhancements

  • Offline Translation: Add support for offline translation models
  • Audio Recording: Save audio clips with translations
  • Custom Models: Support for specialized Whisper models
  • Hotkeys: Global shortcuts for start/stop
  • Themes: Dark mode and custom UI themes

Performance Optimization

  • Use the "tiny" Whisper model for faster processing at some cost in accuracy; "small" and larger models are more accurate but slower than the default "base"
  • Adjust chunk duration based on your needs
  • Consider using GPU acceleration if available
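
Whisper's model sizes run from "tiny" up to "large", with speed decreasing and accuracy increasing along the way. The sketch below shows one way to step between sizes and to transcribe a chunk with a model that is already loaded. `choose_model` and `transcribe_chunk` are hypothetical helpers; how the application selects and calls its model is not documented here.

```python
MODEL_SIZES = ["tiny", "base", "small", "medium", "large"]  # smallest to largest


def choose_model(current="base", faster=True):
    """Step one size down (faster) or up (more accurate), clamped to the list."""
    i = MODEL_SIZES.index(current)
    i = max(i - 1, 0) if faster else min(i + 1, len(MODEL_SIZES) - 1)
    return MODEL_SIZES[i]


def transcribe_chunk(model, samples, language=None):
    """Transcribe one chunk with an already-loaded Whisper model.

    `samples` is 16 kHz mono audio; language=None asks Whisper to detect
    the language. fp16=False keeps inference in 32-bit floats on the CPU.
    """
    result = model.transcribe(samples, language=language, fp16=False)
    return result["text"].strip()
```

With openai-whisper installed, a model can be loaded once at startup, for example `model = whisper.load_model(choose_model("base"))`, and then reused for every chunk, since loading is far more expensive than a single transcription call.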

📝 Technical Details

Architecture

  • Audio Capture: sounddevice library with real-time streaming
  • Speech Recognition: OpenAI Whisper (local processing)
  • Translation: Google Translate, accessed through the unofficial googletrans library
  • GUI: Tkinter (cross-platform, included with Python)
  • Threading: Separate threads for audio processing and UI updates
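
The threading arrangement above can be sketched with two queues: one carrying audio chunks to a worker thread, and one carrying results back to the UI. This is a minimal illustration under assumptions, not the application's actual code; `run_worker` and `drain` are hypothetical names, and `transcribe` and `translate` are stand-ins for the Whisper and translation calls.

```python
import queue
import threading


def run_worker(audio_q, result_q, transcribe, translate, stop_event):
    """Consume audio chunks and publish (original, translated) pairs."""
    while not stop_event.is_set():
        try:
            chunk = audio_q.get(timeout=0.1)  # wake regularly to check stop_event
        except queue.Empty:
            continue
        text = transcribe(chunk)
        if text.strip():  # skip silent or empty chunks
            result_q.put((text, translate(text)))


def drain(result_q):
    """Collect everything currently queued without blocking (for the UI thread)."""
    items = []
    while True:
        try:
            items.append(result_q.get_nowait())
        except queue.Empty:
            return items
```

Tkinter widgets should only be updated from the main thread, so a common pattern is for the UI to call `drain` from a callback scheduled with the widget's `after` method, and for the worker to be started with `threading.Thread(target=run_worker, args=(...), daemon=True)`.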

File Structure

voice-translator/
├── voice_translator.py    # Main application
├── requirements.txt       # Python dependencies
├── README.md             # This file
└── venv/                 # Virtual environment (created during setup)

🆘 Support

If you encounter issues:

  1. Check the troubleshooting section above
  2. Ensure all dependencies are properly installed
  3. Verify macOS permissions are granted
  4. Try running with different language combinations

📄 License

This project is for educational and personal use. Please respect the terms of service for the translation services used.


Note: This application requires an internet connection for translation services. Speech recognition (Whisper) works offline once the model is downloaded.
