omidshz100/SpeechifyPDF

SpeechifyPDF

A macOS-native prototype that reads PDF content aloud using system Text‑to‑Speech. Built with Swift and SwiftUI, SpeechifyPDF extracts selectable text from PDFs and plays it back via AVFoundation while highlighting words in real time.

Project overview

SpeechifyPDF is an MVP focused on accessibility and a simple macOS-native user experience. It lets users open a local PDF, extract its text (selectable PDFs only), and have that text spoken by the system TTS. The app uses native frameworks (PDFKit and AVFoundation) and a lightweight SwiftUI interface.

Features

  • Open local PDF files (NSOpenPanel)
  • Extract text from PDFs using PDFKit (selectable text only)
  • Text‑to‑Speech playback (Speak / Pause / Resume / Stop)
  • Voice / language selection and adjustable speech rate exposed by the TTS engine
  • Real-time word-by-word highlighting while speaking (word-level only)
  • Non-editable preview of extracted text during playback

Notes:

  • OCR for scanned/image-only PDFs is not included in this prototype.
  • No scroll-sync or sentence-level highlighting in this phase.
  • Very large documents are trimmed to a safe maximum to prevent performance issues.
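
The trimming mentioned in the last note could look roughly like the sketch below. The function name `trimmedForPlayback`, the constant `maxPreviewCharacters`, and the limit value are all hypothetical; the README does not state the actual maximum or how the app counts length. This version counts Swift `Character` values so the cut never lands inside a composed character.

```swift
import Foundation

/// Hypothetical safety limit; the real value used by the app is not documented.
let maxPreviewCharacters = 200_000

/// Returns at most `limit` characters from the start of `text`.
/// `prefix(_:)` counts grapheme clusters (Swift `Character`s), so an accented
/// letter or emoji is never split in the middle.
func trimmedForPlayback(_ text: String, limit: Int = maxPreviewCharacters) -> String {
    guard limit > 0 else { return "" }
    return String(text.prefix(limit))
}
```

Trimming before the text is handed to both the preview and the synthesizer keeps the two views of the text identical, which matters because highlight ranges are computed against the same string.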

Architecture / Technical notes

High-level components and important files:

  • SpeechEngine.swift

    • Wraps AVSpeechSynthesizer and exposes controls: speak(_:), pause(), resume(), stop().
    • Publishes available voices, selectedVoiceIndex, rate, isSpeaking, and an observable currentWordRange for UI highlighting.
    • Implements AVSpeechSynthesizerDelegate and uses speechSynthesizer(_:willSpeakRangeOfSpeechString:utterance:) to receive character ranges from the synthesizer.
    • Builds word lookup tables with NSString.enumerateSubstrings(..., .byWords) and maps spoken character indices to words via binary search for stable, O(log n) lookups.
    • Ensures UI updates on the main thread.
  • PDFViewModel.swift

    • Loads PDFDocument from URL using PDFKit and extracts text.
    • Uses PDFDocument.string with a per-page fallback, and trims very large text to a safety limit.
  • PDFViewRepresentable.swift

    • Wraps PDFKit's PDFView for SwiftUI display.
  • AttributedTextViewRepresentable.swift

    • An NSViewRepresentable wrapping NSTextView that displays attributed text and applies a background highlight to the current word range.
    • Non-editable for a stable one-way display in this MVP.
  • PDFReaderView.swift

    • SwiftUI view that provides the Open / Speak / Stop controls, shows the PDF, and hosts the attributed text view bound to extracted text and the engine's currentWordRange.
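
As a rough illustration of the SpeechEngine component described above, the following is a minimal sketch of a wrapper around AVSpeechSynthesizer. The type, method, and property names follow this README; the method bodies, default values, and the choice to update state only from delegate callbacks are assumptions and may differ from the actual SpeechEngine.swift. The word-table mapping step is left out here, so the sketch publishes the synthesizer's raw character range.

```swift
import AVFoundation
import Combine

/// Illustrative sketch only: names follow the README, bodies are assumptions.
final class SpeechEngine: NSObject, ObservableObject, AVSpeechSynthesizerDelegate {
    @Published var voices: [AVSpeechSynthesisVoice] = AVSpeechSynthesisVoice.speechVoices()
    @Published var selectedVoiceIndex = 0
    @Published var rate: Float = AVSpeechUtteranceDefaultSpeechRate
    @Published var isSpeaking = false
    @Published var currentWordRange: NSRange?

    private let synthesizer = AVSpeechSynthesizer()

    override init() {
        super.init()
        synthesizer.delegate = self
    }

    func speak(_ text: String) {
        let utterance = AVSpeechUtterance(string: text)
        utterance.rate = rate
        // Fall back to the system default voice if the index is out of bounds.
        if voices.indices.contains(selectedVoiceIndex) {
            utterance.voice = voices[selectedVoiceIndex]
        }
        synthesizer.speak(utterance)
    }

    func pause() { _ = synthesizer.pauseSpeaking(at: .word) }
    func resume() { _ = synthesizer.continueSpeaking() }
    func stop() { _ = synthesizer.stopSpeaking(at: .immediate) }

    // MARK: AVSpeechSynthesizerDelegate

    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer,
                           didStart utterance: AVSpeechUtterance) {
        DispatchQueue.main.async { [weak self] in self?.isSpeaking = true }
    }

    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer,
                           willSpeakRangeOfSpeechString characterRange: NSRange,
                           utterance: AVSpeechUtterance) {
        // The range is in UTF-16 (NSString) units; publish it on the main thread.
        DispatchQueue.main.async { [weak self] in self?.currentWordRange = characterRange }
    }

    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer,
                           didFinish utterance: AVSpeechUtterance) {
        DispatchQueue.main.async { [weak self] in self?.resetPlaybackState() }
    }

    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer,
                           didCancel utterance: AVSpeechUtterance) {
        DispatchQueue.main.async { [weak self] in self?.resetPlaybackState() }
    }

    private func resetPlaybackState() {
        isSpeaking = false
        currentWordRange = nil
    }
}
```

A SwiftUI view could hold this object with `@StateObject` and pass `currentWordRange` to the attributed text view, so each published change re-applies the highlight.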

Implementation details and rationale:

  • Character ranges received from AVSpeechSynthesizer are UTF-16 (NSString) based. To avoid index mismatches, the app maps using NSString word enumerations so indices align with the synthesizer.
  • Word mapping uses a precomputed list of word NSRanges plus a parallel array of start locations; a small binary-search routine maps a character index to the corresponding word efficiently and robustly for long texts.
  • Highlight updates are published and applied on the main thread to ensure UI stability.
  • The text preview is non-editable in this MVP to avoid two-way synchronization complexities; editing support can be added later with a coordinator approach.
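
The word table and binary search described above can be sketched as follows. The `WordIndex` type and its member names are hypothetical and are not taken from the project's source; only `enumerateSubstrings` with `.byWords` and the use of UTF-16 `NSRange` values come from the description. The search finds the last word whose start is at or before the given index, then checks that the index actually falls inside that word.

```swift
import Foundation

/// Hypothetical lookup table of word ranges in UTF-16 (NSString) coordinates.
struct WordIndex {
    let ranges: [NSRange]   // word ranges in ascending order
    let starts: [Int]       // parallel array: starts[i] == ranges[i].location

    init(text: String) {
        let ns = NSString(string: text)
        var found: [NSRange] = []
        ns.enumerateSubstrings(in: NSRange(location: 0, length: ns.length),
                               options: [.byWords, .substringNotRequired]) { _, range, _, _ in
            found.append(range)
        }
        ranges = found
        starts = found.map { $0.location }
    }

    /// Returns the word range containing the UTF-16 offset `index`,
    /// or nil when the offset falls on whitespace, punctuation, or outside the text.
    func wordRange(containing index: Int) -> NSRange? {
        var low = 0
        var high = starts.count - 1
        var candidate: Int? = nil
        // Binary search for the last word that starts at or before `index`.
        while low <= high {
            let mid = low + (high - low) / 2
            if starts[mid] <= index {
                candidate = mid
                low = mid + 1
            } else {
                high = mid - 1
            }
        }
        guard let i = candidate, NSLocationInRange(index, ranges[i]) else { return nil }
        return ranges[i]
    }
}

// Usage: offset 7 lies inside "brave" in this sample text.
let sampleIndex = WordIndex(text: "Hello brave world")
let sampleHit = sampleIndex.wordRange(containing: 7)
```

In a delegate callback, the `location` of the synthesizer's character range would be passed as `index`. Because both the table and the synthesizer count UTF-16 units, no conversion to `String.Index` is needed before highlighting in an NSTextView.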

Why this project (MVP & motivation)

The goal is a minimal, focused accessibility tool: make PDF content audible with a native macOS UX and accurate word-level highlighting during speech. This prototype prioritizes:

  • Native frameworks and behavior (PDFKit, AVFoundation, SwiftUI).
  • Robust mapping between synthesizer character ranges and displayed words.
  • A small, maintainable codebase that can be extended later (OCR, scroll-sync, editable text, advanced controls).

How to run

  1. Open the Xcode project/workspace:
    • Open SpeechifyPDF.xcodeproj or the workspace in Xcode (macOS).
  2. Select the macOS app scheme and run (⌘R).
  3. Use the app UI:
    • Click "Open PDF" and select a PDF with selectable text.
    • The PDF will display in the main view and the extracted text populates the preview area.
    • Click "Speak" to start TTS playback. The currently spoken word will be highlighted as speech progresses.
    • Use "Pause", "Resume" and "Stop" controls as needed.
    • Voice and rate selection come from the underlying SpeechEngine (its published voices and rate). The app exposes these controls where applicable.

Limitations & future work

  • No OCR for image-only PDFs (Vision OCR could be added).
  • Highlighting is word-level only; sentence-level grouping and scroll-sync are future enhancements.
  • The preview text is non-editable in this MVP; two-way editing support is a follow-up task.

This project is a prototype/MVP focused on accessibility and macOS-native UX. Contributions and improvements should preserve robustness for long texts and main-thread UI updates.
