SpeechifyPDF

A macOS-native prototype that reads PDF content aloud using system Text‑to‑Speech. Built with Swift and SwiftUI, SpeechifyPDF extracts selectable text from PDFs and plays it back via AVFoundation while highlighting words in real time.

Project overview

SpeechifyPDF is an MVP focused on accessibility and a simple macOS-native user experience. It enables users to open a local PDF, extract the text (for selectable PDFs), and have the text spoken using the system TTS. The app uses native frameworks (PDFKit and AVFoundation) and a lightweight SwiftUI UI.

Features

Open local PDF files (NSOpenPanel)
Extract text from PDFs using PDFKit (selectable text only)
Text‑to‑Speech playback (Speak / Pause / Resume / Stop)
Voice / language selection and adjustable speech rate exposed by the TTS engine
Real-time word-by-word highlighting while speaking (word-level only)
Non-editable preview of extracted text during playback
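
As a rough illustration of the voice, rate, and playback features listed above, the following sketch uses standard AVFoundation calls. The helper names (voices(matching:), clampedRate(_:), makeUtterance(text:voice:rate:), demoPlayback()) are hypothetical and are not taken from the project's SpeechEngine.swift.

```swift
import AVFoundation

/// Returns the installed voices whose language code starts with `languagePrefix` (e.g. "en").
func voices(matching languagePrefix: String) -> [AVSpeechSynthesisVoice] {
    AVSpeechSynthesisVoice.speechVoices().filter { $0.language.hasPrefix(languagePrefix) }
}

/// Clamps a requested rate into the range the synthesizer accepts.
func clampedRate(_ rate: Float) -> Float {
    min(max(rate, AVSpeechUtteranceMinimumSpeechRate), AVSpeechUtteranceMaximumSpeechRate)
}

/// Builds an utterance with the chosen voice and a clamped rate.
func makeUtterance(text: String, voice: AVSpeechSynthesisVoice?, rate: Float) -> AVSpeechUtterance {
    let utterance = AVSpeechUtterance(string: text)
    utterance.voice = voice            // nil leaves the choice to the system default voice
    utterance.rate = clampedRate(rate)
    return utterance
}

/// Walks through the four playback controls in order (illustrative only).
func demoPlayback() {
    let synthesizer = AVSpeechSynthesizer()
    let utterance = makeUtterance(text: "Hello from SpeechifyPDF",
                                  voice: voices(matching: "en").first,
                                  rate: AVSpeechUtteranceDefaultSpeechRate)
    synthesizer.speak(utterance)                      // Speak
    _ = synthesizer.pauseSpeaking(at: .word)          // Pause at the next word boundary
    _ = synthesizer.continueSpeaking()                // Resume
    _ = synthesizer.stopSpeaking(at: .immediate)      // Stop
}
```

Clamping the rate before assigning it keeps a slider or stepper in the UI from handing the synthesizer an out-of-range value.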

Notes:

OCR for scanned/image-only PDFs is not included in this prototype.
No scroll-sync or sentence-level highlighting in this phase.
Very large documents are trimmed to a safe maximum to prevent performance issues.
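
The extraction and trimming behavior noted above might look roughly like the sketch below. It assumes a hypothetical limit (maxExtractedCharacters) and hypothetical function names; the project's actual limit and the code in PDFViewModel.swift may differ. The per-page fallback follows the description in the architecture notes.

```swift
import Foundation
import PDFKit

/// Hypothetical cap on extracted text length, counted in Characters.
/// The real project's limit is not documented here.
let maxExtractedCharacters = 200_000

/// Returns at most `limit` characters of `text`, cutting on a Character boundary.
func trimmed(_ text: String, limit: Int) -> String {
    String(text.prefix(max(0, limit)))
}

/// Extracts selectable text from the PDF at `url`.
/// Returns nil when the file cannot be opened as a PDF, and an empty string
/// when it opens but has no selectable text (for example, a scanned document).
func extractText(from url: URL, limit: Int = maxExtractedCharacters) -> String? {
    guard let document = PDFDocument(url: url) else { return nil }

    // First try the whole-document string.
    if let whole = document.string, !whole.isEmpty {
        return trimmed(whole, limit: limit)
    }

    // Fall back to concatenating the text of each page.
    var pages: [String] = []
    for index in 0..<document.pageCount {
        if let pageText = document.page(at: index)?.string, !pageText.isEmpty {
            pages.append(pageText)
        }
    }
    return trimmed(pages.joined(separator: "\n"), limit: limit)
}
```

An empty result is a reasonable signal for the UI to tell the user that the PDF appears to be image-only, since OCR is out of scope for this prototype.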

Architecture / Technical notes

High-level components and important files:

SpeechEngine.swift
- Wraps AVSpeechSynthesizer and exposes controls: speak(_:), pause(), resume(), stop().
- Publishes available voices, selectedVoiceIndex, rate, isSpeaking, and an observable currentWordRange for UI highlighting.
- Implements AVSpeechSynthesizerDelegate and uses speechSynthesizer(_:willSpeakRangeOfSpeechString:utterance:) to receive character ranges from the synthesizer.
- Builds word lookup tables with NSString.enumerateSubstrings(..., .byWords) and maps spoken character indices to words via binary search for stable, O(log n) lookups.
- Ensures UI updates on the main thread.
PDFViewModel.swift
- Loads PDFDocument from URL using PDFKit and extracts text.
- Uses PDFDocument.string with a per-page fallback, and trims very large text to a safety limit.
PDFViewRepresentable.swift
- Wraps PDFKit's PDFView for SwiftUI display.
AttributedTextViewRepresentable.swift
- An NSViewRepresentable wrapping NSTextView that displays attributed text and applies a background highlight to the current word range.
- Non-editable for a stable one-way display in this MVP.
PDFReaderView.swift
- SwiftUI view that provides the Open / Speak / Stop controls, shows the PDF, and hosts the attributed text view bound to extracted text and the engine's currentWordRange.
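
To make the delegate flow described for SpeechEngine.swift more concrete, here is a minimal sketch of an observable engine that publishes the range reported by the synthesizer. The class name SpeechEngineSketch is hypothetical, the sketch assumes speak(_:) is called from the main thread, and it publishes the raw callback range without mapping it to word boundaries.

```swift
import AVFoundation
import Combine

/// Hypothetical, trimmed-down engine; not the project's actual SpeechEngine.
final class SpeechEngineSketch: NSObject, ObservableObject, AVSpeechSynthesizerDelegate {
    @Published private(set) var isSpeaking = false
    /// Range currently being spoken, in UTF-16 units of the utterance string.
    @Published private(set) var currentWordRange: NSRange?

    private let synthesizer = AVSpeechSynthesizer()

    override init() {
        super.init()
        synthesizer.delegate = self
    }

    /// Starts speaking `text`. Assumed to be called on the main thread.
    func speak(_ text: String) {
        guard !text.isEmpty else { return }
        synthesizer.speak(AVSpeechUtterance(string: text))
        isSpeaking = true
    }

    func stop() {
        _ = synthesizer.stopSpeaking(at: .immediate)
    }

    // Called shortly before each unit of text is spoken.
    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer,
                           willSpeakRangeOfSpeechString characterRange: NSRange,
                           utterance: AVSpeechUtterance) {
        // Hop to the main queue so SwiftUI observes the change on the main thread.
        DispatchQueue.main.async { [weak self] in
            self?.currentWordRange = characterRange
        }
    }

    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer,
                           didFinish utterance: AVSpeechUtterance) {
        reset()
    }

    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer,
                           didCancel utterance: AVSpeechUtterance) {
        reset()
    }

    /// Clears the published state once speech ends or is cancelled.
    private func reset() {
        DispatchQueue.main.async { [weak self] in
            self?.isSpeaking = false
            self?.currentWordRange = nil
        }
    }
}
```

A SwiftUI view could hold this object with @StateObject and pass currentWordRange to the text view that draws the highlight.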

Implementation details and rationale:

Character ranges received from AVSpeechSynthesizer are expressed in UTF-16 units (NSString indices), not Swift Character offsets. To avoid index mismatches, the app builds its word table with NSString enumeration, so word ranges use the same offsets as the synthesizer's callbacks.
Word mapping uses a precomputed list of word NSRanges plus a parallel array of start locations; a small binary-search routine maps a character index to the corresponding word efficiently and robustly for long texts.
Highlight updates are published and applied on the main thread to ensure UI stability.
The text preview is non-editable in this MVP to avoid two-way synchronization complexities; editing support can be added later with a coordinator approach.
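
The word-mapping approach described above can be sketched as follows. The type name WordIndex and its members are hypothetical; only the technique (NSString word enumeration plus a binary search over word start locations) comes from the notes above.

```swift
import Foundation

/// Precomputed lookup table mapping UTF-16 character indices to word ranges.
struct WordIndex {
    /// Word ranges in ascending order, in UTF-16 units (NSString indices).
    let ranges: [NSRange]
    /// Parallel array of each word's start location, used as the search key.
    let starts: [Int]

    init(text: String) {
        let nsText = text as NSString
        var found: [NSRange] = []
        nsText.enumerateSubstrings(
            in: NSRange(location: 0, length: nsText.length),
            options: [.byWords, .substringNotRequired]
        ) { _, wordRange, _, _ in
            found.append(wordRange)
        }
        ranges = found
        starts = found.map { $0.location }
    }

    /// Returns the word containing `index`, or nil when the index falls on
    /// whitespace or punctuation, or lies outside the text.
    func wordRange(containing index: Int) -> NSRange? {
        var low = 0
        var high = starts.count - 1
        var candidate: Int? = nil
        // Binary search for the last word whose start is <= index.
        while low <= high {
            let mid = low + (high - low) / 2
            if starts[mid] <= index {
                candidate = mid
                low = mid + 1
            } else {
                high = mid - 1
            }
        }
        guard let i = candidate, NSLocationInRange(index, ranges[i]) else { return nil }
        return ranges[i]
    }
}

// Usage: look up the word for a character index reported by the synthesizer.
let sampleIndex = WordIndex(text: "Hello, world")
let world = sampleIndex.wordRange(containing: 8)   // location 7, length 5 ("world")
let comma = sampleIndex.wordRange(containing: 5)   // nil: index 5 is the comma
```

Building the table once per document keeps each callback lookup at O(log n), which matters when the synthesizer reports a new range for every word of a long text.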

Why this project (MVP & motivation)

The goal is a minimal, focused accessibility tool: make PDF content audible with a native macOS UX and accurate word-level highlighting during speech. This prototype prioritizes:

Native frameworks and behavior (PDFKit, AVFoundation, SwiftUI).
Robust mapping between synthesizer character ranges and displayed words.
A small, maintainable codebase that can be extended later (OCR, scroll-sync, editable text, advanced controls).

How to run

Open the Xcode project/workspace:
- Open SpeechifyPDF.xcodeproj or the workspace in Xcode (macOS).
Select the macOS app scheme and run (⌘R).
Use the app UI:
- Click "Open PDF" and select a PDF with selectable text.
- The PDF will display in the main view and the extracted text populates the preview area.
- Click "Speak" to start TTS playback. The currently spoken word will be highlighted as speech progresses.
- Use "Pause", "Resume" and "Stop" controls as needed.
- Voice and rate selection come from the underlying SpeechEngine, which publishes the available voices and the current rate. The app exposes these as controls where applicable.

Limitations & future work

No OCR for image-only PDFs (Vision OCR could be added).
Highlighting is word-level only; sentence-level grouping and scroll-sync are future enhancements.
The preview text is non-editable in this MVP; two-way editing support is a follow-up task.

This project is a prototype/MVP focused on accessibility and macOS-native UX. Contributions and improvements should preserve robustness for long texts and main-thread UI updates.
