A macOS-native prototype that reads PDF content aloud using system Text‑to‑Speech. Built with Swift and SwiftUI, SpeechifyPDF extracts selectable text from PDFs and plays it back via AVFoundation while highlighting words in real time.
SpeechifyPDF is an MVP focused on accessibility and a simple macOS-native user experience. It enables users to open a local PDF, extract the text (for selectable PDFs), and have the text spoken using the system TTS. The app uses native frameworks (PDFKit and AVFoundation) and a lightweight SwiftUI UI.
- Open local PDF files (NSOpenPanel)
- Extract text from PDFs using PDFKit (selectable text only)
- Text‑to‑Speech playback (Speak / Pause / Resume / Stop)
- Voice / language selection and adjustable speech rate exposed by the TTS engine
- Real-time word-by-word highlighting while speaking (word-level only)
- Non-editable preview of extracted text during playback
Notes:
- OCR for scanned/image-only PDFs is not included in this prototype.
- No scroll-sync or sentence-level highlighting in this phase.
- Very large documents are trimmed to a safe maximum to prevent performance issues.
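
The extraction and trimming behavior described in the feature list and notes above could be sketched roughly as follows. This is a minimal illustration, not the app's actual code: `extractText(from:limit:)`, `trimmed(_:limit:)`, and the `maxExtractedLength` value are hypothetical names and numbers chosen for the example. Only `PDFDocument`, `string`, `pageCount`, and `page(at:)` come from PDFKit.

```swift
import Foundation

/// Hypothetical safety cap, counted in Characters. The real limit is not stated in this README.
let maxExtractedLength = 500_000

/// Returns `text` unchanged if it is short enough, otherwise its first `limit` Characters.
/// Trimming by Character never splits a composed character such as an emoji.
func trimmed(_ text: String, limit: Int) -> String {
    return String(text.prefix(limit))
}

#if canImport(PDFKit)
import PDFKit

/// Extracts selectable text from the PDF at `url`, or returns nil if it cannot be opened.
/// Image-only (scanned) PDFs yield an empty string because no OCR is performed.
func extractText(from url: URL, limit: Int = maxExtractedLength) -> String? {
    guard let document = PDFDocument(url: url) else { return nil }

    // First try the whole-document string.
    var text = document.string ?? ""

    // Assumed fallback: collect text page by page if the document-level string is empty.
    if text.isEmpty {
        var pages: [String] = []
        for index in 0..<document.pageCount {
            if let pageText = document.page(at: index)?.string {
                pages.append(pageText)
            }
        }
        text = pages.joined(separator: "\n")
    }

    return trimmed(text, limit: limit)
}
#endif
```

Trimming by `Character` rather than by UTF-16 unit is a choice made for this sketch; it keeps the cut point on a grapheme boundary at the cost of a linear scan.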
High-level components and important files:
- `SpeechEngine.swift`
  - Wraps `AVSpeechSynthesizer` and exposes controls: `speak(_:)`, `pause()`, `resume()`, `stop()`.
  - Publishes available voices, selectedVoiceIndex, rate, `isSpeaking`, and an observable `currentWordRange` for UI highlighting.
  - Implements `AVSpeechSynthesizerDelegate` and uses `speechSynthesizer(_:willSpeakRangeOfSpeechString:utterance:)` to receive character ranges from the synthesizer.
  - Builds word lookup tables with `NSString.enumerateSubstrings(..., .byWords)` and maps spoken character indices to words via binary search for stable, O(log n) lookups.
  - Ensures UI updates on the main thread.
- `PDFViewModel.swift`
  - Loads `PDFDocument` from URL using PDFKit and extracts text.
  - Uses `PDFDocument.string` with a per-page fallback, and trims very large text to a safety limit.
- `PDFViewRepresentable.swift`
  - Wraps PDFKit's `PDFView` for SwiftUI display.
- `AttributedTextViewRepresentable.swift`
  - New `NSViewRepresentable` wrapping `NSTextView` to display attributed text and apply a background highlight to the current word range.
  - Non-editable for a stable one-way display in this MVP.
- `PDFReaderView.swift`
  - SwiftUI view that provides the Open / Speak / Stop controls, shows the PDF, and hosts the attributed text view bound to extracted text and the engine's `currentWordRange`.
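
To make the `SpeechEngine.swift` description more concrete, here is a rough sketch of a synthesizer wrapper along those lines. It is an assumption-laden illustration rather than the project's source: the class name `SpeechEngineSketch` is hypothetical, the property names follow the list above, and for brevity it publishes the raw character range from the delegate callback instead of snapping it to a word.

```swift
import AVFoundation
import Combine

/// Hypothetical, simplified wrapper around AVSpeechSynthesizer.
final class SpeechEngineSketch: NSObject, ObservableObject, AVSpeechSynthesizerDelegate {
    @Published var isSpeaking = false
    @Published var currentWordRange: NSRange?
    @Published var selectedVoiceIndex = 0
    @Published var rate: Float = AVSpeechUtteranceDefaultSpeechRate

    /// Voices installed on the system.
    let voices = AVSpeechSynthesisVoice.speechVoices()

    private let synthesizer = AVSpeechSynthesizer()

    override init() {
        super.init()
        synthesizer.delegate = self
    }

    func speak(_ text: String) {
        let utterance = AVSpeechUtterance(string: text)
        utterance.rate = rate
        if voices.indices.contains(selectedVoiceIndex) {
            utterance.voice = voices[selectedVoiceIndex]
        }
        synthesizer.speak(utterance)
        isSpeaking = true
    }

    func pause() { synthesizer.pauseSpeaking(at: .word) }
    func resume() { synthesizer.continueSpeaking() }
    func stop() { synthesizer.stopSpeaking(at: .immediate) }

    // MARK: AVSpeechSynthesizerDelegate

    /// Called just before a range of the utterance is spoken (UTF-16 offsets).
    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer,
                           willSpeakRangeOfSpeechString characterRange: NSRange,
                           utterance: AVSpeechUtterance) {
        // Hop to the main queue before touching published state.
        DispatchQueue.main.async { [weak self] in
            self?.currentWordRange = characterRange
        }
    }

    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer,
                           didFinish utterance: AVSpeechUtterance) {
        DispatchQueue.main.async { [weak self] in
            self?.isSpeaking = false
            self?.currentWordRange = nil
        }
    }

    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer,
                           didCancel utterance: AVSpeechUtterance) {
        DispatchQueue.main.async { [weak self] in
            self?.isSpeaking = false
            self?.currentWordRange = nil
        }
    }
}
```

A SwiftUI view could hold this object with `@StateObject` and observe `currentWordRange` to drive highlighting.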
Implementation details and rationale:
- Character ranges received from `AVSpeechSynthesizer` are UTF-16 (`NSString`) based. To avoid index mismatches, the app maps using `NSString` word enumerations so indices align with the synthesizer.
- Word mapping uses a precomputed list of word NSRanges plus a parallel array of start locations; a small binary-search routine maps a character index to the corresponding word efficiently and robustly for long texts.
- Highlight updates are published and applied on the main thread to ensure UI stability.
- The text preview is non-editable in this MVP to avoid two-way synchronization complexities; editing support can be added later with a coordinator approach.
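
The word-mapping idea described above can be sketched as a small lookup type. This is an illustrative version under stated assumptions, not the code in `SpeechEngine.swift`: the `WordIndex` type and its method names are hypothetical, and returning `nil` for indices that fall between words is a choice made for this sketch.

```swift
import Foundation

/// Hypothetical lookup table mapping a UTF-16 character index to the word containing it.
struct WordIndex {
    /// Word ranges in UTF-16 units, sorted by location and non-overlapping.
    let ranges: [NSRange]
    /// Parallel array of range start locations, used as binary-search keys.
    let starts: [Int]

    init(ranges: [NSRange]) {
        self.ranges = ranges
        self.starts = ranges.map { $0.location }
    }

    /// Builds the table by enumerating words with NSString, so offsets are UTF-16
    /// based like the ranges reported by the speech synthesizer.
    init(text: String) {
        let ns = NSString(string: text)
        var found: [NSRange] = []
        ns.enumerateSubstrings(in: NSRange(location: 0, length: ns.length),
                               options: .byWords) { _, range, _, _ in
            found.append(range)
        }
        self.init(ranges: found)
    }

    /// Returns the range of the word containing `index`, or nil if the index falls
    /// on whitespace or punctuation between words, or outside the text. O(log n).
    func wordRange(containing index: Int) -> NSRange? {
        var low = 0
        var high = starts.count - 1
        var candidate: Int? = nil

        // Find the last word whose start location is <= index.
        while low <= high {
            let mid = (low + high) / 2
            if starts[mid] <= index {
                candidate = mid
                low = mid + 1
            } else {
                high = mid - 1
            }
        }

        guard let found = candidate else { return nil }
        let range = ranges[found]
        return index < range.location + range.length ? range : nil
    }
}
```

In use, the delegate callback would pass `characterRange.location` to `wordRange(containing:)` and publish the result. Keeping everything in UTF-16 units matters because a character such as an emoji occupies two UTF-16 units but counts as a single Swift `Character`, so mixing the two index spaces would shift highlights.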
The goal is a minimal, focused accessibility tool: make PDF content audible with a native macOS UX and accurate word-level highlighting during speech. This prototype prioritizes:
- Native frameworks and behavior (PDFKit, AVFoundation, SwiftUI).
- Robust mapping between synthesizer character ranges and displayed words.
- A small, maintainable codebase that can be extended later (OCR, scroll-sync, editable text, advanced controls).
- Open the Xcode project/workspace:
  - Open `SpeechifyPDF.xcodeproj` or the workspace in Xcode (macOS).
- Select the macOS app scheme and run (⌘R).
- Use the app UI:
- Click "Open PDF" and select a PDF with selectable text.
- The PDF will display in the main view and the extracted text populates the preview area.
- Click "Speak" to start TTS playback. The currently spoken word will be highlighted as speech progresses.
- Use "Pause", "Resume" and "Stop" controls as needed.
- Voice/rate selection is provided by the underlying `SpeechEngine` (available voices and rate). The app exposes these controls where applicable.
- No OCR for image-only PDFs (Vision OCR could be added).
- Highlighting is word-level only; sentence-level grouping and scroll-sync are future enhancements.
- The preview text is non-editable in this MVP; two-way editing support is a follow-up task.
This project is a prototype/MVP focused on accessibility and macOS-native UX. Contributions and improvements should preserve robustness for long texts and main-thread UI updates.