🖼️ Image Text Summarizer (OCR + NLP)

This project extracts text from an image using Tesseract OCR and summarizes the extracted text using Hugging Face Transformers.

It combines:

Optical Character Recognition (OCR)
Natural Language Processing (NLP)
Transformer-based summarization models

⚙ Installation Guide

1️⃣ Install Python

Make sure Python 3.8 or higher is installed.

Check version:

python --version

If not installed, download from: https://www.python.org/downloads/
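
The same check can be done from inside Python, for example at the top of a script. This is a minimal sketch; the names `MIN_VERSION` and `python_is_supported` are illustrative and not part of this project.

```python
import sys

# Minimum version stated in this guide.
MIN_VERSION = (3, 8)


def python_is_supported(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True when the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if not python_is_supported():
    print("Python 3.8 or higher is required; found", sys.version.split()[0])
```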

2️⃣ Clone the Repository

git clone https://github.com/your-username/image-text-summarizer.git
cd image-text-summarizer

3️⃣ Install Required Dependencies

pip install pytesseract pillow transformers torch

(Optional) To keep the dependencies isolated, create and activate a virtual environment before running the pip install command above:

python -m venv venv
venv\Scripts\activate     # Windows
source venv/bin/activate  # Linux/macOS

4️⃣ Install Tesseract OCR (Important)

pytesseract is only a Python wrapper. The Tesseract OCR engine itself must be installed separately.

🔹 Windows

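Download the installer from: https://github.com/tesseract-ocr/tesseract
Run the installer and note the install location.
If pytesseract cannot find tesseract.exe, set the path in your script (adjust it to your install location):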

pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

🔹 Linux (Ubuntu/Debian)

sudo apt update
sudo apt install tesseract-ocr

🔹 macOS

brew install tesseract

▶ How to Use

Step 1: Add Your Image

Place your image file inside the project folder.

Example:

project-folder/
│
├── image_summary.py
├── sample.png

Step 2: Update Image Path

Open image_summary.py and update:

image = Image.open("sample.png")

Make sure the filename matches your image.
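
If you would rather not edit the script for every image, one option is to read the path from the command line. The sketch below is an assumption about how that could look, not how image_summary.py is written; `parse_args` is a hypothetical helper, and `sample.png` is kept as the default.

```python
import argparse


def parse_args(argv=None):
    """Read the image path from the command line, defaulting to sample.png."""
    parser = argparse.ArgumentParser(
        description="Summarize the text found in an image."
    )
    parser.add_argument(
        "image",
        nargs="?",              # the argument is optional
        default="sample.png",   # same file name used in this guide
        help="path to the image file",
    )
    return parser.parse_args(argv)
```

With this in place, the script could call `Image.open(parse_args().image)` and be run as `python image_summary.py my_scan.png`.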

Step 3: Run the Script

python image_summary.py

✅ Output

The script will:

Extract text from the image.
Print extracted text.
Generate summarized text.
Print the final summary.

🧠 How It Works

The image is loaded with Pillow.
Text is extracted from the image with Tesseract OCR (through pytesseract).
The extracted text is passed to the Hugging Face summarization pipeline.
The transformer model generates a concise summary.
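
The four steps above can be sketched roughly as follows. This is not the actual content of image_summary.py. The function names, the whitespace clean-up step, and the `max_length` / `min_length` values are assumptions for illustration. No model is named because this guide does not say which one the script uses, so the pipeline falls back to its default. The sketch also assumes the installed transformers version provides the "summarization" pipeline task.

```python
import re


def clean_ocr_text(raw):
    """Collapse the line breaks and repeated spaces that OCR output often contains."""
    return re.sub(r"\s+", " ", raw).strip()


def summarize_image(path, max_length=130, min_length=30):
    """Return a summary of the text found in the image at `path`."""
    # Imported here so clean_ocr_text can be used without the heavy dependencies.
    import pytesseract
    from PIL import Image
    from transformers import pipeline

    # Steps 1 and 2: load the image with Pillow and run Tesseract OCR on it.
    with Image.open(path) as image:
        raw_text = pytesseract.image_to_string(image)

    text = clean_ocr_text(raw_text)
    if not text:
        return ""  # nothing was recognized, so there is nothing to summarize

    # Step 3: build the summarization pipeline (default model, an assumption).
    summarizer = pipeline("summarization")

    # Step 4: generate the summary; truncation=True cuts input that is too
    # long for the model instead of raising an error.
    result = summarizer(
        text, max_length=max_length, min_length=min_length, truncation=True
    )
    return result[0]["summary_text"]
```

Calling `print(summarize_image("sample.png"))` would then print the summary for the sample image.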

🚨 Common Errors & Fixes

❌ TesseractNotFoundError

This error means pytesseract could not find the tesseract executable. Make sure:

Tesseract is installed
Path is correctly set (Windows users)
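
Both points can be checked before running OCR. The helper below is a hypothetical sketch using only the standard library: it looks for tesseract on the PATH first and then falls back to the Windows default location mentioned in the installation section.

```python
import os
import shutil

# Default install location from the Windows section of this guide.
WINDOWS_DEFAULT = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def find_tesseract(extra_candidates=(WINDOWS_DEFAULT,)):
    """Return the path to the tesseract executable, or None if it is not found."""
    found = shutil.which("tesseract")  # search the directories on PATH
    if found:
        return found
    for candidate in extra_candidates:  # then try known install locations
        if os.path.isfile(candidate):
            return candidate
    return None
```

If `find_tesseract()` returns a path, it can be assigned to `pytesseract.pytesseract.tesseract_cmd`. If it returns `None`, Tesseract is most likely not installed.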

❌ Model Download Takes Time

The first run may take longer because the transformer model is downloaded automatically. Later runs reuse the locally cached copy.

📦 requirements.txt (Optional)

You can create a requirements.txt file with:

pytesseract
pillow
transformers
torch

Install using:

pip install -r requirements.txt

👨‍💻 Author

Your Name
Computer Science Student
Interested in AI, NLP, and Automation

⭐ If you like this project, give it a star on GitHub!
