Skip to content

LoYuXr/CalliReader

Repository files navigation

CalliReaderImage: Contextualizing Chinese Calligraphy via an Embedding-aligned Vision Language Model

This is the repository of CalliReader: Contextualizing Chinese Calligraphy via an Embedding-aligned Vision Language Model.

CalliReader is a novel plug-and-play Vision-Language Model (VLM) specifically designed to interpret calligraphic artworks with diverse styles and layouts, leveraging slicing priors, embedding alignment, and effective fine-tuning. It demonstrates remarkable performances on Chinese Calligraphy recognition and understanding, while also retains excellent OCR ability on general scenes.

For more information, please visit our project page (Unfinished).

teaser

📬 News

  • 2025.10.6 We are releasing our CalliBench.
  • 2025.8.10 We are releasing our training scripts.
  • 2025.6.26 Our newest model is now available on HuggingFace.
  • 2025.6.25 Our work has been accepted by ICCV 2025!
  • 2025.2.12 The repository has been updated.

How to Use Our Code and Model:

We are releasing our network and checkpoints. You can download weights of our CalliReader from this HuggingFace link. Finetuned VLM weight files that end with .safetensors are stored in the folder InternVL, and all pluggable modules can be found in the folder params. You can download those files and put them in the same folder of the cloned repository.

You can setup the pipeline under the following guidance.

0. Install dependencies

  1. We recommend creating a conda environment with Python>=3.9 and activate it:
conda create -n callireader python=3.9
conda activate callireader
  1. Then, install essential dependencies:
pip install requirements.txt
  1. Finally, install the package flash-attn:
pip install flash-attn

If you encounter certain problems with this package, you can download .whl file here for direct installation:

pip install flash_attn-xxx.whl

Please note that this package only supports Linux systems with CUDA installed, and all of their versions should be matched. For further issues about flash-attn, please turn to its repository for help.

1. Inference

We have verified that .jpg and .png format images are well supported.

  1. For a single image, use
python inference.py --tgt=<image path> 

The result will be output directly in the terminal.

  1. For a folder with multiple images, use
python inference.py --tgt=<folder path>  --save_name=<your save name>

Results will be saved to ./results/<your save name>.json.

2. Dataset

CalliBench

Data of Full-page Recognition, Region-wise OCR, Choice Questions (Author, Style, and Layout), Bilingual Interpretation, and Intent Analysis can be downloaded in this link. It contains 3,192 image-annotation samples in total, and we use them to construct our CalliBench.

Training data

Data of 7,357 samples for e-IT can be downloaded in this link.

3. Training

Please refer to the train folder for further instructions.

4. Evaluation

Run evaluate.py to assess the model on our CalliBench. You should first download the dataset and then run

python evaluate.py --type=<Eval type> --data=<CalliBench path> --save_name=<Test name>

to evaluate the model on various Calligraphy-related tasks. For example, run

python evaluate.py --type=full_page --data=./Callibench --save_name=exp

to test the model on full-page recognition task.

For bilingual interpretation and intent analysis tasks, please refer to train folder for example codes.

Citation

@InProceedings{Luo_2025_ICCV,
    author    = {Luo, Yuxuan and Tang, Jiaqi and Huang, Chenyi and Hao, Feiyang and Lian, Zhouhui},
    title     = {CalliReader: Contextualizing Chinese Calligraphy via an Embedding-Aligned Vision-Language Model},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
    pages     = {23030-23040}
}

About

Official repository for CalliReader: Contextualizing Chinese Calligraphy via an Embedding-aligned Vision Language Model [ICCV 2025]

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages