SmallRAG - Lightweight Local Document Q&A Assistant

An AI document assistant that runs entirely on your machine, with no GPU and no network connection required.

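Core Features

Feature	Description
Offline operation	100% local, no network required
Fast startup	< 1 second (rule mode)
Low memory	~500MB (rule mode) / ~1.5GB (ONNX mode)
Easy deployment	Pure Python, few dependencies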

System Architecture

User question
          │
          ▼
┌────────────────────┐
│  Retrieval module  │ ← BM25 high-recall retrieval
│   (pure Python)    │
└─────────┬──────────┘
          │
          ▼
┌────────────────────┐
│ Quality assessment │
└─────────┬──────────┘
          │
 ┌────────┴─────────┐
 │ Highly relevant? │
 └────────┬─────────┘
      Yes │  No
          │  │
          ▼  ▼
┌─────────────────┐  ┌─────────────┐
│ Rule extraction │  │ Qwen3.5-2B  │
│    (no LLM)     │  │ ONNX Q4     │
└─────────────────┘  └─────────────┘
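
The flow in the diagram can be sketched in a few lines of pure Python. This is only an illustration under stated assumptions: the whitespace tokenizer, the BM25 parameters (`k1=1.5`, `b=0.75`), the `threshold` value, and the function names `bm25_scores` and `route` are hypothetical and are not taken from SmallRAG's code, which relies on jieba and Whoosh.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with the Okapi BM25 formula."""
    if not docs:
        return []
    # Assumption: naive whitespace tokenization, for illustration only.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def route(query, docs, threshold=1.0):
    """Choose "rules" when the best hit scores at least `threshold`, else "llm".

    `docs` must be non-empty. The threshold is a hypothetical value.
    """
    scores = bm25_scores(query, docs)
    best = max(range(len(docs)), key=scores.__getitem__)
    path = "rules" if scores[best] >= threshold else "llm"
    return path, docs[best], scores[best]


docs = [
    "the cat sat on the mat",
    "dogs bark loudly at night",
    "stock prices rose sharply",
]
print(route("cat mat", docs)[0])
print(route("quantum physics", docs)[0])
```

With these toy documents the first query matches a document strongly and takes the rule path, while the second matches nothing and falls through to the LLM path.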

Quick Start

1. Install dependencies

pip install -r requirements.txt
pip install onnxruntime transformers

2. Download the model

Download the Qwen3.5-2B-ONNX model with ModelScope:

pip install modelscope

# Download the full model (embedding and decoder)
python -m modelscope.cli.cli download onnx-community/Qwen3.5-2B-ONNX \
    onnx/embed_tokens_q4f16.onnx \
    onnx/embed_tokens_q4f16.onnx_data \
    onnx/decoder_model_merged_q4.onnx \
    onnx/decoder_model_merged_q4.onnx_data \
    --local_dir ./models/Qwen3.5-2B-ONNX

Required files:

File	Size	Description
`embed_tokens_q4f16.onnx`	1KB	Embedding layer
`embed_tokens_q4f16.onnx_data`	294MB	Embedding weights
`decoder_model_merged_q4.onnx`	1MB	Decoder graph
`decoder_model_merged_q4.onnx_data`	1.2GB	Decoder weights
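
Before switching to LLM mode, it can help to confirm that all four files arrived. The sketch below is not part of SmallRAG. It assumes the download keeps the repository's `onnx/` subdirectory under the `--local_dir` used above, and the helper name `missing_model_files` is hypothetical.

```python
from pathlib import Path

# File names taken from the download command above.
REQUIRED_FILES = [
    "onnx/embed_tokens_q4f16.onnx",
    "onnx/embed_tokens_q4f16.onnx_data",
    "onnx/decoder_model_merged_q4.onnx",
    "onnx/decoder_model_merged_q4.onnx_data",
]


def missing_model_files(model_dir="./models/Qwen3.5-2B-ONNX"):
    """Return the required files that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

Calling `missing_model_files()` returns an empty list when the download is complete, and otherwise the relative paths that still need to be fetched.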

3. Run

# Rule mode (no model required)
python main.py --add ./data --query "your question"

# LLM mode (more accurate)
python main.py --add ./data --use-llm --query "your question"

# Interactive mode
python main.py --interactive

Usage Example

from core import SimpleRAG, RAGConfig

# Basic usage
rag = SimpleRAG()
rag.ingest("./data")              # Ingest documents
result = rag.query("question")    # Ask a question

# Enable LLM mode
config = RAGConfig(use_llm=True)
rag = SimpleRAG(config)

Directory Structure

SmallRAG/
├── core/                      # Core modules
│   ├── config.py             # Configuration
│   ├── document_parser.py    # Document parsing
│   ├── chunker.py            # Text chunking
│   ├── indexer.py            # Whoosh index
│   ├── retriever.py          # BM25 retrieval
│   ├── answer_generator.py   # Rule-based answering
│   ├── llm_generator.py      # LLM answering (ONNX)
│   ├── generator.py          # Unified generator
│   └── small_rag.py          # Main entry point
├── tests/                    # Tests
├── data/                     # Documents and index
├── models/                   # LLM models (download separately; see step 2 above)
├── main.py                   # CLI entry point
└── requirements.txt          # Dependencies

Model Notes

SmallRAG uses the Qwen3.5-2B-ONNX model with CPU-only inference:

Metric	Value
Parameters	2B
Format	ONNX (Q4 quantization)
Memory usage	~1.5GB
Inference device	CPU
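
As a rough sanity check on the memory figure, the weight footprint can be estimated from the parameter count and the bit width. This is a back-of-the-envelope sketch only: it ignores the KV cache, activations, quantization scales and the tokenizer, and the helper name `weight_gib` is hypothetical.

```python
def weight_gib(n_params, bits_per_weight):
    """Approximate weight storage in GiB for a parameter count and bit width."""
    return n_params * bits_per_weight / 8 / 1024**3


print(f"2B params at 4-bit:  {weight_gib(2e9, 4):.2f} GiB")   # 0.93 GiB
print(f"2B params at 16-bit: {weight_gib(2e9, 16):.2f} GiB")  # 3.73 GiB
```

Four-bit weights alone come to a little under 1 GiB. The listed model files total about 1.5GB (294MB of embedding weights plus 1.2GB of decoder weights), which agrees with the memory figure in the table.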

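Dependencies

Package	Purpose	Required
jieba	Chinese word segmentation	✅
Whoosh	Full-text indexing	✅
PyPDF2	PDF parsing	✅
python-docx	Word parsing	✅
openpyxl	Excel parsing	✅
beautifulsoup4	HTML parsing	✅
python-pptx	PowerPoint parsing	✅
onnxruntime	ONNX inference	✅
transformers	Tokenizer	✅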
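
Several of these packages install under an import name that differs from the pip name, which is a common source of confusion when checking an environment. The sketch below shows one way to verify the list using only the standard library. It is not the implementation behind `--check`, and the name `check_dependencies` is hypothetical.

```python
from importlib.util import find_spec

# Map each pip package name from the table to its import name.
IMPORT_NAMES = {
    "jieba": "jieba",
    "Whoosh": "whoosh",
    "PyPDF2": "PyPDF2",
    "python-docx": "docx",
    "openpyxl": "openpyxl",
    "beautifulsoup4": "bs4",
    "python-pptx": "pptx",
    "onnxruntime": "onnxruntime",
    "transformers": "transformers",
}


def check_dependencies(packages=IMPORT_NAMES):
    """Return {package: bool} telling whether each module can be located."""
    return {pkg: find_spec(module) is not None for pkg, module in packages.items()}


for pkg, ok in check_dependencies().items():
    print(f"{'OK     ' if ok else 'MISSING'}  {pkg}")
```

`find_spec` only locates a module without importing it, so the check stays fast even for heavy packages such as transformers.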

CLI Commands

python main.py --check                 # Check the environment
python main.py --add <file>            # Add a document
python main.py --add-dir <directory>   # Add a directory
python main.py --query <question>      # Query
python main.py --interactive           # Interactive mode
python main.py --use-llm               # Enable the LLM
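
For readers who want to script around these flags, the following sketch shows how such a command line could be declared with `argparse`. It mirrors the flag names listed above, but it is an assumption about their shape, not a copy of `main.py`. The help strings and the `build_parser` name are hypothetical.

```python
import argparse


def build_parser():
    """Build a parser exposing the flags listed above (illustrative only)."""
    parser = argparse.ArgumentParser(prog="main.py", description="SmallRAG CLI (sketch)")
    parser.add_argument("--check", action="store_true", help="check the environment")
    parser.add_argument("--add", metavar="FILE", help="add a document")
    parser.add_argument("--add-dir", metavar="DIR", help="add a directory")
    parser.add_argument("--query", metavar="QUESTION", help="ask a question")
    parser.add_argument("--interactive", action="store_true", help="start interactive mode")
    parser.add_argument("--use-llm", action="store_true", help="enable the LLM")
    return parser


args = build_parser().parse_args(["--add", "./data", "--use-llm", "--query", "your question"])
print(args.add, args.use_llm, args.query)
```

Flags can be combined in one invocation, as in the Quick Start commands, because each one is an independent optional argument.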

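Interactive Mode Commands

add <file>           - Add a document
add-dir <directory>  - Add a directory
query <question>     - Query (shorthand: q)
smart <question>     - Smart query (builds the index automatically)
status               - Show status
clear                - Clear the index
exit                 - Exit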
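
A command loop like this usually comes down to splitting each input line into a command word and its argument. The sketch below shows that parsing step, including the `q` shorthand. It is a minimal illustration, not SmallRAG's own loop, and `parse_command`, `COMMANDS` and `ALIASES` are hypothetical names.

```python
COMMANDS = {"add", "add-dir", "query", "smart", "status", "clear", "exit"}
ALIASES = {"q": "query"}


def parse_command(line):
    """Split an input line into (command, argument); command is None if unknown."""
    name, _, arg = line.strip().partition(" ")
    name = ALIASES.get(name, name)
    if name not in COMMANDS:
        return None, line.strip()
    return name, arg.strip()


print(parse_command("q what is BM25?"))
print(parse_command("status"))
```

Returning `None` for unknown commands lets the caller decide whether to print a help message or treat the whole line as a question.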
