A high-throughput and memory-efficient inference and serving engine for LLMs
-
Updated
Mar 20, 2026 - Python
A high-throughput and memory-efficient inference and serving engine for LLMs
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Unified web UI for training and running open models like Qwen, DeepSeek, gpt-oss and Gemma locally.
Langchain-Chatchat(原Langchain-ChatGLM)基于 Langchain 与 ChatGLM, Qwen 与 Llama 等语言模型的 RAG 与 Agent 应用 | Langchain-Chatchat (formerly langchain-ChatGLM), local knowledge based LLM (like ChatGLM, Qwen and Llama) RAG and Agent app with langchain
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
中文LLaMA&Alpaca大语言模型+本地CPU/GPU训练部署 (Chinese LLaMA & Alpaca LLMs)
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3.5, DeepSeek-R1, GLM-5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, Phi4, ...) (AAAI 2025).
Easy-to-use and powerful LLM and SLM library with awesome model zoo.
Run any open-source LLMs, such as DeepSeek and Llama, as OpenAI compatible API endpoint in the cloud.
Low-code framework for building custom LLMs, neural networks, and other AI models
🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
Swap GPT for any LLM by changing a single line of code. Xinference lets you run open-source, speech, and multimodal models on cloud, on-prem, or your laptop — all through one unified, production-ready inference API.
Add a description, image, and links to the llama topic page so that developers can more easily learn about it.
To associate your repository with the llama topic, visit your repo's landing page and select "manage topics."