OpenDCAI

Define the future of Data-centric AI together

👋 Welcome

✨We are dedicated to advancing research and open-source tools in Data-Centric Artificial Intelligence (DCAI).✨

🚀Our goal is to develop effective and efficient DCAI systems and algorithms that support and enhance the performance of AI models and applications.

Pinned

  1. DataFlow

    Easy data preparation with the latest LLM-based operators and pipelines.

    Python, 3.1k stars, 226 forks

  2. MyScaleDB

    Forked from OriginHubAI/MyScaleDB

    AI database for unified, scalable SQL + vector data management, search, and analytics.

    C++, 39 stars, 1 fork

  3. DataFlex

    DataFlex is a data-centric training framework that enhances model performance by selecting the most influential samples, optimizing their weights, or adjusting their mixing ratios.

    Python, 114 stars, 11 forks

  4. Paper2Any

    Turn a paper, text, or topic into editable research figures, technical route diagrams, and presentation slides.

    Python, 2k stars, 139 forks

  5. AgentFlow

    The first unified agent data synthesis framework for custom agentic tasks, with an all-in-one environment.

    Python, 54 stars, 4 forks

Repositories

Showing 10 of 31 repositories
  • Open-NotebookLM

    An open-source implementation of NotebookLM.

    Python, 46 stars, Apache-2.0, 7 forks, 2 issues, 2 pull requests, updated Mar 21, 2026
  • Paper2Any

    Turn a paper, text, or topic into editable research figures, technical route diagrams, and presentation slides.

    Python, 1,975 stars, Apache-2.0, 139 forks, 7 issues, 3 pull requests, updated Mar 21, 2026
  • OpenPrism

    Open-source implementation of an AI-powered academic writing workspace inspired by OpenAI Prism, featuring LaTeX editing, PDF preview, and intelligent AI assistance.

    TypeScript, 231 stars, 19 forks, 2 issues (1 issue needs help), 3 pull requests, updated Mar 21, 2026
  • DataFlex

    DataFlex is a data-centric training framework that enhances model performance by selecting the most influential samples, optimizing their weights, or adjusting their mixing ratios.

    Python, 114 stars, 11 forks, 0 issues, 0 pull requests, updated Mar 20, 2026
  • AgentFlow

    The first unified agent data synthesis framework for custom agentic tasks, with an all-in-one environment.

    Python, 54 stars, 4 forks, 0 issues, 0 pull requests, updated Mar 21, 2026
  • Mycel

    More is different.

    Python, 26 stars, MIT, 1 fork, 0 issues, 4 pull requests, updated Mar 20, 2026
  • OpenWorldLib

    Unified codebase for advanced world models.

    Python, 199 stars, Apache-2.0, 10 forks, 3 issues, 2 pull requests, updated Mar 20, 2026
  • Text2VectorSQL

    Official implementation of Text2VectorSQL: Towards a Unified Interface for Vector Search and SQL Queries.

    Python, 53 stars, 8 forks, 2 issues, 0 pull requests, updated Mar 20, 2026
  • Flash-MinerU

    Ray-based accelerator for the MinerU VLM inference pipeline. Lightweight, minimally invasive PDF → Markdown processing for multi-GPU and Chinese domestic compute environments.

    Python, 37 stars, AGPL-3.0, 4 forks, 2 issues, 0 pull requests, updated Mar 19, 2026
  • One-Eval

    Automated system for LLM evaluation via agents.

    Python, 29 stars, Apache-2.0, 3 forks, 1 issue, 0 pull requests, updated Mar 19, 2026