[Bug] ChromaDb async upsert/insert blocks the asyncio event loop (sync Rust write on the loop thread) #7712

@basnijholt

Description

ChromaDb.async_insert and ChromaDb._async_upsert (in libs/agno/agno/vectordb/chroma/chromadb.py) are async by name only. After the embedding step, both end with a direct call to the synchronous self._batch_operation(...) helper, which runs the ChromaDB Rust batch (chromadb.api.rust._upsert / _add) on the running asyncio event loop. While that batch executes, every other coroutine on the same loop is starved.

In long-running async services this means anything else sharing the loop — network long-polls, queue drains, periodic timers, watchdog ticks — appears to stall for the duration of every batch upsert. With multi-MB knowledge bases or lots of files to ingest, "the duration of every batch" can easily run into multiple seconds per file.

Steps to Reproduce

  1. Create a ChromaDb instance and a Knowledge backed by it.
  2. From an async event loop, also start a lightweight heartbeat task that increments a counter every, say, 50 ms (or simply tries to read from a network stream on the same loop).
  3. Trigger await Knowledge.ainsert(path=...) (which routes through ChromaDb.async_insert → _batch_operation → chromadb upsert/add).
  4. Observe that the heartbeat counter does not advance / the network stream does not progress while the batch operation is in flight, even though the call is await-ed.
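The steps above can be sketched without a ChromaDB install. This is a minimal, self-contained approximation: FakeAsyncByNameOnly and fake_sync_batch are hypothetical stand-ins (not agno or chromadb APIs), and time.sleep plays the role of the synchronous Rust batch. The point is only that a heartbeat task on the same loop records no ticks while an async def that calls blocking code directly is being awaited.

```python
import asyncio
import time


def fake_sync_batch(seconds: float) -> None:
    """Hypothetical stand-in for a long, synchronous batch write."""
    time.sleep(seconds)


class FakeAsyncByNameOnly:
    """Hypothetical stand-in: an async method that calls sync code directly."""

    async def async_insert(self, seconds: float) -> None:
        # No await inside, so this runs entirely on the event loop thread.
        fake_sync_batch(seconds)


async def heartbeat(ticks: list[int], interval: float) -> None:
    """Increment a counter every `interval` seconds, forever."""
    while True:
        await asyncio.sleep(interval)
        ticks[0] += 1


async def measure_ticks_during_insert(seconds: float = 0.5, interval: float = 0.05) -> int:
    ticks = [0]
    task = asyncio.create_task(heartbeat(ticks, interval))
    await asyncio.sleep(0)  # give the heartbeat a chance to start
    await FakeAsyncByNameOnly().async_insert(seconds)
    observed = ticks[0]  # read before yielding to the loop again
    task.cancel()
    await asyncio.gather(task, return_exceptions=True)
    return observed


if __name__ == "__main__":
    print("heartbeat ticks during insert:", asyncio.run(measure_ticks_during_insert()))
```

With the real classes, the same heartbeat can be run next to await Knowledge.ainsert(path=...) in place of the stand-in.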

A py-spy dump of the affected process during ingestion shows the asyncio main thread deep inside the synchronous Rust call:

Thread 1 (active): "MainThread"
    _upsert (chromadb/api/rust.py:517)
    upsert (chromadb/Collection.py:503)
    _batch_operation (agno/vectordb/chroma/chromadb.py:255)
    _async_upsert (agno/vectordb/chroma/chromadb.py:659)
    async_upsert (agno/vectordb/chroma/chromadb.py:675)
    _ahandle_vector_db_insert (agno/knowledge/knowledge.py)
    _aload_from_path (agno/knowledge/knowledge.py)
    _aload_content (agno/knowledge/knowledge.py)
    ainsert (agno/knowledge/knowledge.py:223)
    ... user async code ...
    _run (asyncio/events.py)
    _run_once (asyncio/base_events.py)
    run_forever (asyncio/base_events.py)

This is the asyncio main thread, not a worker thread — confirming that the await async_upsert(...) call is blocking the loop for the duration of the synchronous Rust batch.
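Where py-spy is not available, asyncio's built-in debug mode gives a rougher signal of the same problem: it logs a warning on the "asyncio" logger whenever a single callback or task step runs longer than loop.slow_callback_duration. The sketch below uses a hypothetical blocking_coroutine (time.sleep standing in for the synchronous batch) and is meant as a diagnostic aid, not as part of the original report.

```python
import asyncio
import logging
import time


class ListHandler(logging.Handler):
    """Collects formatted log messages in memory."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


async def blocking_coroutine(seconds: float) -> None:
    """Hypothetical stand-in for an async method that blocks the loop."""
    time.sleep(seconds)


def slow_callback_warnings(seconds: float = 0.3, threshold: float = 0.1) -> list[str]:
    """Run the blocking coroutine in debug mode and return slow-step warnings."""
    handler = ListHandler()
    logger = logging.getLogger("asyncio")
    logger.addHandler(handler)

    async def main() -> None:
        # Steps that run longer than this many seconds are reported.
        asyncio.get_running_loop().slow_callback_duration = threshold
        await blocking_coroutine(seconds)

    try:
        asyncio.run(main(), debug=True)
    finally:
        logger.removeHandler(handler)
    return [m for m in handler.messages if "took" in m]


if __name__ == "__main__":
    for message in slow_callback_warnings():
        print(message)
```

Debug mode can also be enabled for an existing service with the PYTHONASYNCIODEBUG=1 environment variable, at some runtime cost.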

Expected Behavior

Awaiting ChromaDb.async_insert / ChromaDb.async_upsert should not block the asyncio event loop. Other coroutines on the same loop should continue to make progress while the (long-running, synchronous) ChromaDB batch executes — which is the whole reason for offering an async API in the first place.

This is consistent with the rest of chromadb.py: async_create, async_search, async_drop, async_exists, and async_name_exists already wrap their synchronous bodies in asyncio.to_thread, exactly to avoid this problem. async_insert / _async_upsert are the outliers.

Actual Behavior

While await ChromaDb.async_insert(...) or await ChromaDb.async_upsert(...) is in flight, the asyncio main thread is stuck inside chromadb.api.rust._upsert/_add. No other coroutine runs. Symptoms downstream of this — observed in real services that consume Knowledge.ainsert — include:

  • HTTP long-polls on the same loop appear to "freeze" for seconds at a time, sometimes triggering watchdog/health-check timeouts in the wrapping framework.
  • Inflated time-to-first-token measurements when other coroutines are reading streamed model responses on the same loop.
  • Per-event work queues backing up because their drain coroutines cannot run.
  • Effects scale with batch size and number of files being ingested, so larger knowledge corpora are disproportionately affected.

Logs

py-spy snapshot showing the main thread deep in the sync Rust call: see the dump under Steps to Reproduce above.

Environment

  • Agno version: 2.5.13 (also reproduces on main as of 5eb41941f)
  • Python: 3.12.12
  • chromadb: >=1.0 (the Rust-backed releases that expose chromadb.api.rust._upsert)
  • OS: Linux (also reproduced on macOS)

Possible Solutions

Wrap the two _batch_operation invocations in asyncio.to_thread, matching the pattern already used by the other async methods in the same file. PR with the fix and regression tests: #7711.
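The shape of the proposed change can be illustrated as follows. This is a sketch under assumptions, not the code from the PR: SketchVectorDb is a hypothetical stand-in for ChromaDb, its _batch_operation signature is invented for the example, and time.sleep replaces the real batch write. It shows that once the synchronous helper is handed to asyncio.to_thread, the write runs on a worker thread and a heartbeat on the loop keeps ticking.

```python
import asyncio
import threading
import time


class SketchVectorDb:
    """Hypothetical stand-in, not the real ChromaDb class."""

    def __init__(self) -> None:
        self.write_threads: list[str] = []

    def _batch_operation(self, documents: list[str]) -> None:
        # Stand-in for the synchronous batch write; records its thread name.
        self.write_threads.append(threading.current_thread().name)
        time.sleep(0.2)

    async def async_insert(self, documents: list[str]) -> None:
        # (The embedding step would run here.)
        # Hand the blocking call to a worker thread instead of calling it inline.
        await asyncio.to_thread(self._batch_operation, documents)


async def run_insert_with_heartbeat() -> tuple[int, str]:
    db = SketchVectorDb()
    ticks = 0

    async def heartbeat() -> None:
        nonlocal ticks
        while True:
            await asyncio.sleep(0.02)
            ticks += 1

    task = asyncio.create_task(heartbeat())
    await db.async_insert(["doc-1", "doc-2"])
    task.cancel()
    await asyncio.gather(task, return_exceptions=True)
    return ticks, db.write_threads[0]


if __name__ == "__main__":
    ticks, thread_name = asyncio.run(run_insert_with_heartbeat())
    print(f"heartbeat ticks: {ticks}, write ran on: {thread_name}")
```

asyncio.to_thread uses the loop's default thread pool, so very large numbers of concurrent inserts would queue behind the pool's worker limit rather than block the loop.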

A consumer-side workaround that does not depend on agno landing the fix is to call the synchronous Knowledge.insert API via asyncio.to_thread(...) instead of await Knowledge.ainsert(...), which is what mindroom is doing in the meantime: see mindroom-ai/mindroom#760.
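A minimal sketch of that workaround, assuming the synchronous insert accepts the same path= keyword as ainsert (not confirmed here). SketchKnowledge is a hypothetical stand-in so the example runs without agno installed; insert_off_loop is an illustrative helper name, not an existing API.

```python
import asyncio
import time


class SketchKnowledge:
    """Hypothetical stand-in for an object exposing a synchronous insert()."""

    def insert(self, path: str) -> str:
        time.sleep(0.1)  # stand-in for reading, embedding, and the batch write
        return path


async def insert_off_loop(knowledge: SketchKnowledge, path: str) -> str:
    # Run the whole synchronous insert on a worker thread so the
    # event loop stays free for other coroutines.
    return await asyncio.to_thread(knowledge.insert, path=path)


if __name__ == "__main__":
    print(asyncio.run(insert_off_loop(SketchKnowledge(), "docs/")))
```

This moves the entire insert, not just the batch write, onto a worker thread, so it relies on the synchronous code path being safe to call outside the loop thread.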

Additional Context

The same async-by-name-only pattern (calling sync DB code directly from async def) appears to exist in several other vector DB integrations under libs/agno/agno/vectordb/. They are out of scope for #7711 but probably worth a similar pass.
