[Bug] Team.acontinue_run fails to find member agent run after process restart — member session not persisted in coordinate mode #7717

@denizmatsu

Description

When using TeamMode.coordinate with a member agent that has requires_confirmation_tools (HITL), the team pauses correctly and the paused state is persisted to agno_sessions. However, calling team.acontinue_run() after a process restart fails because the member agent's run is never saved to its own session in the database.

The root cause is in agno/agent/_session.py: asave_session() skips DB persistence when agent.team_id is set (which is always the case for team members). The member's run is stored only in the team's session via session.upsert_run(), but acontinue_run tries to load it from the member's own session, which does not exist.

Steps to Reproduce

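# NOTE: this snippet omits the definition of `model` (the configuration below uses
# AzureOpenAI(id="gpt-4o")) and the import of the TeamRunPaused event class.
# The `async for` / `await` statements must run inside an async function.
from agno.agent import Agent
from agno.team import Team
from agno.team.mode import TeamMode
from agno.tools.telegram import TelegramTools
from agno.db.postgres import AsyncPostgresDb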

# Shared DB for team + member
db = AsyncPostgresDb(
    db_url="postgresql+psycopg_async://...",
    db_schema="ai",
    session_table="agno_sessions",
    approvals_table="agno_approvals",
)

# Member agent with HITL
telegram_tools = TelegramTools(
    token="...", chat_id="...",
    enable_send_message=True,
    requires_confirmation_tools=["send_message"],
)

member = Agent(
    id="telegram-agent",
    name="Telegram Agent",
    model=model,
    tools=[telegram_tools],
    db=db,  # Same DB as team
)

team = Team(
    id="my-team",
    name="My Team",
    model=model,
    members=[member],
    mode=TeamMode.coordinate,
    db=db,
    session_id="test-session",
)

# Step 1: Run — member pauses at send_message (requires_confirmation)
run_id = None
async for ev in team.arun("send a telegram message", stream=True, stream_events=True):
    if isinstance(ev, TeamRunPaused):
        run_id = ev.run_id
        requirements = ev.requirements
        # requirements[0].member_run_id exists
        # requirements[0]._member_run_response exists (in-memory)
        break

# Step 2: Simulate process restart — recreate team from scratch
# (In real scenario: backend restarts, all in-memory objects lost)
del team, member

member2 = Agent(id="telegram-agent", name="Telegram Agent", model=model, tools=[telegram_tools], db=db)
team2 = Team(id="my-team", name="My Team", model=model, members=[member2], mode=TeamMode.coordinate, db=db, session_id="test-session")

# Step 3: Load paused run from DB
session = await db.get_session("test-session")
paused_run = next(r for r in session.runs if r.run_id == run_id)
reqs = paused_run.requirements
for req in reqs:
    req.confirm()

# Step 4: Continue — THIS FAILS
async for ev in team2.acontinue_run(
    run_id=run_id,
    session_id="test-session",
    requirements=reqs,
    stream=True,
    stream_events=True,
):
    pass  # ERROR: "No runs found for run ID {member_run_id}"

Agent Configuration (if applicable)

# Member agent
Agent(
    id="telegram-agent",
    name="Telegram Agent",
    model=AzureOpenAI(id="gpt-4o"),
    tools=[TelegramTools(requires_confirmation_tools=["send_message"])],
    db=shared_async_postgres_db,  # Same instance as team
    add_history_to_context=False,
)

# Team
Team(
    id="my-team",
    mode=TeamMode.coordinate,
    members=[member_agent],
    db=shared_async_postgres_db,
    session_id="test-session",
    stream_member_events=True,
)

Expected Behavior

team.acontinue_run() should successfully continue the member agent's paused run after process restart, since:

  1. The team session (with the paused run) IS persisted in DB
  2. The requirements (with member_run_id) are loaded from DB
  3. Team and member share the same AsyncPostgresDb instance

Actual Behavior

ERROR    Error in Agent run: No runs found for run ID ce7731e1-c3a2-4281-b6f5-3f0586c33726
WARNING  Member telegram-agent streaming did not yield a final RunOutput

The error is non-fatal at the team level: the team leader still produces a final response, but the member agent's tool (send_message) is never executed.

Screenshots or Logs (if applicable)

# Pause (works correctly)
[Chat Team] TeamRunPaused: pause_type=confirmation run_id=06ecfe6a tools=1
[Chat Team] HITL member info: agent=telegram-agent member_run_id=ce7731e1

# Resume (fails at member level)
[HITL Resume] Paused run: run_id=06ecfe6a status=PAUSED reqs=1
[HITL Resume] Requirement: member_agent=telegram-agent member_run_id=ce7731e1
[HITL Resume] acontinue_run: run_id=06ecfe6a
ERROR    Error in Agent run: No runs found for run ID ce7731e1
WARNING  Member telegram-agent streaming did not yield a final RunOutput
[HITL Resume] Complete: run_id=06ecfe6a deltas=9 len=19

DB verification — member run exists ONLY in team session:

-- Member run found in team session
SELECT session_id, session_type FROM ai.agno_sessions 
WHERE runs::text LIKE '%ce7731e1%';
-- Result: session=chat-team:xxx, type=team

-- No separate member session exists
SELECT * FROM ai.agno_sessions 
WHERE session_id LIKE '%telegram%' OR agent_id = 'telegram-agent';
-- Result: 0 rows

Environment

- **Agno version**: 2.6.3
- **Python**: 3.11
- **Database**: AsyncPostgresDb (PostgreSQL)
- **Team mode**: TeamMode.coordinate
- **OS**: macOS (Darwin 25.3.0)

Possible Solutions (optional)

Option A: Save member session to DB (recommended)

In agno/agent/_session.py, allow member agents to save their session when they have db set, even if team_id is not None:

async def asave_session(agent, session):
    if agent.db is not None and session.session_data is not None:
        # Remove the team_id/workflow_id check — members should persist too
        await agent.db.upsert_session(session)

Option B: Use team session for member lookup in acontinue_run

In agno/team/_run.py _route_requirements_to_members(), when _member_run_response is None, search the team's session for the member run instead of calling member.continue_run(run_id=...):

if member_run_output is None and member_run_id:
    # Search team session for member run (it's stored there via upsert_run)
    for r in (run_response.member_responses or []) + (session.runs or []):
        if getattr(r, "run_id", None) == member_run_id:
            member_run_output = r
            break
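
To make the lookup in Option B concrete, here is a minimal, self-contained sketch of the same search as a standalone helper. RunStub and find_member_run are hypothetical names used only for illustration; they are not Agno classes or functions, and the real run objects carry many more fields.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass
class RunStub:
    """Hypothetical stand-in for a run output; only run_id matters here."""

    run_id: str
    member_responses: List["RunStub"] = field(default_factory=list)


def find_member_run(
    member_run_id: str,
    team_run: RunStub,
    session_runs: Iterable[RunStub],
) -> Optional[RunStub]:
    """Return the first run whose run_id matches, or None if absent.

    The team run's member_responses are checked first, then the runs
    stored in the team session.
    """
    candidates = list(team_run.member_responses or []) + list(session_runs or [])
    for run in candidates:
        if getattr(run, "run_id", None) == member_run_id:
            return run
    return None
```

Returning None when nothing matches lets the caller decide whether to fall back to the existing member.continue_run(run_id=...) path or raise a clearer error.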

Option C: Serialize _member_run_response to DB

Store _member_run_response in the team session's JSONB so it survives process restart. Currently it's only an in-memory reference that gets lost during deserialization.
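
As an illustration of Option C, the sketch below round-trips a requirement through JSON with the member run embedded, so the reference is still there after deserialization. MemberRunStub, RequirementStub and roundtrip are hypothetical names; the actual Agno requirement class and its serialization hooks may look quite different.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class MemberRunStub:
    """Hypothetical, simplified member run output."""

    run_id: str
    status: str


@dataclass
class RequirementStub:
    """Hypothetical requirement that carries its member run when serialized."""

    member_run_id: str
    member_run_response: Optional[MemberRunStub] = None

    def to_dict(self) -> Dict[str, Any]:
        # Embed the member run instead of dropping the in-memory reference.
        nested = asdict(self.member_run_response) if self.member_run_response else None
        return {"member_run_id": self.member_run_id, "member_run_response": nested}

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "RequirementStub":
        raw = data.get("member_run_response")
        restored = MemberRunStub(**raw) if raw else None
        return cls(member_run_id=data["member_run_id"], member_run_response=restored)


def roundtrip(req: RequirementStub) -> RequirementStub:
    """Simulate a save to a JSON column followed by a load after restart."""
    return RequirementStub.from_dict(json.loads(json.dumps(req.to_dict())))
```

The trade-off is a larger team session row, since each paused requirement stores a copy of the member run.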

Additional Context

Code path trace:

  1. delegate_task_to_member (team/_default_tools.py:568) — calls member.run(session_id=team.session_id)
  2. Member run completes/pauses — asave_session SKIPS save because agent.team_id is not None (agent/_session.py:13)
  3. Member run stored in team session only — session.upsert_run(member_run_response) (team/_default_tools.py:529)
  4. _propagate_member_pause stores _member_run_response as in-memory cache (team/_tools.py:581-583)
  5. After DB deserialization, _member_run_response is None
  6. _route_requirements_to_members fallback: member.continue_run(run_id=member_run_id) (team/_run.py:4863)
  7. Member agent tries to load session from DB — session doesn't exist → RuntimeError
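
The trace above can be reproduced without Agno or Postgres using an in-memory stand-in. Everything in this sketch (FakeDb, FakeSession, the helper functions, and keying sessions by type and ID) is a hypothetical simplification. It only mirrors the reported behavior: the member save is skipped when team_id is set, so the later lookup fails.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class FakeSession:
    session_id: str
    runs: Dict[str, str] = field(default_factory=dict)  # run_id -> status


class FakeDb:
    """In-memory stand-in for the shared session table (hypothetical)."""

    def __init__(self) -> None:
        self.sessions: Dict[Tuple[str, str], FakeSession] = {}

    async def upsert_session(self, session_type: str, session: FakeSession) -> None:
        self.sessions[(session_type, session.session_id)] = session

    async def get_session(self, session_type: str, session_id: str) -> Optional[FakeSession]:
        return self.sessions.get((session_type, session_id))


async def save_member_session(db: FakeDb, session: FakeSession, team_id: Optional[str]) -> None:
    # Mirrors the reported guard: agents that belong to a team skip persistence.
    if team_id is None:
        await db.upsert_session("agent", session)


async def continue_member_run(db: FakeDb, session_id: str, run_id: str) -> str:
    session = await db.get_session("agent", session_id)
    if session is None or run_id not in session.runs:
        raise RuntimeError(f"No runs found for run ID {run_id}")
    return session.runs[run_id]


async def simulate_restart() -> str:
    db = FakeDb()
    member_session = FakeSession("test-session", {"member-run-1": "PAUSED"})
    team_session = FakeSession(
        "test-session", {"team-run-1": "PAUSED", "member-run-1": "PAUSED"}
    )
    await save_member_session(db, member_session, team_id="my-team")  # skipped
    await db.upsert_session("team", team_session)  # persisted
    # After the "restart" only the contents of `db` survive.
    try:
        await continue_member_run(db, "test-session", "member-run-1")
    except RuntimeError as exc:
        return str(exc)
    return "resumed"
```

Calling asyncio.run(simulate_restart()) returns the error message rather than "resumed". Passing team_id=None to save_member_session makes the same lookup succeed.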

Impact: No workaround is known. The error is non-fatal at the team level, so the team leader still produces a response, but the member agent's tool is never executed after resume. The confirmed action (e.g., sending a Telegram message) therefore does not happen.

Metadata

Labels: bug (Something isn't working)
