agno-2.5.9 Integration Findings and Upstream Improvement Opportunities #7716

@TnoobT

Description

Context

  • We are currently using agno 2.5.9 in our project.
  • The purpose of this document is to summarize the issues and upstream improvement opportunities we found while integrating and extending agno in a real production project.
  • The items below are not theoretical concerns. They were identified from actual usage, debugging, and local fixes in our codebase.
  • If useful for review, we can also provide the patched source code or reduced repro cases for the relevant items.

Findings

0. Team does not inject LearningMachine context into the system prompt

Files involved:

  • libs/agno/agno/team/_messages.py
  • Reference file: libs/agno/agno/agent/_messages.py

Observed issue:

  • The Team class exposes the configuration add_learnings_to_context: bool = True.
  • In the original agent flow, when add_learnings_to_context is enabled, system prompt construction calls agent._learning.build_context(...) / abuild_context(...).
  • The original team system prompt construction path does not contain equivalent logic.
  • As a result, with the same user_profile, user_memory, and session_context enabled, an agent can read those learnings, but the team leader cannot.
  • In practice, Team.add_learnings_to_context behaves like a declared feature that is not actually wired through.

Detailed explanation:

  • At the behavior level, this is not a case of the team "learning less well". The team leader simply never receives the learning context during system prompt construction.
  • Semantically, agent and team do not honor the same configuration in the same way, which makes the API behavior misleading.
  • This is difficult for users to diagnose because the configuration exists, the flag is enabled, but nothing from learning is present in the final prompt.
  • This fits well as an upstream bug report because it is not a product philosophy difference. It is a capability mismatch between two closely related APIs.

Potential upstream direction:

  • The key point is that Team.add_learnings_to_context=True does not currently produce behavior equivalent to Agent.add_learnings_to_context=True.
  • A minimal repro would show that the same memory/profile can be recalled by an agent but not by the team leader.
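
The mismatch can be illustrated with stand-in classes. This is only a sketch: `FakeLearning`, `Runner`, and the three `build_*` functions are hypothetical names, not agno's real classes. Only the `add_learnings_to_context` flag and the `build_context` call are taken from the description above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeLearning:
    """Stand-in for the learning machine; only build_context matters here."""
    facts: List[str] = field(default_factory=list)

    def build_context(self, user_id: str) -> str:
        return "\n".join(f"- {fact}" for fact in self.facts)


@dataclass
class Runner:
    """Stand-in for either an agent or a team (hypothetical)."""
    learning: FakeLearning
    add_learnings_to_context: bool = True


def build_agent_prompt(agent: Runner, user_id: str) -> str:
    # Agent path as described: the flag gates a build_context call.
    prompt = "You are a helpful assistant."
    if agent.add_learnings_to_context:
        prompt += "\n" + agent.learning.build_context(user_id)
    return prompt


def build_team_prompt_current(team: Runner, user_id: str) -> str:
    # Team path as observed: the flag exists but is never consulted.
    return "You are a team leader."


def build_team_prompt_fixed(team: Runner, user_id: str) -> str:
    # Proposed parity: mirror the agent branch.
    prompt = "You are a team leader."
    if team.add_learnings_to_context:
        prompt += "\n" + team.learning.build_context(user_id)
    return prompt
```

A repro along these lines would assert that the same stored fact appears in the agent prompt but not in the team leader prompt.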

Why this matters:

  • It is a clear agent/team capability mismatch.
  • The scope is well defined and not business-specific.
  • The fix is localized.

1. Async tool execution uses the wrong entrypoint when no hooks are configured

Files involved:

  • libs/agno/agno/tools/function.py

Observed issue:

  • In FunctionCall, when tool_hooks are not configured, the async execution path falls back to the sync entrypoint execute_entrypoint.
  • This looks like a framework bug rather than a project-specific customization.

Detailed explanation:

  • From the caller's perspective, the code is entering the async path, but the actual execution entrypoint is still the sync one.
  • If a tool depends on async resource handling, await chains, or event-loop-sensitive behavior, this can cause incorrect execution, blocking, or confusing timing behavior.
  • This kind of bug can stay hidden in simple cases, but becomes problematic once tools truly distinguish between sync and async implementations.
  • The good part is that the boundary is small, which makes it a strong candidate for a very focused upstream fix.

Potential upstream direction:

  • The issue is not that a local business tool failed. The issue is that framework-level async dispatch does not preserve async semantics in the no-hooks branch.
  • A small unit test that verifies async functions still use the async entrypoint when hooks are absent would likely make the change easy to review.
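
A sketch of what such a test could check, using a stand-in class rather than agno's real `FunctionCall`. The class name `FunctionCallSketch`, the method `aexecute_entrypoint`, and the hook signature are assumptions made for illustration; only `execute_entrypoint` is named in the findings.

```python
import asyncio
import inspect
from typing import Any, Callable, List, Optional


class FunctionCallSketch:
    """Hypothetical stand-in for a tool call wrapper; not agno's FunctionCall."""

    def __init__(self, entrypoint: Callable[..., Any], hooks: Optional[List[Callable]] = None):
        self.entrypoint = entrypoint
        self.hooks = hooks or []

    def execute_entrypoint(self, **kwargs: Any) -> Any:
        # Sync entrypoint: calls the function and returns whatever it returns.
        # For an async function that is an un-awaited coroutine object.
        return self.entrypoint(**kwargs)

    async def aexecute_entrypoint(self, **kwargs: Any) -> Any:
        # Async entrypoint: awaits awaitable results, passes sync results through.
        result = self.entrypoint(**kwargs)
        if inspect.isawaitable(result):
            result = await result
        return result

    async def aexecute(self, **kwargs: Any) -> Any:
        for hook in self.hooks:
            hook(self.entrypoint.__name__, kwargs)  # hook signature is an assumption
        # With or without hooks, the async path ends in the async entrypoint.
        return await self.aexecute_entrypoint(**kwargs)


async def fetch_length(text: str) -> int:
    await asyncio.sleep(0)  # yield to the event loop like a real async tool would
    return len(text)
```

With no hooks configured, `asyncio.run(FunctionCallSketch(fetch_length).aexecute(text="agno"))` returns the awaited result, whereas calling `execute_entrypoint` on the same tool only yields a coroutine object.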

Why this matters:

  • The problem is clear.
  • The fix surface is small.
  • The risk is controllable.
  • It should be easy for upstream to accept.

2. Chroma multi-field metadata filters are converted into an incompatible format

Files involved:

  • libs/agno/agno/vectordb/chroma/chromadb.py

Observed issue:

  • The original implementation flattens multiple metadata conditions at the top level.
  • Chroma expects multiple conditions to be combined under an explicit logical operator, so the compatible format is to wrap them under $and.

Detailed explanation:

  • Single-field metadata filters may appear to work, but once multiple fields are involved, the generated filter structure can diverge from Chroma's expected query format.
  • This is not just a minor compatibility detail. It can directly affect recall correctness and may cause incorrect matches or no results at all.
  • If agno intends to provide a unified vector DB filtering abstraction, the Chroma adapter needs to map compound filters into the structure Chroma actually expects.
  • This is a classic adapter-layer fix. It is not business-specific and does not require changing the higher-level API.

Potential upstream direction:

  • The focus should be on unstable or incompatible conversion of multi-field metadata filters.
  • A good repro would use a two-field filter such as user_id + session_id or doc_type + source.
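
A minimal sketch of the conversion, assuming the adapter receives a flat `{field: value}` dictionary. The function name `to_chroma_where` is hypothetical and is not the name used in agno's Chroma adapter.

```python
from typing import Any, Dict, Optional


def to_chroma_where(filters: Optional[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Convert a flat {field: value} filter into a Chroma-style where clause."""
    if not filters:
        return None
    # A single condition can be passed through unchanged.
    if len(filters) == 1:
        return dict(filters)
    # Several conditions are combined explicitly under one $and operator.
    return {"$and": [{key: value} for key, value in filters.items()]}
```

For the two-field case, `{"user_id": "u1", "session_id": "s1"}` becomes `{"$and": [{"user_id": "u1"}, {"session_id": "s1"}]}`.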

Why this matters:

  • It is a standard third-party adapter fix.
  • It has no business-specific behavior.
  • The change is small and testable.

3. Team delegate results do not identify the member clearly in conversation history

Files involved:

  • libs/agno/agno/models/base.py

Observed issue:

  • After delegate_task_to_member results enter multi-turn history, they are often reduced to plain natural-language content with no clear indication of which member produced them.
  • Prefixing the output with something like [member_id] is a general interpretability improvement.

Detailed explanation:

  • In multi-turn team orchestration, delegated outputs get mixed into regular assistant/tool history, and it becomes hard to see which member produced which content.
  • For debugging team behavior, analyzing routing quality, and understanding role specialization, speaker identity is important context.
  • This may not always cause a functional bug, but it significantly reduces session readability and debuggability, especially in teams with multiple specialist members.
  • This is more of an enhancement than a strict bugfix, but it is easy to justify and can likely be implemented in a backward-compatible way.

Potential upstream direction:

  • It is better framed as an improvement to debuggability and readability of delegated member history, not as a deep framework bug.
  • If output-format stability is a concern, this could also be made configurable or preserved in metadata.
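
The prefixing idea is small enough to sketch directly. The helper name `label_member_output` is hypothetical; where it would be called inside agno is left open.

```python
def label_member_output(member_id: str, content: str) -> str:
    """Prefix delegated output with its member id, without double-prefixing."""
    prefix = f"[{member_id}] "
    if content.startswith(prefix):
        return content
    return prefix + content
```

Making the helper idempotent avoids stacked prefixes if the same content passes through history assembly more than once.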

Why this matters:

  • It applies broadly to multi-turn team scenarios.
  • The value is easy to explain.
  • The change is local and behaviorally straightforward.

4. Team session history assembly may duplicate member runs

Files involved:

  • libs/agno/agno/session/team.py
  • libs/agno/tests/unit/team/test_team_run_regressions.py

Observed issue:

  • The same member run may appear both in top-level runs and inside member_responses.
  • When building member history, the same run can be collected more than once.

Detailed explanation:

  • During history replay or session restoration, the same member run can be loaded twice, which causes duplicated history and unnecessary context growth.
  • This is not just a logging problem. Duplicate history can influence later model reasoning and make it look as if a member repeated the same conclusion multiple times.
  • If downstream compression, summarization, or replay logic consumes that duplicated history, the noise can propagate and inflate both semantics and token cost.
  • This is a standard aggregation/deduplication bug, which upstream projects usually accept readily once a regression test is included.

Potential upstream direction:

  • A good report would include a minimal session structure where the same run_id appears in two containers and is then added twice during history assembly.
  • Since there is already a regression-test-oriented area for this logic, this is well suited to a "repro + fix + regression test" PR.
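
A sketch of the deduplication, using plain dictionaries in place of agno's session and run objects. The `runs`, `member_responses`, and `run_id` keys follow the names used above; the function names are hypothetical.

```python
from typing import Any, Dict, Iterator, List


def _walk(runs: List[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield each run, then any runs nested under its member_responses."""
    for run in runs:
        yield run
        yield from _walk(run.get("member_responses", []))


def collect_unique_runs(session: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Collect runs from both containers, keeping the first copy of each run_id."""
    seen = set()
    unique = []
    for run in _walk(session.get("runs", [])):
        run_id = run["run_id"]
        if run_id in seen:
            continue  # same run already collected from the other container
        seen.add(run_id)
        unique.append(run)
    return unique
```

A regression test could build a session where one member run appears both nested and at the top level, then assert that it is collected once.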

Why this matters:

  • It is a general bug in team session aggregation.
  • It has clear test support.

5. CompressionManager is too aggressive and can compress tool results that are still needed immediately

Files involved:

  • libs/agno/agno/compression/manager.py
  • libs/agno/tests/unit/compression/test_compression_manager.py

Observed issue:

  • The original logic mainly triggers compression based on the number of uncompressed tool results.
  • This can compress tool outputs from the most recent user block even when those results are still important for the next reasoning step.

Detailed explanation:

  • In long, tool-heavy conversations, compression can act too early on freshly produced tool results that are still part of the near-term reasoning context.
  • This creates a subtle failure mode: token usage improves, but reasoning quality drops because the most relevant recent tool outputs have already been reduced into overly coarse summaries.
  • The core issue is not only the threshold. It is the lack of a protection strategy for the most recent and still-active context.
  • This is valuable upstream because it is a general context-management problem rather than a business-specific preference.

Potential upstream direction:

  • Rather than upstreaming a project-specific local strategy, it would be better to propose a general capability, such as exempting the most recent N tool results or the most recent N user blocks from compression.
  • The framing should focus on preserving near-term reasoning context, not merely "compressing less often".
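
One way to express the "protect the most recent N user blocks" idea is sketched below. Messages are plain dictionaries, and the `compressed` flag and the `keep_recent_user_blocks` parameter are assumptions, not existing CompressionManager options.

```python
from typing import Any, Dict, List


def compressible_indices(messages: List[Dict[str, Any]], keep_recent_user_blocks: int = 1) -> List[int]:
    """Return indices of uncompressed tool messages outside the protected recent blocks."""
    user_positions = [i for i, m in enumerate(messages) if m["role"] == "user"]
    if keep_recent_user_blocks <= 0:
        cutoff = len(messages)  # no protection: every tool result is a candidate
    elif len(user_positions) < keep_recent_user_blocks:
        cutoff = 0  # fewer user blocks than the protected count: protect everything
    else:
        # Everything from the Nth most recent user message onward is protected.
        cutoff = user_positions[-keep_recent_user_blocks]
    return [
        i
        for i, m in enumerate(messages)
        if i < cutoff and m["role"] == "tool" and not m.get("compressed")
    ]
```

A count-based trigger could still decide when to compress; this helper only narrows which tool results are eligible.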

Why this matters:

  • It is a framework-level context compression problem.
  • It is broadly useful in long and tool-heavy conversations.
  • It can be supported with tests.

6. Compressed history state can be lost across turns or after reload

Files involved:

  • libs/agno/agno/agent/_messages.py
  • libs/agno/agno/agent/_run.py
  • libs/agno/agno/agent/_session.py
  • libs/agno/agno/agent/_storage.py
  • libs/agno/agno/team/_default_tools.py
  • libs/agno/agno/team/_session.py
  • libs/agno/agno/team/_storage.py
  • libs/agno/agno/team/_tools.py
  • libs/agno/agno/session/team.py
  • libs/agno/agno/utils/agent.py
  • libs/agno/tests/unit/agent/test_run_regressions.py
  • libs/agno/tests/unit/team/test_team_run_regressions.py

Observed issue:

  • Historical messages are deep-copied during compression flows, but compression results are not always written back consistently to the original message graph.
  • After reloading a session, compression metadata can also disappear.
  • In team flows, member-history compression can lose linkage inside the parent team session.

Detailed explanation:

  • Within one turn, history may appear successfully compressed, but on the next turn or after a reload it can revert to an uncompressed state or lose consistent compression markers.
  • This suggests the issue is not just in the compression algorithm itself, but in how compression results are attached, propagated, and persisted across message objects and session structures.
  • For agents, this makes compression benefits unstable. For teams, it can also cause parent-session history and member history to diverge.
  • This is high-value work, but it touches a wider area, so it would likely be better to split upstream discussion or PRs into "write-back consistency" and "reload persistence".

Potential upstream direction:

  • It would be better not to start with a giant PR. The core upstream problem is that compression state is not durably preserved.
  • A strong issue report would show a cross-turn repro: compression succeeds in one turn, the session is reloaded, and the compressed state disappears or becomes inconsistent.
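
The write-back half of the problem can be sketched with plain dictionaries. The `id` and `compressed_content` keys and both function names are assumptions for illustration, and the "summary" is just a truncation placeholder.

```python
import copy
import json
from typing import Any, Dict, List


def compress_copy(messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Compress tool messages on a deep copy, leaving the originals untouched."""
    working = copy.deepcopy(messages)
    for msg in working:
        if msg["role"] == "tool":
            msg["compressed_content"] = msg["content"][:20]  # placeholder summary
    return working


def write_back(originals: List[Dict[str, Any]], working: List[Dict[str, Any]]) -> None:
    """Copy compression results onto the original messages, matched by id."""
    by_id = {msg["id"]: msg for msg in working}
    for msg in originals:
        compressed = by_id.get(msg["id"], {}).get("compressed_content")
        if compressed is not None:
            msg["compressed_content"] = compressed
```

Without the `write_back` step, serializing the original messages (for example with `json.dumps`) and reloading them yields history with no compression state, which matches the cross-turn symptom described above.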

Why this matters:

  • It is a valuable general fix.
  • It directly affects whether compression truly remains effective across turns.

7. MySQL learning storage is still a stub, which makes the learning stack incomplete on MySQL

Files involved:

  • libs/agno/agno/db/mysql/mysql.py
  • libs/agno/agno/db/mysql/async_mysql.py
  • libs/agno/agno/db/mysql/schemas.py

Observed issue:

  • The learning-related methods on the MySQL path are still unimplemented in the original code.
  • This prevents features such as session context and learned knowledge from working fully in MySQL-backed deployments.

Detailed explanation:

  • At the product surface, the framework appears to support MySQL as a backend, but the learning stack is not actually complete on that path.
  • As a result, users who choose MySQL as their main storage backend hit feature gaps that cannot be solved by simple configuration changes.
  • From a product-consistency perspective, this is a backend parity gap. From a user perspective, it can easily be misread as "learning is unstable".
  • This should likely be grouped as a dedicated upstream contribution because it spans schema, CRUD behavior, and parity between sync and async implementations.

Potential upstream direction:

  • The framing should be feature parity, not "our project happens to need MySQL".
  • If submitted as a PR, it should ideally cover both sync and async backends and clearly document schema impact.

Why this matters:

  • It is a clear backend parity gap.
  • It is a reasonable upstream capability completion item.

8. Timestamp handling mixes integer timestamps and datetime objects across storage layers

Files involved:

  • libs/agno/agno/db/mysql/mysql.py
  • libs/agno/agno/db/mysql/async_mysql.py
  • libs/agno/agno/db/mysql/schemas.py
  • libs/agno/agno/db/postgres/postgres.py
  • libs/agno/agno/db/postgres/async_postgres.py
  • libs/agno/agno/db/postgres/schemas.py
  • libs/agno/agno/db/schemas/knowledge.py

Observed issue:

  • Multiple created_at/updated_at fields still mix int(time.time()) with datetime.
  • This creates inconsistencies around ORM mapping, serialization, timezone handling, and schema alignment across databases.

Detailed explanation:

  • At the API and model level, fields with the same conceptual meaning do not always share the same type or conversion behavior.
  • In the short term this may just look messy, but over time it increases migration cost, makes serialization less predictable, and turns timezone issues into recurring edge-case bugs.
  • Once learning, knowledge, and session-related modules coexist across databases, inconsistent timestamp semantics become a source of repeated friction.
  • The value of fixing this is real, but it is probably best done incrementally rather than as a single very large normalization PR.

Potential upstream direction:

  • A better strategy is not "normalize every timestamp field in the entire repository at once", but to start with one coherent domain.
  • For example, learning- or knowledge-related tables could be normalized first, with clear schema and serialization behavior.
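
A sketch of the kind of boundary helpers such a normalization could rely on. The function names are hypothetical, and treating naive datetimes as UTC is an assumption that upstream would need to confirm.

```python
from datetime import datetime, timezone
from typing import Union

Timestamp = Union[int, float, datetime]


def to_epoch_seconds(value: Timestamp) -> int:
    """Normalize an epoch number or a datetime to integer epoch seconds."""
    if isinstance(value, datetime):
        if value.tzinfo is None:
            # Assumption: naive datetimes are interpreted as UTC, not local time.
            value = value.replace(tzinfo=timezone.utc)
        return int(value.timestamp())
    return int(value)


def to_utc_datetime(value: Timestamp) -> datetime:
    """Normalize an epoch number or a datetime to a timezone-aware UTC datetime."""
    if isinstance(value, datetime):
        if value.tzinfo is None:
            return value.replace(tzinfo=timezone.utc)
        return value.astimezone(timezone.utc)
    return datetime.fromtimestamp(int(value), tz=timezone.utc)
```

Applying one of these at every read and write boundary of a single domain would give that domain one timestamp type regardless of what each database column stores.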

Why this matters:

  • It is a general compatibility problem.
  • If accepted, it would improve consistency across the storage layer.

9. Session context scope is too coarse and is shared by all agents/teams under the same session_id

Files involved:

  • libs/agno/agno/learn/stores/session_context.py
  • libs/agno/agno/learn/config.py
  • libs/agno/agno/db/mysql/mysql.py
  • libs/agno/agno/db/mysql/async_mysql.py
  • libs/agno/agno/db/mysql/schemas.py

Observed issue:

  • Both the original design notes and implementation indicate that session_context is retrieved by session_id only.
  • agent_id and team_id exist mainly as audit fields and are not part of recall scoping.
  • This means multiple agents, team members, and the team leader under the same session share the same session context entry.

Impact:

  • In multi-agent or team collaboration, different roles should not always share the exact same working context.
  • One agent's temporary plan, partial progress, or local task state can pollute the context seen by the team leader or by other members.
  • When learning extraction runs asynchronously in the background, sharing one record also increases the chance of overwrite-style interference.

Detailed explanation:

  • The core issue is not that shared session context is always wrong. The issue is that there is no more granular scoping option.
  • In a single-agent product, session_id-only scoping may be acceptable. In team, multi-agent, or multi-role systems, however, a session is often only a top-level container and should not automatically imply a single shared working memory.
  • If leader, planner, executor, and reviewer all share one session context entry, local steps, intermediate reasoning, and final coordination notes can become mixed together.
  • In practice, this can turn the learning store from something that helps role separation into something that actively causes cross-role context pollution.

Potential upstream direction:

  • This is better raised first as a design issue or RFC-style discussion around optional scoped session context.
  • A possible direction would be to preserve current behavior by default while optionally supporting recall scope keyed by session_id + agent_id or session_id + team_id.
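
As a starting point for discussion, optional scoping could be as simple as a composite recall key. The `scope` parameter and the key format below are hypothetical and are not part of agno's configuration.

```python
from typing import Optional


def session_context_key(
    session_id: str,
    agent_id: Optional[str] = None,
    team_id: Optional[str] = None,
    scope: str = "session",
) -> str:
    """Build a recall key; scope="session" keeps the session_id-only behavior."""
    if scope == "session":
        return session_id
    if scope == "agent":
        if not agent_id:
            raise ValueError("agent scope requires agent_id")
        return f"{session_id}:agent:{agent_id}"
    if scope == "team":
        if not team_id:
            raise ValueError("team scope requires team_id")
        return f"{session_id}:team:{team_id}"
    raise ValueError(f"unknown scope: {scope}")
```

Keeping `"session"` as the default means existing deployments would see no change unless they opt in to a narrower scope.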

Why this matters:

  • It is one of the most important design issues we found in the LearningMachine-related stores.
  • The impact becomes significant in multi-agent and team scenarios.

10. User profile and user memory are recalled only by user_id, with no agent/team isolation

Files involved:

  • libs/agno/agno/learn/stores/user_profile.py
  • libs/agno/agno/learn/stores/user_memory.py

Observed issue:

  • The store documentation and recall logic are both explicit: retrieval is scoped by user_id only, while agent_id and team_id are used only for audit metadata.
  • This means multiple agents or teams sharing the same user and database also share the same profile and memory by default.

Impact:

  • For some products, this is a reasonable default.
  • But for multi-tenant, multi-persona, or multi-role agent scenarios, it can be too coarse.
  • At minimum, it is important to make clear that the current implementation is not an agent/team-scoped memory system. It is a user-scoped memory system.

Detailed explanation:

  • There is an important distinction between "shared by default" and "shared with no alternative".
  • Using user_id as the sole recall key makes sense for personal-assistant-style products, but once one user drives multiple role-specific agents, profile and memory can become unintentionally shared across those roles.
  • For example, operations, support, and research agents may all serve the same user, but they may not need identical visibility into the same memory space.
  • Because this touches product philosophy and backward compatibility, it is probably better handled as a design discussion before assuming upstream would want direct scoped-memory changes.

Potential upstream direction:

  • The safest framing is that current user-scoped memory/profile behavior is clear, but optional agent/team-scoped isolation is missing for multi-role systems.
  • It likely makes sense to start with an issue discussion before moving to a code PR.
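
To make the discussion concrete, the difference between the current recall and an opt-in isolated recall can be sketched over in-memory rows. The row shape and the function name `recall_memories` are assumptions; the real stores query a database.

```python
from typing import Any, Dict, List, Optional


def recall_memories(
    rows: List[Dict[str, Any]],
    user_id: str,
    agent_id: Optional[str] = None,
) -> List[Dict[str, Any]]:
    """Filter by user_id, and additionally by agent_id only when one is given."""
    return [
        row
        for row in rows
        if row["user_id"] == user_id
        and (agent_id is None or row.get("agent_id") == agent_id)
    ]
```

Calling it without `agent_id` reproduces the user-scoped behavior described above, while passing `agent_id` narrows recall to memories written by that agent.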

Why this matters:

  • It is an important semantic fact about the current store design.
  • Even if upstream does not immediately accept scoped recall changes, it is still worth making the limitation explicit in discussion.

Labels: bug (Something isn't working)
