Replies: 2 comments
+1 for video generation support! Multi-modal is the future. The current workaround is to wrap video APIs as tools:

```python
from agno.agent import Agent
from agno.tools import tool
import replicate

@tool
def generate_video(prompt: str, duration: int = 5) -> str:
    """Generate a video from a text prompt (here via Replicate)."""
    # Model slug and input schema are illustrative; not every model
    # accepts a duration parameter, so it is unused in this sketch.
    output = replicate.run(
        "stability-ai/stable-video-diffusion",
        input={"prompt": prompt}
    )
    # The output shape varies by model: often a URL or a list of URLs.
    return output[0] if isinstance(output, list) else str(output)

@tool
def generate_video_luma(prompt: str) -> str:
    """Generate a video using Luma Dream Machine."""
    # Luma API call
    ...

agent = Agent(
    model=model,  # `model` assumed to be defined elsewhere
    tools=[generate_video, generate_video_luma],
    instructions=["You can generate videos to explain concepts visually"]
)
```

What native support could look like (a proposed API, not something that exists today):

```python
from agno.models.video import RunwayModel, SoraModel  # proposed, not real

agent = Agent(
    model=OpenAI(id="gpt-4o"),
    video_model=RunwayModel(),  # Native video gen
)
response = agent.run(
    "Create a 10-second explainer video about quantum computing",
    output_type="video"
)
```

Use cases:
We integrate video gen into agent workflows at Revolution AI — native support would be huge for multi-modal agents.
Video generation support would be powerful! At RevolutionAI (https://revolutionai.io) we work with multimodal AI. Current models to consider:
Integration pattern:

```python
from agno import Agent, tool

@tool
def generate_video(prompt: str, duration: int = 5) -> str:
    # Call video API (`runway` is a placeholder client, not a real import)
    response = runway.generate(
        prompt=prompt,
        duration=duration
    )
    return response.url

agent = Agent(tools=[generate_video])
```

Use cases:
Would definitely use this!
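One practical note for any of these integrations: video generation endpoints are usually asynchronous, returning a job that must be polled until the video is ready. A tool like `generate_video` above would typically wrap a polling loop; a minimal generic sketch, where the `check` callable stands in for whatever status call the provider actually exposes:

```python
import time

def poll_until_ready(check, interval: float = 2.0, timeout: float = 300.0) -> str:
    """Call `check()` repeatedly until it returns a non-None result
    (e.g. a video URL), sleeping `interval` seconds between attempts.

    Raises TimeoutError if no result arrives within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = check()
        if result is not None:
            return result
        time.sleep(interval)
    raise TimeoutError("video generation did not complete in time")
```

Keeping the polling inside the tool means the agent sees a simple synchronous function that returns a URL, which is the shape tool-calling models handle best.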
The future of AI is multi-modal, and Agno is already leading the charge with its lightning-fast, flexible framework for building intelligent agents.
But there’s one gap holding Agno back: native support for video generation models.
Imagine Agno not just understanding text, images, and audio, but generating dynamic video content on the fly. From real-time video summaries to interactive AI assistants that create visual explanations, the possibilities are endless.