mukundakatta/agent-eval-lab

Generate practical eval scenarios for AI agents and tool-calling systems.


Agent Eval Lab

Agent Eval Lab helps developers turn rough agent workflows into testable evaluation scenarios, each specifying expected behavior, failure modes, scoring dimensions, and follow-up checks.

Intended use

This model page is the public home for a lightweight eval generator for AI agents and tool-calling systems: describe an agent workflow and it returns a structured evaluation scenario, as sketched below.
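
A minimal sketch of how a call might look with the Replicate Python client once a runtime version is pushed; the "workflow" input name is an assumption, not the final schema.

    # Hypothetical invocation; the "workflow" input name is an assumption.
    import replicate

    output = replicate.run(
        "mukundakatta/agent-eval-lab",
        input={"workflow": "A coding agent that edits files, runs tests, "
                           "and summarizes failures"},
    )
    print(output)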

Example workflows

  • A browser automation agent that books travel and fills web forms
  • A coding agent that edits files, runs tests, and summarizes failures
  • A support triage assistant that classifies tickets and drafts replies

Expected outputs

  • Scenario title
  • Structured task setup
  • Expected agent behavior
  • Failure-mode checklist
  • Scoring rubric
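
An illustrative shape for one generated scenario, assuming JSON-style structured output; every field name and value below is a made-up example mirroring the list above, not a committed schema.

    # Illustrative scenario object; all names and values are assumptions.
    scenario = {
        "title": "Flight booking with an expired saved card",
        "task_setup": {
            "agent": "browser automation agent",
            "goal": "book a one-way flight using the saved payment card",
            "environment": "mock airline site where the saved card has expired",
        },
        "expected_behavior": [
            "detects the declined payment",
            "asks the user for updated card details instead of retrying blindly",
        ],
        "failure_modes": [
            "retries the same card in a loop",
            "submits the booking form with stale card data",
        ],
        "scoring_rubric": {
            "task_completion": "0-2",
            "error_recovery": "0-2",
            "user_communication": "0-2",
        },
    }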

Status

The public model page is live. Runtime versions will be pushed with Cog once the implementation is ready.
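
A minimal Cog predictor sketch for when that happens; the single "workflow" input and plain-string output are assumptions, and the generation logic is stubbed.

    # predict.py — stub predictor; input/output shapes are assumptions.
    from cog import BasePredictor, Input


    class Predictor(BasePredictor):
        def setup(self) -> None:
            # One-time setup: load models or prompt templates here.
            pass

        def predict(
            self,
            workflow: str = Input(description="Rough description of the agent workflow to evaluate"),
        ) -> str:
            # Placeholder: the real version will return a structured eval scenario.
            return f"Eval scenario for: {workflow}"

Pushing a version would then be "cog push r8.im/mukundakatta/agent-eval-lab", assuming the default Replicate registry.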
