AgentCover Documentation
AgentCover is a Python library and pytest plugin designed to bring observability and quality assurance to LLM-based applications.
While traditional tools (like coverage.py) measure executed Python lines, AgentCover tracks the Logical Coverage of your Agent. It verifies that your test suite adequately exercises the agent's actual capabilities: its prompts, its tools, and its business decision branches.
🔍 Core Concepts: How It Works
AgentCover operates on two layers managed by the AgentCoverage Manager. This dual approach ensures that you know exactly what code exists and what part of it was actually used during testing.
1. Static Analysis (The Inventory)
Before tests run, AgentCover scans your codebase to build an inventory of "Logical Assets":
- Prompt Templates: Automatically discovers LangChain/LlamaIndex objects and Jinja2 templates (PromptFlow).
- Raw String Prompts: Uses heuristics to find global variables (e.g., `PROMPT_SYSTEM = "..."`).
- Data Structures (Auto-Discovery): If you use Pydantic models for your agent's structured output, AgentCover automatically generates "Virtual Decisions" coverage rules. It scans for:
  - `Enum` fields
  - `Literal` types
  - `bool` fields
Example:
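A minimal sketch of the kind of model that triggers this (the class and field names here are illustrative, not from AgentCover; a Pydantic `BaseModel` with the same annotation would be discovered the same way):

```python
from dataclasses import dataclass
from typing import Literal


@dataclass
class ClassificationResult:
    # A Literal field like this becomes a "Virtual Decision":
    # every listed value is expected to appear across the test run.
    label: Literal["SPAM", "HAM"]
    confidence: float
```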
AgentCover implicitly creates a rule expecting both `"SPAM"` and `"HAM"` to appear in your test outputs.
2. Runtime Instrumentation (The Verification)
During test execution, AgentCover hooks into your agent's lifecycle using BaseInstrumentors:
- Tool Execution: Tracks whether the LLM actually invoked its available tools (checked against the AgentContext).
- Output Analysis: Intercepts LLM responses, handles JSON-in-Markdown parsing, and validates data against your rules using the OutputAnalyzer.
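The JSON-in-Markdown handling mentioned above boils down to stripping an optional fenced code block before parsing. A simplified sketch of the idea (not AgentCover's actual OutputAnalyzer implementation):

```python
import json
import re


def parse_llm_json(text: str) -> dict:
    """Extract JSON from an LLM reply that may wrap it in a ```json fence.

    Simplified illustration; the real OutputAnalyzer is more robust.
    """
    match = re.search(r"```(?:json)?\s*(.*?)\s*```", text, re.DOTALL)
    payload = match.group(1) if match else text
    return json.loads(payload)
```

Both fenced and bare replies parse the same way, e.g. `parse_llm_json('```json\n{"intent": "SALES"}\n```')` and `parse_llm_json('{"intent": "SALES"}')`.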
🚀 Key Coverage Metrics
| Metric | What it measures |
|---|---|
| 🧠 Decision Coverage | Ensures the LLM outputs every expected variation of a field (e.g., all possible Intents or Status Codes) defined in your Configuration. |
| 📝 Prompt Coverage | Reports which specific prompt templates were formatted and sent to an LLM versus which remain "dead code". |
| 🛠️ Tool Coverage | Verifies that every tool defined in your agent is successfully invoked by the LLM during the test run. |
⚡ Quick Start
1. Installation
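Assuming the package is published under the name `agentcover` (an assumption — check the project's actual distribution name):

```shell
pip install agentcover  # hypothetical package name
```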
2. Configuration
Create an agent-cover.yaml file in your project root to define your Business Logic requirements.
```yaml
decisions:
  - id: intent_classification
    description: Ensure the agent classifies all supported intents.
    target_field: intent
    expected_values: ["SALES", "SUPPORT", "BILLING"]
```
3. Usage with Pytest
Enable the Pytest Plugin via the CLI:
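A plausible invocation — the flag name below is an assumption, since the plugin's actual command-line option is not shown here; many coverage-style plugins register a `--<name>` switch:

```shell
pytest --agent-cover tests/  # hypothetical flag name
```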
AgentCover will automatically scan your code, instrument LLM providers, analyze outputs, and generate a report in coverage_report/index.html.
📖 Example Scenario: Decision Coverage
Imagine an agent returning a structured response. Standard coverage says the code "ran", but AgentCover tells you whether the logic was covered.
Code:
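For instance, a structured response along these lines (illustrative names; a Pydantic model with the same `Literal` annotation would behave identically):

```python
from dataclasses import dataclass
from typing import Literal


@dataclass
class AgentResponse:
    # Three decision branches AgentCover expects the test suite to cover.
    status: Literal["SUCCESS", "FAILURE", "RETRY"]
    message: str
```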
The Gap:
If your tests only ever produce "SUCCESS", AgentCover will report 33% Decision Coverage and highlight that the FAILURE and RETRY branches of your agent's potential behavior have never been exercised.
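The arithmetic behind that number is simply observed values over expected values:

```python
expected = {"SUCCESS", "FAILURE", "RETRY"}
observed = {"SUCCESS"}  # the only value the test suite ever produced

decision_coverage = len(expected & observed) / len(expected)
print(f"{decision_coverage:.0%}")  # prints "33%"
```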
📚 Explore the Guide
- User Guide & Integrations: Detailed instructions for Pytest, CLI, and SDK usage with supported frameworks (LangChain, LlamaIndex, Promptflow, etc.).
- Tutorials: Step-by-step examples from zero-config inventory scans to complex multi-process aggregation.
- API Reference: Deep dive into the internal modules and extension points.