
AgentCover Documentation

AgentCover is a Python library and pytest plugin designed to bring observability and quality assurance to LLM-based applications.

While traditional tools (like coverage.py) measure executed Python lines, AgentCover tracks the Logical Coverage of your Agent. It verifies that your test suite adequately exercises the agent's actual capabilities: its prompts, its tools, and its business decision branches.


🔍 Core Concepts: How It Works

AgentCover operates on two layers managed by the AgentCoverage Manager. This dual approach ensures that you know exactly what code exists and what part of it was actually used during testing.

1. Static Analysis (The Inventory)

Before tests run, AgentCover scans your codebase to build an inventory of "Logical Assets":

  • Prompt Templates: Automatically discovers LangChain/LlamaIndex objects and Jinja2 templates (PromptFlow).
  • Raw String Prompts: Uses heuristics to find global string variables (e.g., PROMPT_SYSTEM = "...").
  • Data Structures (Auto-Discovery): If you use Pydantic models for your agent's structured output, AgentCover automatically generates "Virtual Decisions" coverage rules. It scans for:

    • Enum fields
    • Literal types
    • bool fields

    Example:

    class Classification(BaseModel):
        label: Literal["SPAM", "HAM"]
    
    AgentCover implicitly creates a rule expecting both "SPAM" and "HAM" to appear in your test outputs.
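To illustrate the kind of rule this produces, here is a minimal sketch that derives expected-value sets from the discrete fields of a class. It uses plain type hints in place of a real Pydantic model, and `virtual_decision_rules` is a hypothetical helper, not part of AgentCover's public API:

```python
import enum
from typing import Literal, get_args, get_origin, get_type_hints

class Priority(enum.Enum):
    LOW = "LOW"
    HIGH = "HIGH"

class Classification:
    # Stand-in for a Pydantic output model.
    label: Literal["SPAM", "HAM"]
    urgent: bool
    priority: Priority

def virtual_decision_rules(model: type) -> dict[str, set]:
    """Hypothetical helper: map each discrete field to its expected values."""
    rules: dict[str, set] = {}
    for name, ann in get_type_hints(model).items():
        if get_origin(ann) is Literal:
            rules[name] = set(get_args(ann))                 # Literal types
        elif isinstance(ann, type) and issubclass(ann, enum.Enum):
            rules[name] = {member.value for member in ann}   # Enum fields
        elif ann is bool:
            rules[name] = {True, False}                      # bool fields
    return rules
```

Each entry in the returned dict is one "Virtual Decision": the test suite is expected to produce every value in the set at least once.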

2. Runtime Instrumentation (The Verification)

During test execution, AgentCover hooks into your agent's lifecycle using BaseInstrumentors:

  • Tool Execution: Tracks if the LLM actually invoked its available tools (checked against the AgentContext).
  • Output Analysis: Intercepts LLM responses, handles JSON-in-Markdown parsing, and validates data against your rules using the OutputAnalyzer.
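To see why JSON-in-Markdown handling matters: LLMs frequently wrap structured output in a fenced code block, which breaks a naive json.loads. A minimal extraction sketch (illustrative only, not AgentCover's actual OutputAnalyzer) looks like this:

```python
import json
import re

def parse_llm_output(text: str) -> dict:
    """Extract a JSON payload, tolerating a ```json fence around it (illustrative sketch)."""
    match = re.search(r"```(?:json)?\s*(.*?)```", text, re.DOTALL)
    payload = match.group(1) if match else text
    return json.loads(payload)

raw = 'Sure! Here is the result:\n```json\n{"intent": "SALES"}\n```'
print(parse_llm_output(raw))  # {'intent': 'SALES'}
```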

🚀 Key Coverage Metrics

  • 🧠 Decision Coverage: Ensures the LLM outputs every expected variation of a field (e.g., all possible intents or status codes) defined in your configuration.
  • 📝 Prompt Coverage: Reports which prompt templates were formatted and sent to an LLM, and which remain "dead code".
  • 🛠️ Tool Coverage: Verifies that every tool defined in your agent is actually called by the LLM during the test run.

⚡ Quick Start

1. Installation

pip install agent-cover

2. Configuration

Create an agent-cover.yaml file in your project root to define your Business Logic requirements.

decisions:
  - id: intent_classification
    description: Ensure the agent classifies all supported intents.
    target_field: intent
    expected_values: ["SALES", "SUPPORT", "BILLING"]

3. Usage with Pytest

Enable the pytest plugin via the command line:

pytest --agent-cov --agent-cov-html=coverage_report

AgentCover will automatically scan your code, instrument LLM providers, analyze outputs, and generate a report in coverage_report/index.html.
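With the agent-cover.yaml above, the suite reaches full Decision Coverage only if its cases elicit every expected intent. A sketch of such a suite, where `classify_message` is a hypothetical stand-in for your real agent call:

```python
# A pytest-style test in which each expected intent from agent-cover.yaml
# is exercised by at least one case, so Decision Coverage can reach 100%.

def classify_message(message: str) -> dict:
    """Stand-in for your real agent call (hypothetical keyword-based stub)."""
    text = message.lower()
    if "charged" in text or "invoice" in text:
        return {"intent": "BILLING"}
    if "crash" in text or "broken" in text:
        return {"intent": "SUPPORT"}
    return {"intent": "SALES"}

def test_intent_classification_covers_all_intents():
    cases = {
        "I'd like a quote for 50 seats": "SALES",
        "The app crashes on login": "SUPPORT",
        "Why was I charged twice?": "BILLING",
    }
    for message, expected in cases.items():
        assert classify_message(message)["intent"] == expected
```

If one of the three intents is dropped from the cases, the test still passes, but Decision Coverage falls below 100% and the report flags the missing value.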


📖 Example Scenario: Decision Coverage

Imagine an agent returning a structured response. Standard coverage says the code "ran", but AgentCover tells you if the logic was covered.

Code:

class Response(BaseModel):
    status: Literal["SUCCESS", "FAILURE", "RETRY"]

The Gap: If your tests only ever produce "SUCCESS", AgentCover will report 33% Decision Coverage and highlight that the FAILURE and RETRY branches of your agent's potential behavior have never been exercised.
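The 33% figure is simply the fraction of expected values that were observed. The bookkeeping can be sketched as:

```python
# Expected values come from the Literal annotation; observed values are the
# distinct statuses the test suite actually produced.
expected = {"SUCCESS", "FAILURE", "RETRY"}
observed = {"SUCCESS"}

coverage = len(observed & expected) / len(expected)
missing = expected - observed

print(f"Decision Coverage: {coverage:.0%}")   # Decision Coverage: 33%
print(f"Never exercised: {sorted(missing)}")  # ['FAILURE', 'RETRY']
```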


📚 Explore the Guide

  • User Guide & Integrations: Detailed instructions for Pytest, CLI, and SDK usage with supported frameworks (LangChain, LlamaIndex, Promptflow, etc.).
  • Tutorials: Step-by-step examples from zero-config inventory scans to complex multi-process aggregation.
  • API Reference: Deep dive into the internal modules and extension points.