
Tutorials & Examples

Explore common usage patterns for AgentCover. These examples demonstrate how to move from simple discovery to complex logic verification using real LLM interactions.

Prerequisites

Examples 2 and 3 require a valid OPENAI_API_KEY environment variable because they perform real network calls to verify that AgentCover correctly intercepts live traffic.


Tutorial 1: The Inventory Scan (Zero-Config)

Scenario: You have inherited a codebase ("Legacy App") with no tests and no documentation. You want to know what Prompts and Tools exist in the code.

Strategy: Run agent-cover without writing any specific test logic. The Static Discovery engine will scan your files and build an inventory report.

1. The Codebase

The application defines prompts as global variables and tools using decorators.

examples/01_inventory_discovery/legacy_code.py
"""A legacy application file containing definitions but no tests.

AgentCover will statically discover these items.
"""

from langchain.tools import tool

# 1. Raw String Prompts (Detected by heuristic prefix 'PROMPT_')
PROMPT_SYSTEM = "You are a legacy system maintenance bot."
PROMPT_ERROR = "An error occurred in module {module_name}."


# 2. Tools (Detected by @tool decorator or BaseTool inheritance)
@tool
def reset_database(force: bool = False):
    """Dangerous tool to reset the DB."""
    pass


@tool
def check_disk_space(path: str):
    """Checks available disk space."""
    return "10GB Free"

2. The "Dummy" Test

We create a placeholder test just to trigger the pytest runner.

examples/01_inventory_discovery/test_map.py
"""A generic test file.

Even though the test does nothing, AgentCover's discovery phase will map the 'legacy_code.py' file.
"""


def test_inventory_scan():
    """A mock function."""
    # Pass: We just want the plugin to run its discovery phase
    assert True

3. The Result

Run the following command:

pytest examples/01_inventory_discovery --agent-cov --agent-cov-html=report

Outcome: You receive an HTML report listing definitions identified by the Raw String Scanner and Tool Instrumentor, pinpointing exactly where they are defined in the file system.
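To build intuition for what the Static Discovery engine is doing, the sketch below approximates such a scan with Python's `ast` module. This is an illustrative sketch only, not AgentCover's actual implementation: it finds module-level string assignments with the `PROMPT_` prefix and functions carrying a decorator named `tool`.

```python
import ast


def discover_definitions(source: str) -> dict:
    """Approximate a static discovery pass over one file's source code.

    Returns prompt variables (found by the PROMPT_ naming heuristic) and
    functions decorated with something named 'tool', each with its line number.
    """
    tree = ast.parse(source)
    prompts, tools = [], []
    for node in ast.walk(tree):
        # Heuristic 1: string constants assigned to names starting with PROMPT_
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if (
                    isinstance(target, ast.Name)
                    and target.id.startswith("PROMPT_")
                    and isinstance(node.value, ast.Constant)
                    and isinstance(node.value.value, str)
                ):
                    prompts.append((target.id, node.lineno))
        # Heuristic 2: functions decorated with @tool (or @module.tool)
        elif isinstance(node, ast.FunctionDef):
            for dec in node.decorator_list:
                name = dec.id if isinstance(dec, ast.Name) else getattr(dec, "attr", None)
                if name == "tool":
                    tools.append((node.name, node.lineno))
    return {"prompts": prompts, "tools": tools}
```

Run against `legacy_code.py` above, a scan like this would surface `PROMPT_SYSTEM`, `PROMPT_ERROR`, `reset_database`, and `check_disk_space` without executing any of the code.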


Tutorial 2: Full Logic Coverage (LangChain)

Scenario: You are building a Banking Agent. It is critical that the agent handles both BALANCE_SHOWN and TRANSFER_DONE outcomes. You want to ensure your test suite actually triggers the LLM to call the correct tools.

Strategy:

  1. Define the business rules in agent-cover.yaml.
  2. Use a real AgentExecutor with ChatOpenAI.
  3. Write tests that prompt the agent to perform different actions.

1. Configuration (agent-cover.yaml)

We explicitly define the intents we expect to see using DecisionConfig.

examples/02_langchain_complete/agent-cover.yaml
decisions:
  - id: intent_check
    description: Verify the agent correctly signals the handled intent
    target_field: output
    expected_values:
      - BALANCE_SHOWN
      - TRANSFER_DONE
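Conceptually, decision coverage reduces to checking which of the expected values were actually observed at runtime. A hypothetical sketch of that bookkeeping (not AgentCover's API, just the underlying idea):

```python
def decision_coverage(expected_values: list[str], observed_outputs: list[str]) -> float:
    """Return the fraction of expected decision values seen in any output."""
    seen = {
        value
        for value in expected_values
        if any(value in output for output in observed_outputs)
    }
    return len(seen) / len(expected_values)


# If the test suite only ever triggers the balance path, coverage stays at 50%:
outputs = ["Your balance is 1000 USD. BALANCE_SHOWN"]
print(decision_coverage(["BALANCE_SHOWN", "TRANSFER_DONE"], outputs))  # 0.5
```

This is why the test suite below deliberately drives both the balance and the transfer scenarios: each one contributes a distinct expected value.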

2. The Agent Logic

This agent uses real Tools. AgentCover tracks their execution only when the LLM actually calls them.

examples/02_langchain_complete/bank_agent.py
"""Banking Agent implementation using LangChain and OpenAI.

This agent is designed to use tools to fetch balances or transfer funds.
"""

import os

from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain.tools import tool
from langchain_core.prompts import ChatPromptTemplate

# We use the standard ChatOpenAI model
from langchain_openai import ChatOpenAI

# 1. Tool Definitions
# AgentCover will track these via the @tool decorator.
# They are only recorded if the LLM *decides* to call them inside the loop.


@tool
def get_balance(account_id: str) -> str:
    """Useful to get the balance of a specific account. Returns the amount."""
    # In a real app, this would query a DB.
    return "1000 USD"


@tool
def transfer_funds(amount: int, to_account: str) -> str:
    """Useful to transfer money to another account."""
    return "Success: Funds transferred."


# 2. Agent Setup
def create_bank_agent():
    """Constructs the LangChain Agent.

    Returns:
        AgentExecutor: The runnable agent loop.
    """
    if not os.environ.get("OPENAI_API_KEY"):
        raise ValueError("OPENAI_API_KEY is required for this example.")

    llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

    tools = [get_balance, transfer_funds]

    # We enforce a specific system prompt to guide the output format
    # so we can validate it against agent-cover.yaml
    prompt = ChatPromptTemplate.from_messages(
        [
            (
                "system",
                "You are a bank assistant. "
                "If you show the balance, end your answer with 'BALANCE_SHOWN'. "
                "If you transfer money, end your answer with 'TRANSFER_DONE'.",
            ),
            ("human", "{input}"),
            ("placeholder", "{agent_scratchpad}"),
        ]
    )

    agent = create_tool_calling_agent(llm, tools, prompt)

    # The AgentExecutor is what manages the context and tool invocation loop.
    # AgentCover instruments this execution flow.
    return AgentExecutor(agent=agent, tools=tools, verbose=True)

3. The Test Suite

We invoke the agent with different inputs. We skip the test if no API Key is found.

examples/02_langchain_complete/test_bank.py
"""Integration test for the Banking Agent.

This test makes REAL calls to OpenAI.
"""

import os

import pytest
from bank_agent import create_bank_agent


# Skip this test if no API key is present
@pytest.mark.skipif(
    not os.environ.get("OPENAI_API_KEY"),
    reason="Requires OpenAI API Key for real LLM execution",
)
def test_banking_intents_coverage():
    """Runs the agent with different inputs to trigger different tools.

    AgentCover will record:
    1. The Prompt Template rendering.
    2. The Tool Execution (get_balance / transfer_funds).
    3. The Decision Output (BALANCE_SHOWN / TRANSFER_DONE).
    """
    agent_executor = create_bank_agent()

    # Scenario 1: User wants to see balance
    # Expected Flow: LLM -> Calls 'get_balance' -> LLM -> Returns "BALANCE_SHOWN"
    result_1 = agent_executor.invoke({"input": "What is my balance for account 123?"})
    assert "BALANCE_SHOWN" in result_1["output"]

    # Scenario 2: User wants to transfer money
    # Expected Flow: LLM -> Calls 'transfer_funds' -> LLM -> Returns "TRANSFER_DONE"
    result_2 = agent_executor.invoke({"input": "Transfer 50 USD to account 456"})
    assert "TRANSFER_DONE" in result_2["output"]

Outcome:

  • Prompt Coverage: 100% (The ChatPromptTemplate is formatted and sent to OpenAI).
  • Tool Coverage: 100% (The LLM decided to call both get_balance and transfer_funds in separate tests).
  • Decision Coverage: 100% (Both BALANCE_SHOWN and TRANSFER_DONE were observed in the output).

Tutorial 3: Framework Agnostic (Raw OpenAI)

Scenario: You don't use LangChain or LlamaIndex. You write raw Python strings and call openai directly.

Strategy: AgentCover's Raw String Scanner detects variables based on naming conventions (e.g., PROMPT_...), and the LLM Instrumentor intercepts the real API payload.

1. The Raw Implementation

We use a standard OpenAI client. Note how prompts are defined as global constants.

examples/03_raw_openai_strings/simple_bot.py
"""A simple bot using the OpenAI Client directly without frameworks."""

import os

from openai import OpenAI

# 1. Raw String Prompts
# AgentCover's Scanner detects these variables by prefix 'PROMPT_'.
# It generates a regex to match the runtime formatted string.
PROMPT_SYSTEM = "You are a concise helper."
PROMPT_USER_TEMPLATE = (
    "Classify this text: '{text}'. Return only 'POSITIVE' or 'NEGATIVE'."
)


class SimpleBot:
    """A simple bot implementation using raw OpenAI calls."""

    def __init__(self):
        """Initialize the bot and check API keys."""
        if not os.environ.get("OPENAI_API_KEY"):
            raise ValueError("OPENAI_API_KEY is required.")
        self.client = OpenAI()

    def classify(self, user_text: str) -> str:
        """Sends a request to OpenAI.

        AgentCover's LLMProviderInstrumentor intercepts the 'client.chat.completions.create' call.
        It checks if the 'content' sent matches PROMPT_USER_TEMPLATE.
        """
        # Format the template (AgentCover matches the result against the pattern)
        user_message = PROMPT_USER_TEMPLATE.format(text=user_text)

        response = self.client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system", "content": PROMPT_SYSTEM},
                {"role": "user", "content": user_message},
            ],
            temperature=0,
        )

        return response.choices[0].message.content.strip()
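Interception of the kind described in the `classify` docstring can be pictured as wrapping the client method so the outgoing payload is recorded before the real call proceeds. The sketch below is a simplified, hypothetical illustration of that pattern; AgentCover's internals may differ.

```python
from functools import wraps

# Illustrative global store for intercepted payloads
recorded_payloads = []


def instrument_create(client):
    """Wrap client.chat.completions.create to record each messages payload."""
    original = client.chat.completions.create

    @wraps(original)
    def wrapper(*args, **kwargs):
        # Record the outgoing messages, then delegate to the real method
        recorded_payloads.append(kwargs.get("messages", []))
        return original(*args, **kwargs)

    client.chat.completions.create = wrapper
    return client
```

Because the wrapper delegates unchanged arguments to the original method, the bot's behavior is unaffected; only the observation is added.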

2. The Test

We perform a real classification request. LLMProviderInstrumentor intercepts the network call transparently.

examples/03_raw_openai_strings/test_raw.py
"""Integration test for the Raw OpenAI Bot."""

import os

import pytest
from simple_bot import SimpleBot


@pytest.mark.skipif(
    not os.environ.get("OPENAI_API_KEY"), reason="Requires OpenAI API Key"
)
def test_bot_classification_coverage():
    """This test verifies that AgentCover tracks prompt usage even with raw f-strings."""
    bot = SimpleBot()

    # 1. Positive Case
    # The formatted string "Classify this text: 'I love this'..." is sent to OpenAI.
    # AgentCover intercepts the call, matches it against the PROMPT_USER_TEMPLATE regex,
    # and marks the prompt as "Covered".
    result = bot.classify("I love this product")
    assert "POSITIVE" in result.upper()

    # 2. Negative Case (Just to exercise the code)
    result_neg = bot.classify("This is terrible")
    assert "NEGATIVE" in result_neg.upper()

Outcome: The report will show that PROMPT_USER_TEMPLATE was covered because the text sent to the OpenAI API matched the regex pattern generated by the Regex Generator from the static string variable.
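The pattern generation itself can be pictured roughly as follows: escape the literal parts of the template and substitute a wildcard for each `{placeholder}`. This is a sketch of the general technique, not necessarily AgentCover's exact implementation.

```python
import re


def template_to_regex(template: str) -> re.Pattern:
    """Turn a str.format template into a regex matching its rendered output."""
    # Split on {placeholder} tokens, escape the literal fragments,
    # and join them with a non-greedy wildcard group per placeholder.
    parts = re.split(r"\{[^{}]+\}", template)
    pattern = "(?s:.+?)".join(re.escape(p) for p in parts)
    return re.compile(pattern)


PROMPT_USER_TEMPLATE = (
    "Classify this text: '{text}'. Return only 'POSITIVE' or 'NEGATIVE'."
)
rendered = PROMPT_USER_TEMPLATE.format(text="I love this product")
assert template_to_regex(PROMPT_USER_TEMPLATE).fullmatch(rendered)
```

Any runtime string produced by `.format()` on the template then matches the derived pattern, which is what lets a static string variable be marked as covered.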


Tutorial 4: PromptFlow & Parallel Workers

Scenario: You are using Microsoft PromptFlow to orchestrate a complex DAG. You execute batch runs using pf run, which spawns multiple isolated worker processes. You need to ensure all parallel branches and guardrail states are tested.

Strategy: Since PromptFlow workers run in separate processes, standard instrumentation won't work. Use the agent-cover CLI wrapper to inject tracking into every worker and consolidate the results automatically.
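The consolidation step can be imagined as a union of per-worker JSON fragments: each worker flushes its hit counts to its own file before exiting, and the parent process merges them after the batch run. The file layout and schema below are assumptions for illustration, not AgentCover's actual on-disk format.

```python
import json
from pathlib import Path


def merge_worker_fragments(fragment_dir: str) -> dict:
    """Union per-worker coverage fragments into one hit-count map."""
    merged: dict[str, int] = {}
    for fragment in sorted(Path(fragment_dir).glob("worker-*.json")):
        data = json.loads(fragment.read_text())
        for item_id, hits in data.items():
            # Sum hits for the same prompt/tool across all workers
            merged[item_id] = merged.get(item_id, 0) + hits
    return merged
```

Because each fragment is written independently, a worker that PromptFlow kills after flushing still contributes its data to the final report.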

1. The Flow Definition

The DAG references Jinja2 templates for prompts and Python files for custom tools.

examples/04_promptflow_complete/flow.dag.yaml
$schema: https://azuremlschemas.azureedge.net/promptflow/latest/Flow.schema.json
environment:
  python_requirements_txt: requirements.txt
inputs:
  chat_history:
    type: list
    is_chat_history: true
    default: []
  question:
    type: string
    is_chat_input: true
    default: "Who are you?"
  tone:
    type: string
    default: "minstrel"
outputs:
  answer:
    type: string
    reference: ${generate_response.output}
    is_chat_output: true
nodes:
- name: build_guardrail_prompt
  type: prompt
  source:
    type: code
    path: guardrail.jinja2
  inputs:
    question: ${inputs.question}
- name: run_guardrail
  type: python
  source:
    type: code
    path: classifier_tool.py
  inputs:
    connection: gemini_connection
    prompt: ${build_guardrail_prompt.output}
  entry: classifier
- name: build_chat_prompt
  type: prompt
  source:
    type: code
    path: chat.jinja2
  inputs:
    chat_history: ${inputs.chat_history}
    question: ${inputs.question}
    tone: ${inputs.tone}
- name: generate_response
  type: python
  source:
    type: code
    path: responder_tool.py
  inputs:
    connection: gemini_connection
    prompt: ${build_chat_prompt.output}
    guardrail_status: ${run_guardrail.output}
  entry: responder

2. The Configuration (agent-cover.yaml)

We want to verify that our guardrail handles both allowed and forbidden inputs.

examples/04_promptflow_complete/agent-cover.yaml
decisions:
  - id: "guardrail_check"
    description: "Verify guardrail states coverage"
    target_field: "guardrail_status"  # Must match the input name or the JSON field in the output
    expected_values: ["IN_SCOPE", "OOS"]

3. Running with the CLI Wrapper

Instead of calling pf run directly, wrap it with the agent-cover run command.

agent-cover run -- pf run create --flow examples/04_promptflow_complete --data examples/04_promptflow_complete/data.jsonl --stream

4. The Consolidated Result

After the batch run completes, the CLI automatically merges the JSON fragments from all workers.

Outcome:

  • Multi-process Aggregation: Even though workers are killed abruptly by PromptFlow, their coverage data is preserved in a single coverage.xml.
  • Jinja2 Coverage: The scanner correctly maps runtime template hashes to files like guardrail.jinja2.
  • Full Decision Coverage: Both IN_SCOPE and OOS values are captured, ensuring your guardrail logic is fully verified across the entire dataset.