
ContextScout Test Plan

Test Coverage Analysis

Current Test Status

| Test | Type | Coverage | Status | Issues |
|------|------|----------|--------|--------|
| smoke-test.yaml | Positive | Basic operation | ✅ Passing | Too simple - doesn't validate output quality |
| 02-discovery-test.yaml | Positive | Structure discovery | ⚠️ Needs validation | No assertions on output format |
| 03-search-standards.yaml | Positive | File search | ⚠️ Needs validation | No verification of line ranges |
| 04-content-extraction.yaml | Positive | Content extraction | ⚠️ Needs validation | No verification of key findings |
| 05-no-context-handling.yaml | Negative | Empty directory | ⚠️ Needs validation | Doesn't verify honest reporting |

Test Coverage Gaps

Missing Positive Tests:

  • ✅ Verify file paths are exact and valid
  • ✅ Verify line ranges are accurate
  • ✅ Verify priority ratings are appropriate
  • ✅ Verify MVI compliance detection
  • ✅ Verify function-based folder detection
  • ✅ Verify loading order recommendations

Missing Negative Tests:

  • ❌ Invalid path handling (non-existent directories)
  • ❌ Malformed context files (invalid YAML, broken markdown)
  • ❌ Ambiguous search queries
  • ❌ Context files without clear structure
  • ❌ Circular dependencies in context
  • ❌ False positive prevention (claiming files exist that don't)

Test Categories

1. Positive Tests (Happy Path)

Purpose: Verify ContextScout works correctly with valid inputs

Test: Valid Context Discovery

Input: "Find context for code standards"
Expected Output:

  • ✅ Returns .opencode/context/core/standards/code.md
  • ✅ Includes line ranges (e.g., "lines 22-27")
  • ✅ Priority: ⭐⭐⭐⭐⭐ (Critical)
  • ✅ Function type: Guide
  • ✅ Key findings: 3-5 specific points
  • ✅ Loading order: "Load this file NOW"

Assertions:

assertions:
  - type: output_contains
    value: ".opencode/context/core/standards/code.md"
  - type: output_contains
    value: "lines"
  - type: output_contains
    value: "⭐⭐⭐⭐⭐"
  - type: output_contains
    value: "Key Findings"
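The `output_contains: "lines"` check above passes for any mention of the word, so it cannot confirm that a reported range is accurate. A minimal sketch of a stricter check follows, assuming ranges appear in the form "lines 22-27" as in the expected output. The names `extract_line_ranges` and `ranges_are_valid` are hypothetical and not part of any existing test framework.

```python
import re

# Matches phrases such as "lines 22-27" or "line 22–27" (hyphen or en dash).
LINE_RANGE = re.compile(r"\blines?\s+(\d+)\s*[-–]\s*(\d+)", re.IGNORECASE)


def extract_line_ranges(output: str) -> list[tuple[int, int]]:
    """Return every (start, end) line range mentioned in the agent output."""
    return [(int(start), int(end)) for start, end in LINE_RANGE.findall(output)]


def ranges_are_valid(ranges: list[tuple[int, int]], total_lines: int) -> bool:
    """Pass only if at least one range exists and all fit inside the file."""
    return bool(ranges) and all(
        1 <= start <= end <= total_lines for start, end in ranges
    )
```

A test could count the lines of the referenced file and pass that count as `total_lines`, so a range that runs past the end of the file fails the assertion.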

Test: MVI-Aware Prioritization

Input: "Find context files, prioritize by MVI compliance"
Expected Output:

  • ✅ Files <200 lines ranked higher
  • ✅ Files with clear sections ranked higher
  • ✅ Files with navigation README ranked higher
  • ✅ Priority ratings reflect MVI compliance

Assertions:

assertions:
  - type: custom
    validator: "verify_mvi_prioritization"
    description: "Files <200 lines should have higher priority"
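The plan names a custom validator, `verify_mvi_prioritization`, without defining it. One possible shape is sketched below, assuming the agent output has already been parsed into records with a line count and a star rating. The `RankedFile` type and the strict "every compliant file outranks or ties every oversized file" rule are assumptions for illustration; a real validator may also need to weigh section clarity and README presence.

```python
from dataclasses import dataclass

MVI_MAX_LINES = 200  # threshold taken from the expected output above


@dataclass
class RankedFile:
    path: str
    line_count: int
    stars: int  # priority rating, 1 (lowest) to 5 (highest)


def verify_mvi_prioritization(files: list[RankedFile]) -> bool:
    """Check that no file over the MVI limit outranks a file under it."""
    compliant = [f.stars for f in files if f.line_count < MVI_MAX_LINES]
    oversized = [f.stars for f in files if f.line_count >= MVI_MAX_LINES]
    if not compliant or not oversized:
        return True  # only one group present, so there is nothing to compare
    return min(compliant) >= max(oversized)
```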

Test: Function-Based Discovery

Input: "Find examples of how to write tests"
Expected Output:

  • ✅ Searches examples/ folder first
  • ✅ Returns example files, not just guides
  • ✅ Identifies function type: "Example"
  • ✅ Provides minimal working code

Assertions:

assertions:
  - type: output_contains
    value: "examples/"
  - type: output_contains
    value: "Type: Example"

2. Negative Tests (Error Handling)

Purpose: Verify ContextScout handles invalid inputs gracefully

Test: Non-Existent Directory

Input: "Find context in /fake/directory/that/does/not/exist"
Expected Output:

  • ✅ Reports directory doesn't exist
  • ✅ Doesn't fabricate results
  • ✅ Suggests checking path
  • ✅ No false positives

Assertions:

assertions:
  - type: output_not_contains
    value: "Found context files"
  - type: output_contains
    value: "not found"
  - type: tool_not_called
    tool: "read"
    reason: "Should not attempt to read non-existent files"
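To confirm that these assertions fail when the agent misbehaves, it helps to run them against both an honest and a fabricated response. The evaluator below is a hypothetical sketch of how the three assertion types used above might be interpreted; it is not the actual test runner. Case-insensitive matching is an assumption, made so that "Not found" and "not found" are treated alike.

```python
def evaluate(assertions: list[dict], output: str, tools_called: set[str]) -> list[str]:
    """Return human-readable failures; an empty list means the test passed."""
    text = output.lower()  # assumption: matching is case-insensitive
    failures = []
    for a in assertions:
        kind = a["type"]
        if kind == "output_contains":
            if a["value"].lower() not in text:
                failures.append(f"expected output to contain {a['value']!r}")
        elif kind == "output_not_contains":
            if a["value"].lower() in text:
                failures.append(f"output must not contain {a['value']!r}")
        elif kind == "tool_not_called":
            if a["tool"] in tools_called:
                failures.append(f"tool {a['tool']!r} must not be called")
        else:
            failures.append(f"unsupported assertion type {kind!r}")
    return failures


# The assertions from the Non-Existent Directory test, as plain dicts.
NON_EXISTENT_DIR = [
    {"type": "output_not_contains", "value": "Found context files"},
    {"type": "output_contains", "value": "not found"},
    {"type": "tool_not_called", "tool": "read"},
]
```

Running `evaluate` on a fabricated response such as "Found context files: standards/code.md" with `read` in `tools_called` should report all three failures, which is the property the checklist item "No false positives" asks for.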

Test: Ambiguous Query

Input: "Find stuff"
Expected Output:

  • ✅ Asks for clarification
  • ✅ Suggests specific search terms
  • ✅ Doesn't return random files
  • ✅ Provides examples of valid queries

Assertions:

assertions:
  - type: output_contains
    value: "clarify"
  - type: output_contains
    value: "specific"

Test: Malformed Context File

Input: "Find context in directory with broken YAML frontmatter"
Expected Output:

  • ✅ Reports file has issues
  • ✅ Attempts to extract what it can
  • ✅ Warns about malformed content
  • ✅ Doesn't crash or hang

Assertions:

assertions:
  - type: output_contains
    value: "malformed"
  - type: no_errors
    description: "Should handle gracefully without crashing"

Test: False Positive Prevention

Input: "Find API documentation"
Expected Output:

  • ✅ Only returns files that actually exist
  • ✅ Verifies file paths before reporting
  • ✅ Doesn't hallucinate file names
  • ✅ Uses glob/list to verify existence

Assertions:

assertions:
  - type: all_paths_exist
    description: "Every file path mentioned must actually exist"
  - type: tool_called
    tool: "glob"
    reason: "Must verify files exist before claiming they do"
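The `all_paths_exist` assertion is not defined in this plan. A minimal sketch of what it could do is shown below, assuming paths in the agent output end in `.md`, `.yaml`, `.yml`, or `.json`. The helper names and the path pattern are hypothetical and would need adjusting to the real output format.

```python
import re
from pathlib import Path

# Assumed path shape: word characters, dots, slashes, and hyphens,
# ending in a typical context-file extension.
PATH_PATTERN = re.compile(r"[\w./-]+\.(?:md|ya?ml|json)\b")


def mentioned_paths(output: str) -> list[str]:
    """Return every file path that appears in the agent output."""
    return PATH_PATTERN.findall(output)


def missing_paths(output: str, root: Path) -> list[str]:
    """Return the mentioned paths that do not exist as files under root."""
    return [p for p in mentioned_paths(output) if not (root / p).is_file()]
```

The assertion would pass only when `missing_paths` returns an empty list, so a single hallucinated file name is enough to fail the test.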

3. Edge Case Tests

Purpose: Verify ContextScout handles boundary conditions

Test: Empty Context Directory

Input: "Find context in empty .tmp/test-fixtures/empty/"
Expected Output:

  • ✅ Reports no context found
  • ✅ Suggests creating context structure
  • ✅ Provides template/example
  • ✅ Honest about lack of results

Status: ✅ Already implemented (05-no-context-handling.yaml)

Test: Very Large Context File (>1000 lines)

Input: "Extract key findings from large context file"
Expected Output:

  • ✅ Identifies file is not MVI compliant
  • ✅ Suggests splitting into smaller files
  • ✅ Still extracts key findings
  • ✅ Provides line ranges for sections

Assertions:

assertions:
  - type: output_contains
    value: "MVI"
  - type: output_contains
    value: "lines"

Test: Circular Context Dependencies

Input: "Find context for X which depends on Y which depends on X"
Expected Output:

  • ✅ Detects circular dependency
  • ✅ Reports the cycle
  • ✅ Suggests breaking the cycle
  • ✅ Doesn't infinite loop

Assertions:

assertions:
  - type: output_contains
    value: "circular"
  - type: timeout_not_exceeded
    max_duration: 30000  # milliseconds (30 s), matching the <30s target below
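Building the fixture for this test requires knowing which dependency graph contains a cycle and what the reported cycle should look like. The sketch below uses a standard depth-first search to find one cycle, assuming dependencies have been extracted into a mapping from file name to the files it depends on. The function name and input shape are hypothetical; how ContextScout itself detects cycles is not specified in this plan.

```python
from typing import Optional


def find_cycle(deps: dict[str, list[str]]) -> Optional[list[str]]:
    """Return one dependency cycle as a node list, or None if acyclic."""
    in_progress: set[str] = set()  # nodes on the current DFS path
    done: set[str] = set()         # nodes fully explored, known cycle-free
    path: list[str] = []

    def visit(node: str) -> Optional[list[str]]:
        in_progress.add(node)
        path.append(node)
        for nxt in deps.get(node, []):
            if nxt in in_progress:
                # Back edge: the cycle runs from nxt's position to here.
                return path[path.index(nxt):] + [nxt]
            if nxt not in done:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        in_progress.discard(node)
        done.add(node)
        return None

    for node in deps:
        if node not in done:
            cycle = visit(node)
            if cycle:
                return cycle
    return None
```

For the input described above, `find_cycle({"X": ["Y"], "Y": ["X"]})` returns `["X", "Y", "X"]`, which gives the test a concrete cycle to compare against the agent's report.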

Recommended Test Additions

High Priority (Add These First)

  1. test-06-exact-paths.yaml - Verify file paths are exact and valid
  2. test-07-line-ranges.yaml - Verify line ranges are accurate
  3. test-08-false-positive.yaml - Prevent hallucinated file paths
  4. test-09-invalid-path.yaml - Handle non-existent directories
  5. test-10-ambiguous-query.yaml - Handle vague requests

Medium Priority

  1. test-11-mvi-detection.yaml - Verify MVI compliance detection
  2. test-12-function-folders.yaml - Verify function-based discovery
  3. test-13-priority-ratings.yaml - Verify priority ratings are appropriate
  4. test-14-large-files.yaml - Handle files >200 lines
  5. test-15-malformed-content.yaml - Handle broken YAML/markdown

Low Priority

  1. test-16-circular-deps.yaml - Detect circular dependencies
  2. test-17-performance.yaml - Verify response time <30s
  3. test-18-multiple-matches.yaml - Handle multiple matching files
  4. test-19-no-readme.yaml - Handle directories without README
  5. test-20-integration.yaml - Full workflow test

Test Quality Checklist

For each test, verify:

  • Clear purpose - What specific behavior is being tested?
  • Specific assertions - What exact output is expected?
  • Positive AND negative - Tests both success and failure cases
  • No false positives - Test would fail if agent misbehaves
  • No false negatives - Test wouldn't fail for correct behavior
  • Fast execution - Completes in <30 seconds
  • Deterministic - Same input always produces same result
  • Independent - Doesn't depend on other tests

Current Test Issues

Issue 1: Smoke Test Too Simple

Problem: The current smoke test only checks that the agent responds; it doesn't validate output quality.
Fix: Add assertions for expected output format

Issue 2: No Output Validation

Problem: Tests don't verify the actual content of responses
Fix: Add assertions section to each test with specific checks

Issue 3: No False Positive Prevention

Problem: Tests don't verify agent isn't hallucinating file paths
Fix: Add test that verifies all mentioned paths actually exist

Issue 4: Too Few Negative Tests

Problem: Only one negative test exists (empty directory); more are needed.
Fix: Add tests for invalid paths, ambiguous queries, malformed files

Issue 5: No Performance Tests

Problem: No verification that ContextScout responds quickly
Fix: Add timeout assertions and performance benchmarks


Next Steps

  1. Review current tests - Analyze what they actually validate
  2. Add assertions - Add specific output validation to existing tests
  3. Create negative tests - Add 5 new negative test cases
  4. Run full suite - Verify all tests pass
  5. Document results - Update README with test coverage

Last Updated: 2026-01-07
Status: Test plan created, implementation pending