Alvin Unreal 2 months ago
commit 8e67a3e082
5 changed files with 343 additions and 83 deletions
  1. README.md (+5 -2)
  2. cartography.md (+219 -10)
  3. scripts/cartography.ts (+98 -57)
  4. src/tools/cartography/tool.ts (+5 -8)
  5. src/tools/skill/builtin.ts (+16 -6)

+ 5 - 2
README.md

@@ -703,7 +703,7 @@ Use after major refactors or before finalizing PRs. Identifies unnecessary compl
 **Codebase mapping and structure documentation.**
 
 - **Hierarchical Mapping**: Generate `codemap.md` files at each folder level to document code organization.
-- **Hash-Based Change Detection**: Only re-maps files that have changed since last run.
+- **Hash-Based Change Detection**: Uses `.codemap.json` per folder and only re-maps files that have changed since last run.
 - **Flow Documentation**: Captures purpose, exports, dependencies, and data flows for each file.
 - **Parallel Exploration**: Uses multiple Explorer agents to map large codebases efficiently.
 
@@ -715,8 +715,11 @@ cartography scan <folder> --extensions ts,tsx,js
 # Calculate hashes
 cartography hash <folder> --extensions ts,tsx,js
 
-# Generate/update codemap.md
+# Generate/update .codemap.json
 cartography update <folder> --extensions ts,tsx,js
+
+# Report changed files without writing hashes
+cartography changes <folder> --extensions ts,tsx,js
 ```
 
 See [cartography.md](cartography.md) for detailed design documentation.

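The new `changes` command reports changed files as JSON without writing hashes. A minimal sketch of how a caller might consume that output follows. The field names come from the object the command prints in this commit; the `filesToRemap` helper and the sample payload are hypothetical and only illustrate the idea.

```typescript
// Shape of the JSON printed by `cartography changes`, based on the
// fields the command logs in this commit.
interface ChangesReport {
  folder: string;
  fileCount: number;
  folderHash: string;
  changedFiles: string[];
  hasChanges: boolean;
}

// Hypothetical helper: parse the command's stdout and return the files
// that need re-analysis. Returns an empty list when nothing changed.
function filesToRemap(stdout: string): string[] {
  const report = JSON.parse(stdout) as ChangesReport;
  return report.hasChanges ? report.changedFiles : [];
}

// Sample payload mirroring the example in cartography.md.
const sample = JSON.stringify({
  folder: 'src',
  fileCount: 45,
  folderHash: 'abc123',
  changedFiles: ['src/agents/orchestrator.ts'],
  hasChanges: true,
});

console.log(filesToRemap(sample));
```

Because `changes` does not write `.codemap.json`, a caller could run it repeatedly as a dry run and call `update` only once the re-analysis is done.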
+ 219 - 10
cartography.md

@@ -10,14 +10,14 @@ Cartography operates through an orchestrated "bottom-up" analysis pattern, combi
 A lightweight utility designed for the Orchestrator to handle deterministic file operations.
 - **Scanning**: Discovers directory structures while respecting `.gitignore` and default excludes (node_modules, .git, etc.).
 - **Hashing**: Calculates MD5 hashes for individual files and a composite "Folder Hash" (hash of all valid file hashes in that directory).
-- **Compact Frontmatter**: Manages a minimal YAML block in `codemap.md` to track state:
-  ```yaml
-  ---
-  h: [folder_hash]
-  f: [{p: path, h: file_hash}, ...]
-  ---
+- **Hash File**: Manages a minimal `.codemap.json` file to track state:
+  ```json
+  {
+    "h": "[folder_hash]",
+    "f": [{"p": "path", "h": "file_hash"}]
+  }
   ```
-- **Lifecycle**: If `codemap.md` doesn't exist, it scaffolds it. If it exists but hashes match, it skips processing.
+- **Lifecycle**: If `.codemap.json` doesn't exist, it scaffolds it. If it exists but hashes match, it skips processing.
 
 ### 2. Orchestration Strategy
 The Orchestrator acts as the "Surveyor General," determining the scope and sequence of the map.
@@ -40,7 +40,7 @@ Explorers are tasked with generating the human/AI-readable body of the `codemap.
 ## 🔄 Operational Workflow
 
 1.  **Discovery Phase**: Orchestrator runs the helper script to scan the root and identifies "High Importance" directories.
-2.  **Initial Hash Check**: The script identifies which folders are "Dirty" (hash mismatch or missing `codemap.md`).
+2.  **Initial Hash Check**: The script identifies which folders are "Dirty" (hash mismatch or missing `.codemap.json`).
 3.  **Leaf-Node Analysis**: Explorers are dispatched to the deepest sub-folders first.
 4.  **Incremental Update**: 
     - If a file hash changes, the Explorer re-analyzes only that file and updates the Folder Summary.
@@ -48,7 +48,7 @@ Explorers are tasked with generating the human/AI-readable body of the `codemap.
 5.  **Hierarchy Assembly**: As sub-folders finish, parent Explorers synthesize those results into higher-level summaries until the Root Codemap is reached.
 
 ## 🤖 LLM Prompting Goal
-The resulting `codemap.md` files serve as a "Pre-flight Checklist" for any future agent task. Instead of reading 100 files, an agent reads 1-5 `codemap.md` files to understand exactly where logic lives and how systems interact.
+The resulting `codemap.md` files serve as a "Pre-flight Checklist" for any future agent task. Instead of reading 100 files, an agent reads 1-5 `codemap.md` files to understand exactly where logic lives and how systems interact, while `.codemap.json` tracks hash state.
 
 ---
 
@@ -73,7 +73,216 @@ The resulting `codemap.md` files serve as a "Pre-flight Checklist" for any futur
 **A:** One `codemap.md` per folder. Sub-folders must be mapped before their parents so the parent can synthesize the sub-folder's high-level purpose into its own map.
 
 **Q: What is the script's specific responsibility?**
-**A:** The script is deterministic. It calculates hashes, manages the compact frontmatter, and scaffolds the file. It *never* generates the descriptive body; that is reserved for the Explorer agents.
+**A:** The script is deterministic. It calculates hashes, manages `.codemap.json`, and scaffolds hash state. It *never* generates the descriptive body; that is reserved for the Explorer agents.
 
 **Q: How is parallelism handled?**
 **A:** Explorers run in parallel for all "leaf" folders (folders with no sub-folders). Once a layer is complete, the Orchestrator moves up the tree.
+
+---
+
+## 🗣️ User Interaction Flow
+
+### Step 1: User Requests Codemaps
+```
+User: "I need codemaps for this codebase so I can understand the architecture."
+```
+
+### Step 2: Orchestrator Analysis Sequence
+
+**2.1 - Discovery Phase**
+```
+Orchestrator calls: cartography scan {folder} --extensions {exts}
+Response: { folder: ".", files: ["src/index.ts", "src/config.ts", ...] }
+```
+
+**2.2 - Importance Assessment**
+Orchestrator analyzes the file structure and determines:
+- **High Priority**: `src/`, `agents/`, `tools/`
+- **Medium Priority**: `cli/`, `config/`, `utils/`
+- **Low Priority**: `tests/`, `docs/`, scripts that aren't business logic
+
+**2.3 - Extension Selection**
+Based on project type, Orchestrator decides which extensions to map:
+```typescript
+// Example for TypeScript project
+extensions: ["ts", "tsx"]
+// Example for Python project
+extensions: ["py"]
+```
+
+### Step 3: Sequential Mapping (Bottom-Up)
+
+**3.1 - Calculate Initial Hashes**
+```
+Orchestrator calls: cartography hash {folder} --extensions {exts}
+Response: {
+  folderHash: "abc123",
+  files: {
+    "src/index.ts": "def456",
+    "src/config.ts": "ghi789"
+  }
+}
+```
+
+**3.2 - Leaf Folder First**
+```
+Orchestrator: "Mapping leaf folders in parallel..."
+
+For each leaf folder (no subfolders):
+  1. Orchestrator spawns Explorer agent with cartography skill
+  2. Explorer analyzes folder using cartography skill template
+  3. Explorer generates codemap.md body content
+  4. Helper script updates .codemap.json with hashes
+```
+
+**3.3 - Parent Folders**
+```
+Orchestrator: "All subfolders complete. Now mapping parents..."
+
+For each parent folder:
+  1. Orchestrator spawns Explorer agent
+  2. Explorer reads subfolder codemaps to understand child purposes
+  3. Explorer synthesizes subfolder summaries into parent summary
+  4. Explorer documents parent's direct files
+  5. Helper script updates .codemap.json
+```
+
+### Step 4: Completion Report
+
+```
+Orchestrator: "✓ Codemap generation complete!
+
+Summary:
+- Folders mapped: 12
+- Files analyzed: 87
+- Codemaps generated: 12
+
+Key findings:
+- Main entry point: src/index.ts (orchestrates all agents)
+- Agent logic: src/agents/ (6 agent configurations)
+- Tool implementations: src/tools/ (background, LSP, skill, grep)
+- Config management: src/config/schema.ts
+
+Next steps:
+- Review codemap.md at each folder level
+- Ask specific questions about architecture or flows
+- Use codemaps as context prep for future tasks"
+```
+
+### Step 5: Future Use (Incremental Updates)
+
+```
+User: "I made some changes. Update the codemaps."
+
+Orchestrator: "Checking for changes..."
+
+Orchestrator calls: cartography changes src --extensions ts
+
+Response: {
+  folder: "src",
+  fileCount: 45,
+  folderHash: "abc123",
+  changedFiles: ["src/agents/orchestrator.ts"],
+  hasChanges: true
+}
+
+Orchestrator: "Only 1 file changed. Re-mapping src folder..."
+```
+
+---
+
+## 💬 Example Conversation
+
+```
+User: Generate codemaps for this codebase. Focus on the core architecture.
+
+Orchestrator: I'll scan the codebase and generate hierarchical codemaps.
+
+[Scanning...]
+Found main directories:
+- src/ (main source code)
+- cli/ (CLI tools)
+- tests/ (test files) [skipping]
+
+I'll prioritize mapping src/ first. Let me start by scanning its structure...
+
+[cartography tool call: scan src --extensions ts,tsx]
+
+Found 89 TypeScript files in src/.
+
+Analyzing structure:
+- src/agents/ (6 agent configs)
+- src/tools/ (background, LSP, skill tools)
+- src/config/ (schema and config parsing)
+- src/utils/ (utilities)
+
+I'll map this bottom-up, starting with leaf folders and working up to the root.
+
+[Spawning Explorers for leaf folders...]
+✓ src/agents/ codemap.md generated
+✓ src/tools/lsp/ codemap.md generated
+✓ src/tools/skill/ codemap.md generated
+✓ src/config/ codemap.md generated
+✓ src/utils/ codemap.md generated
+
+Now mapping parent folders...
+
+✓ src/tools/ codemap.md generated
+✓ src/ codemap.md generated
+
+Codemap generation complete! Here's the architecture overview:
+
+**Main Entry Point**: src/index.ts
+  - Plugin initialization and tool registration
+  - Agent configuration setup
+
+**Agent Layer**: src/agents/
+  - 6 specialized agents (orchestrator, explorer, oracle, librarian, designer, fixer)
+  - Each with specific capabilities and model mappings
+
+**Tools Layer**: src/tools/
+  - Background task management
+  - LSP integration (definition, references, diagnostics)
+  - Skill system
+  - Code search (grep, ast-grep)
+
+**Configuration**: src/config/
+  - Schema definitions
+  - TMux integration
+  - Preset management
+
+You can now:
+1. Review codemap.md in any folder for detailed documentation
+2. Ask me specific questions about architecture
+3. Use these codemaps as context for development tasks
+```
+
+---
+
+## 🔄 Incremental Update Flow
+
+When the user makes changes:
+
+```
+User: I refactored the orchestrator agent. Update the codemaps.
+
+Orchestrator: Checking what changed...
+
+[cartography tool call: update src/agents --extensions ts]
+
+Response: {
+  updated: true,
+  changedFiles: ["src/agents/orchestrator.ts"]
+}
+
+Orchestrator: I see src/agents/orchestrator.ts changed. I'll re-analyze just this folder.
+
+[Spawning Explorer for src/agents/]
+
+Explorer: Updated codemap.md for src/agents/
+- Re-analyzed orchestrator.ts
+- Updated purpose, exports, dependencies, flows
+- Parent folder (src/) summary still valid (no other major changes)
+
+✓ Codemap updated! The orchestrator now uses the new background task system.
+```

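The design document requires sub-folders to be mapped before their parents. One way to get such an order is to sort folder paths by depth, deepest first. The sketch below is an assumption about how that could be done, not the Orchestrator's actual logic; `bottomUpOrder` is a hypothetical name and the paths are assumed to use `/` separators.

```typescript
// Hypothetical helper: order folders so that every sub-folder appears
// before its parent. Deeper paths sort first; ties break alphabetically.
function bottomUpOrder(folders: string[]): string[] {
  const depth = (p: string): number => p.split('/').filter(Boolean).length;
  return [...folders].sort(
    (a, b) => depth(b) - depth(a) || a.localeCompare(b),
  );
}

const order = bottomUpOrder([
  'src',
  'src/tools',
  'src/tools/lsp',
  'src/agents',
  'src/tools/skill',
]);

console.log(order);
```

Folders at the same depth do not depend on each other, so each depth level could be handed to Explorers in parallel before moving up a level.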
+ 98 - 57
scripts/cartography.ts

@@ -3,14 +3,13 @@ import { existsSync, readdirSync, readFileSync, writeFileSync } from 'node:fs';
 import { join, resolve } from 'node:path';
 import { createMD5, md5 } from 'hash-wasm';
 import ignore from 'ignore';
-import { parse, stringify } from 'yaml';
 
 interface FileEntry {
   p: string;
   h: string;
 }
 
-interface Frontmatter {
+interface CodemapData {
   h: string;
   f: FileEntry[];
 }
@@ -118,33 +117,48 @@ async function calculateFolderHash(
   return hasher.digest();
 }
 
-interface ParsedFrontmatter {
-  frontmatter: Frontmatter | null;
-  body: string;
-}
-
-function parseFrontmatter(content: string): ParsedFrontmatter {
-  const match = content.match(/^---\n([\s\S]*?)\n---\n([\s\S]*)$/);
-
-  if (!match) {
-    return { frontmatter: null, body: content };
+function readCodemapData(codemapPath: string): CodemapData | null {
+  if (!existsSync(codemapPath)) {
+    return null;
   }
 
   try {
-    const frontmatter = parse(match[1]) as Frontmatter;
-    return { frontmatter, body: match[2] };
+    const content = readFileSync(codemapPath, 'utf-8');
+    return JSON.parse(content) as CodemapData;
   } catch {
-    return { frontmatter: null, body: content };
+    return null;
   }
 }
 
-function formatFrontmatter(frontmatter: Frontmatter): string {
-  return `---
-h: ${frontmatter.h}
-f:
-${frontmatter.f.map((f) => `  - p: ${f.p}\n    h: ${f.h}`).join('\n')}
----
-`;
+function writeCodemapData(codemapPath: string, data: CodemapData): void {
+  const content = `${JSON.stringify(data, null, 2)}\n`;
+  writeFileSync(codemapPath, content, 'utf-8');
+}
+
+function diffFiles(
+  currentHashes: Map<string, string>,
+  previous: CodemapData | null,
+): string[] {
+  if (!previous) {
+    return Array.from(currentHashes.keys()).sort((a, b) => a.localeCompare(b));
+  }
+
+  const oldHashes = new Map(previous.f.map((f) => [f.p, f.h]));
+  const changed = new Set<string>();
+
+  for (const [path, hash] of currentHashes) {
+    if (oldHashes.get(path) !== hash) {
+      changed.add(path);
+    }
+  }
+
+  for (const path of oldHashes.keys()) {
+    if (!currentHashes.has(path)) {
+      changed.add(path);
+    }
+  }
+
+  return Array.from(changed).sort((a, b) => a.localeCompare(b));
 }
 
 async function updateCodemap(
@@ -156,58 +170,67 @@ async function updateCodemap(
   const fileHashes = await calculateHashes(folder, files);
   const folderHash = await calculateFolderHash(fileHashes);
 
-  const codemapPath = join(folder, 'codemap.md');
-  let body = '';
-  let changedFiles: string[] = [];
-
-  if (existsSync(codemapPath)) {
-    const content = readFileSync(codemapPath, 'utf-8');
-    const { frontmatter, body: existingBody } = parseFrontmatter(content);
-
-    if (frontmatter?.h === folderHash) {
-      return { updated: false, fileCount: files.length, changedFiles: [] };
-    }
+  const codemapPath = join(folder, '.codemap.json');
+  const existing = readCodemapData(codemapPath);
 
-    body = existingBody;
-
-    if (frontmatter) {
-      const oldHashes = new Map(frontmatter.f.map((f) => [f.p, f.h]));
-
-      for (const [path, hash] of fileHashes) {
-        if (oldHashes.get(path) !== hash) {
-          changedFiles.push(path);
-        }
-      }
-    } else {
-      changedFiles = files;
-    }
-  } else {
-    changedFiles = files;
+  if (existing?.h === folderHash) {
+    return { updated: false, fileCount: files.length, changedFiles: [] };
   }
 
-  const frontmatter: Frontmatter = {
+  const changedFiles = diffFiles(fileHashes, existing);
+  const data: CodemapData = {
     h: folderHash,
     f: files.map((p) => ({ p, h: fileHashes.get(p)! })),
   };
 
-  const content = formatFrontmatter(frontmatter) + body;
-  writeFileSync(codemapPath, content, 'utf-8');
+  writeCodemapData(codemapPath, data);
 
   return { updated: true, fileCount: files.length, changedFiles };
 }
 
+async function getChanges(
+  folder: string,
+  extensions: string[],
+): Promise<{
+  fileCount: number;
+  folderHash: string;
+  changedFiles: string[];
+}> {
+  const ignorer = parseGitignore(folder);
+  const files = getFiles(folder, extensions, ignorer);
+  const fileHashes = await calculateHashes(folder, files);
+  const folderHash = await calculateFolderHash(fileHashes);
+  const codemapPath = join(folder, '.codemap.json');
+  const existing = readCodemapData(codemapPath);
+  const changedFiles = diffFiles(fileHashes, existing);
+
+  return {
+    fileCount: files.length,
+    folderHash,
+    changedFiles,
+  };
+}
+
 async function main() {
   const command = process.argv[2];
   const folderArg = process.argv[3];
   const folder = folderArg ? resolve(folderArg) : process.cwd();
 
   const extArg = process.argv.find((a) => a.startsWith('--extensions'));
-  const extensions = extArg
-    ? extArg
-        .split('=')[1]
+  let extensions: string[];
+
+  if (extArg) {
+    const extList = extArg.split('=')[1];
+    if (extList) {
+      extensions = extList
         .split(',')
-        .map((e) => '.' + e.trim().replace(/^\./, ''))
-    : ['.ts', '.tsx', '.js', '.jsx', '.py', '.go', '.rs'];
+        .map((e) => '.' + e.trim().replace(/^\./, '')); // precompute the dot prefix
+    } else {
+      extensions = ['.ts', '.tsx', '.js', '.jsx', '.py', '.go', '.rs'];
+    }
+  } else {
+    extensions = ['.ts', '.tsx', '.js', '.jsx', '.py', '.go', '.rs'];
+  }
 
   switch (command) {
     case 'scan': {
@@ -266,9 +289,27 @@ async function main() {
       break;
     }
 
+    case 'changes': {
+      const result = await getChanges(folder, extensions);
+      console.log(
+        JSON.stringify(
+          {
+            folder,
+            fileCount: result.fileCount,
+            folderHash: result.folderHash,
+            changedFiles: result.changedFiles,
+            hasChanges: result.changedFiles.length > 0,
+          },
+          null,
+          2,
+        ),
+      );
+      break;
+    }
+
     default:
       console.error(
-        'Usage: cartography <scan|hash|update> [folder] [--extensions ts,tsx,js]',
+        'Usage: cartography <scan|hash|update|changes> [folder] [--extensions ts,tsx,js]',
       );
       process.exit(1);
   }

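As far as this hunk shows, `main` reads the extension list from the text after `=` in the `--extensions` argument, while the README and the usage string show the space-separated form `--extensions ts,tsx,js`. With the space form, `split('=')[1]` is `undefined` and the defaults are used. A minimal sketch of a parser that accepts both forms follows; `parseExtensions` and `DEFAULT_EXTENSIONS` are hypothetical names, not part of the commit.

```typescript
const DEFAULT_EXTENSIONS = ['.ts', '.tsx', '.js', '.jsx', '.py', '.go', '.rs'];

// Hypothetical helper: accepts `--extensions=ts,tsx` and `--extensions ts,tsx`.
// Falls back to the defaults when the flag or its value is missing.
function parseExtensions(argv: string[]): string[] {
  const i = argv.findIndex(
    (a) => a === '--extensions' || a.startsWith('--extensions='),
  );
  if (i === -1) return DEFAULT_EXTENSIONS;

  const arg = argv[i] ?? '';
  // Value is either inline after '=' or the next argument.
  const raw = arg.includes('=') ? arg.slice(arg.indexOf('=') + 1) : argv[i + 1];
  if (!raw || raw.startsWith('--')) return DEFAULT_EXTENSIONS;

  return raw
    .split(',')
    .map((e) => e.trim())
    .filter((e) => e.length > 0)
    .map((e) => '.' + e.replace(/^\./, '')); // normalize to a leading dot
}

console.log(parseExtensions(['scan', 'src', '--extensions', 'ts,tsx']));
console.log(parseExtensions(['scan', 'src', '--extensions=.py']));
```

Keeping the defaults in one constant also avoids repeating the same array in both fallback branches.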
+ 5 - 8
src/tools/cartography/tool.ts

@@ -1,5 +1,4 @@
 import { join } from 'node:path';
-import { fileURLToPath } from 'node:url';
 import {
   type PluginInput,
   type ToolDefinition,
@@ -9,12 +8,12 @@ import {
 export function createCartographyTool(ctx: PluginInput): ToolDefinition {
   return tool({
     description:
-      'Cartography helper script for codebase mapping. Use for directory scanning, hash calculation, and codemap.md generation.',
+      'Cartography helper script for codebase mapping. Use for directory scanning, hash calculation, and .codemap.json updates.',
     args: {
       command: tool.schema
-        .enum(['scan', 'hash', 'update'])
+        .enum(['scan', 'hash', 'update', 'changes'])
         .describe(
-          'Command to run: scan (list files), hash (calculate hashes), update (generate/update codemap.md)',
+          'Command to run: scan (list files), hash (calculate hashes), update (update .codemap.json), changes (report changes without writing)',
         ),
       folder: tool.schema
         .string()
@@ -30,10 +29,8 @@ export function createCartographyTool(ctx: PluginInput): ToolDefinition {
     execute: async (args, toolContext) => {
       const sessionDir = await getSessionDirectory(ctx, toolContext);
 
-      const scriptPath = join(
-        fileURLToPath(import.meta.url),
-        '../../../scripts/cartography.ts',
-      );
+      // Use ctx.directory (plugin project dir) to find the script
+      const scriptPath = join(ctx.directory, 'scripts/cartography.ts');
 
       const extensions = (args.extensions as string) || 'ts,tsx,js,jsx';
       const commandArgs = [

+ 16 - 6
src/tools/skill/builtin.ts

@@ -153,11 +153,21 @@ const CARTOGRAPHY_TEMPLATE = `# Cartography Skill
 
 You are a code cartographer. Your mission is to create structured codemaps that help LLMs understand codebase structure and flows.
 
-## Your Task
+## Orchestrator Workflow
 
-Generate a \`codemap.md\` file for the assigned folder that documents:
+When the user asks for codemaps or updates, you orchestrate the workflow:
+- Call the \`cartography\` tool with \`scan\` to understand folder structure and decide priority folders and extensions.
+- For each target folder, run \`cartography update <folder> --extensions ...\`.
+- If \`updated: false\`, skip analysis for that folder.
+- If \`updated: true\`, use \`changedFiles\` to decide which files need re-analysis.
+- Dispatch Explorer agents to update the body content of \`codemap.md\` (leaf folders first, then parents).
+- After Explorer updates, re-run \`cartography update\` to refresh hashes if needed.
+
+## Explorer Task
+
+Generate or update the \`codemap.md\` body for the assigned folder:
 - **Purpose**: What this folder contains and its role in the project
-- **Per-file analysis**: For each file, document:
+- **Per-file analysis**: For each file (especially \`changedFiles\`), document:
   - Purpose (1-2 sentences)
   - Key exports (main functions/classes/components)
   - Dependencies (imports from other project files)
@@ -194,11 +204,11 @@ Use this structure:
 - Avoid listing function parameters (they change often)
 - Document flows and relationships, not signatures
 - Be concise but informative
-- Reference the frontmatter hashes for change tracking
+- Reference the \`.codemap.json\` hashes for change tracking
 
-## Frontmatter
+## Hash Storage
 
-The helper script manages frontmatter with hashes. You only update the body content when needed. Check the frontmatter to see which files have changed since the last update.
+The helper script manages hashes in \`.codemap.json\`. You only update the body content when needed. Check \`.codemap.json\` or the \`changedFiles\` list to see which files changed since the last update.
 `;
 
 const PLAYWRIGHT_TEMPLATE = `# Playwright Browser Automation Skill