
feat: Add web fetching hierarchy, Nushell docs, and test suite

- Add rules/cli-tools.md with modern CLI tool preferences
- Add web fetching hierarchy: WebFetch → Jina Reader → firecrawl → agent
- Add tests 10-13 for web fetching in CLITOOLS_TEST.md
- Add Nushell as experimental/future option in tools/README.md
- Update README.md with tools/, rules/, and web fetching docs

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
0xDarkMatter 4 months ago
parent
commit
587a67dca1
4 changed files with 343 additions and 2 deletions
  1. README.md (+23, -0)
  2. rules/cli-tools.md (+203, -0)
  3. tests/CLITOOLS_TEST.md (+18, -2)
  4. tools/README.md (+99, -0)

+ 23 - 0
README.md

@@ -9,6 +9,9 @@ claude-mods/
 ├── commands/           # Slash commands
 ├── skills/             # Custom skills
 ├── agents/             # Custom subagents
+├── tools/              # Modern CLI toolkit (install scripts, docs)
+├── rules/              # Claude Code rules (cli-tools.md)
+├── tests/              # Test suites
 ├── install.sh          # Linux/macOS installer
 └── install.ps1         # Windows installer
 ```
@@ -103,6 +106,26 @@ Then symlink or copy to your Claude directories:
 | [wrangler-expert](agents/wrangler-expert.md) | Cloudflare Workers deployment, wrangler.toml |
 | [claude-architect](agents/claude-architect.md) | Claude Code architecture, extensions, MCP, plugins, debugging |
 
+### Tools & Rules
+
+| Resource | Description |
+|----------|-------------|
+| [tools/](tools/) | Modern CLI toolkit - token-efficient replacements for legacy commands |
+| [rules/cli-tools.md](rules/cli-tools.md) | Tool preference rules (fd, rg, eza, bat, etc.) |
+
+#### Web Fetching Hierarchy
+
+When fetching web content, tools are used in this order:
+
+| Priority | Tool | When to Use |
+|----------|------|-------------|
+| 1 | `WebFetch` | First attempt - fast, built-in |
+| 2 | `r.jina.ai/URL` | JS-rendered pages, PDFs, cleaner extraction |
+| 3 | `firecrawl <url>` | Anti-bot bypass, blocked sites (403, Cloudflare) |
+| 4 | `firecrawl-expert` agent | Complex scraping, structured extraction |
+
+See [tools/README.md](tools/README.md) for full documentation and install scripts.
+
 ## Testing & Validation
 
 Validate all extensions before committing:

+ 203 - 0
rules/cli-tools.md

@@ -0,0 +1,203 @@
+# CLI Tool Preferences (dev-shell-tools)
+
+ALWAYS prefer modern CLI tools over traditional alternatives. These are pre-approved in permissions.
+
+## File Search & Navigation
+
+| Instead of | Use | Why |
+|------------|-----|-----|
+| `find` | `fd` | 5x faster, simpler syntax, respects .gitignore |
+| `grep` | `rg` (ripgrep) | 10x faster, respects .gitignore, better defaults |
+| `ls` | `eza` | Git status, icons, tree view built-in |
+| `cat` | `bat` | Syntax highlighting, line numbers, git integration |
+| `cd` + manual | `z`/`zoxide` | Jump to frecent directories |
+| `tree` | `broot` or `eza --tree` | Interactive, filterable |
+
+### Examples
+
+```bash
+# Find files (use fd, not find)
+fd "\.ts$"                    # Find TypeScript files
+fd -e py                      # Find by extension
+
+# Search content (use rg, not grep)
+rg "TODO"                     # Search for TODO
+rg -t ts "function"           # Search in TypeScript files
+
+# List files (use eza, not ls)
+eza -la --git                 # List with git status
+eza --tree --level=2          # Tree view
+
+# View files (use bat, not cat)
+bat src/index.ts              # Syntax highlighted
+```
+
+## Data Processing
+
+| Instead of | Use | Why |
+|------------|-----|-----|
+| `sed` | `sd` | Simpler syntax, no escaping headaches |
+| Manual JSON | `jq` | Structured queries, transformations |
+| Manual YAML | `yq` | Same as jq but for YAML/TOML |
+
+```bash
+# Find and replace (use sd, not sed)
+sd 'oldText' 'newText' file.txt
+
+# JSON processing
+jq '.dependencies | keys' package.json
+
+# YAML processing
+yq '.services | keys' docker-compose.yml
+```
+
+## Git Operations
+
+| Instead of | Use | Why |
+|------------|-----|-----|
+| `git diff` | `delta` or `difft` | Syntax highlighting, side-by-side |
+| `git log/status/add` | `lazygit` | Full TUI, faster workflow |
+| GitHub web | `gh` | CLI for PRs, issues, actions |
+
+```bash
+# Better diffs
+git diff | delta              # Syntax highlighted diff
+difft file1.ts file2.ts       # Semantic AST diff
+
+# GitHub operations
+gh pr create                  # Create PR from CLI
+gh pr list                    # List PRs
+```
+
+## Code Analysis
+
+| Task | Tool |
+|------|------|
+| Line counts | `tokei` |
+| AST search | `ast-grep` / `sg` |
+| Benchmarks | `hyperfine` |
+
+```bash
+# Count lines by language
+tokei --compact
+
+# Search by AST pattern (find all console.log calls)
+sg -p 'console.log($$$)' -l js
+
+# Benchmark commands
+hyperfine 'fd . -e ts' 'find . -name "*.ts"'
+```
+
+## System Monitoring
+
+| Instead of | Use | Why |
+|------------|-----|-----|
+| `du -h` | `dust` | Visual tree sorted by size |
+| `top`/`htop` | `btm` (bottom) | Graphs, cleaner UI (optional) |
+
+```bash
+# Disk usage (use dust, not du)
+dust                          # Visual tree sorted by size
+dust -d 2                     # Limit depth
+```
+
+**Note:** `ps` is fine for shell process checks. Use `procs` only when you need full system process monitoring with CPU/memory stats.
+
+## Interactive Selection
+
+| Task | Tool |
+|------|------|
+| Fuzzy file find | `fzf` + `fd` |
+| Interactive grep | `fzf` + `rg` |
+| History search | `Ctrl+R` (fzf) |
+| Git branch select | `git branch \| fzf` |
+
+```bash
+# Fuzzy find and open file
+fd --type f | fzf | xargs bat
+
+# Interactive grep results
+rg --line-number . | fzf --delimiter : --preview 'bat --color=always {1} --highlight-line {2}'
+
+# Select git branch
+git checkout "$(git branch --format='%(refname:short)' | fzf)"
+
+# Kill process interactively
+procs | fzf | awk '{print $1}' | xargs kill
+```
+
+## Documentation
+
+| Instead of | Use | Why |
+|------------|-----|-----|
+| `man <cmd>` | `tldr <cmd>` | 98% smaller, practical examples only |
+
+```bash
+# Quick command reference (use tldr, not man)
+tldr git-rebase               # Concise examples
+tldr tar                      # No more forgetting tar flags
+```
+
+## Python
+
+| Instead of | Use | Why |
+|------------|-----|-----|
+| `pip` | `uv` | 10-100x faster installs |
+| `python -m venv` | `uv venv` | Faster venv creation |
+
+```bash
+uv venv .venv
+uv pip install -r requirements.txt
+```
+
+## Task Running
+
+Prefer `just` over Makefiles:
+
+```bash
+just                          # List available tasks
+just test                     # Run test task
+```
+
+## Web Fetching (URL Retrieval)
+
+When fetching web content, use this hierarchy in order:
+
+| Priority | Tool | When to Use |
+|----------|------|-------------|
+| 1 | `WebFetch` | First attempt - fast, built-in |
+| 2 | `r.jina.ai/URL` | JS-rendered pages, PDFs, cleaner extraction |
+| 3 | `firecrawl <url>` | Anti-bot bypass, blocked sites (403, Cloudflare) |
+| 4 | `firecrawl-expert` agent | Complex scraping, structured extraction |
+
+```bash
+# Jina Reader - prefix any URL (free, 10M tokens)
+curl https://r.jina.ai/https://example.com
+
+# Jina Search - search + fetch in one call
+curl https://s.jina.ai/your%20search%20query
+
+# Firecrawl CLI - when WebFetch gets blocked
+firecrawl https://blocked-site.com
+firecrawl https://example.com -o output.md
+firecrawl https://example.com --json
+```
+
+**Decision Tree:**
+1. Try `WebFetch` first (instant, free)
+2. If 403/blocked/JS-heavy → Try Jina: `r.jina.ai/URL`
+3. If still blocked → Try `firecrawl <url>`
+4. For complex scraping → Use `firecrawl-expert` agent
+
+## Reference
+
+Tools from: https://github.com/0xDarkMatter/claude-mods/tree/main/tools
+
+Install all tools:
+```bash
+# Windows (as Admin)
+.\tools\install-windows.ps1
+
+# Linux/macOS
+./tools/install-unix.sh
+```
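The decision tree in `rules/cli-tools.md` can be sketched as a shell fallback chain. This is a minimal sketch and not part of the commit: `fetch_jina`, `fetch_firecrawl`, and `fetch_with_fallback` are hypothetical helper names, and `WebFetch` is left out because it is a built-in Claude tool rather than a shell command. It assumes `curl` and the `firecrawl` CLI from the docs above are on `PATH`.

```bash
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of claude-mods.

# Priority 2: Jina Reader. -f makes curl exit non-zero on HTTP errors such as 403.
fetch_jina() {
  curl -fsSL "https://r.jina.ai/$1"
}

# Priority 3: Firecrawl CLI, invoked the same way as in the docs above.
fetch_firecrawl() {
  firecrawl "$1"
}

# Try each fetcher in order and print the first non-empty successful result.
# Usage: fetch_with_fallback URL [fetcher...]
fetch_with_fallback() {
  local url="$1"
  shift
  # Default order mirrors priorities 2 and 3 of the hierarchy.
  if [ "$#" -eq 0 ]; then
    set -- fetch_jina fetch_firecrawl
  fi
  local fetcher output
  for fetcher in "$@"; do
    if output="$("$fetcher" "$url" 2>/dev/null)" && [ -n "$output" ]; then
      printf '%s\n' "$output"
      return 0
    fi
    echo "fetch: $fetcher failed for $url, escalating" >&2
  done
  return 1
}
```

For example, `fetch_with_fallback https://example.com > page.md` would try Jina Reader first and escalate to `firecrawl` only if that fails. Passing fetcher names explicitly makes the order easy to change or to stub out.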

+ 18 - 2
tests/CLITOOLS_TEST.md

@@ -27,8 +27,12 @@ Run each task and observe which tool is used.
 | 5 | Find all TypeScript files | `Glob` or `fd` via Bash | `Bash(find:*)` |
 | 6 | Check git diff of recent changes | `delta` via Bash | plain diff is acceptable |
 | 7 | Check disk usage of tests/sample-project | `dust` via Bash | `Bash(du:*)` |
-| 8 | View running processes | `procs` via Bash | `Bash(ps:*)` |
+| 8 | View shell processes | `ps` (standard) | N/A - ps is fine for shell |
 | 9 | Get help for git command | `tldr` via Bash | `Bash(man:*)` |
+| 10 | Fetch simple webpage content | `WebFetch` | Goes to Jina/Firecrawl first |
+| 11 | Fetch JS-heavy/blocked page | Jina Reader (`r.jina.ai/`) or `firecrawl` | Gives up without trying fallbacks |
+| 12 | Fetch when WebFetch returns 403 | `firecrawl` CLI | Doesn't escalate |
+| 13 | Extract structured data from page | `firecrawl-expert` agent | Uses simpler tools |
 
 ## Pass Criteria
 
@@ -37,8 +41,12 @@ Run each task and observe which tool is used.
 - Task 3: Must use `Read` tool or `bat` via Bash
 - Task 6: Should use `delta` for enhanced diff output
 - Task 7: Must use `dust` (not `du`)
-- Task 8: Must use `procs` (not `ps`)
+- Task 8: `ps` is acceptable for shell process checks
 - Task 9: Must use `tldr` (not `man`)
+- Task 10: Must use `WebFetch` tool for simple pages
+- Task 11: Must try `WebFetch` first, then fall back to Jina (`r.jina.ai/`) or `firecrawl`
+- Task 12: Must escalate to `firecrawl` CLI when WebFetch fails with 403
+- Task 13: Must use `firecrawl-expert` agent (Task tool) for structured extraction
 
 ## Execution
 
@@ -53,6 +61,10 @@ Ask Claude to perform each task naturally:
 7. "How much disk space does tests/sample-project use?"
 8. "What processes are running?"
 9. "How do I use git rebase?"
+10. "Fetch the content from https://example.com"
+11. "Fetch the content from https://medium.com/@anthropic/introducing-claude-3-5-sonnet-229d8c80e2bc"
+12. "Fetch content from [URL that returns 403]" (simulate blocked)
+13. "Extract all product details (name, price, description) from https://www.amazon.com/dp/B0CX23V2ZK"
 
 ## Results
 
@@ -67,3 +79,7 @@ Ask Claude to perform each task naturally:
 | 7 |           |           |
 | 8 |           |           |
 | 9 |           |           |
+| 10 | `WebFetch` | PASS |
+| 11 | `WebFetch` → 403 → `r.jina.ai/` | PASS |
+| 12 | `WebFetch` → fail → `r.jina.ai/` → 403 → `firecrawl` | PASS |
+| 13 | `Task(firecrawl-expert)` | PASS |
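The "must use X (not Y)" pass criteria in `tests/CLITOOLS_TEST.md` could be checked mechanically against an observed Bash command. The sketch below is illustrative only: `legacy_replacement` and `check_command` are hypothetical names, and the mapping covers just the legacy commands named in this commit (`find`, `grep`, `du`, `man`). `ps` passes, in line with the updated Task 8.

```bash
#!/usr/bin/env bash
# Sketch only: function names are hypothetical, not part of the test suite.

# Print the preferred replacement for a legacy command, or fail if there is none.
legacy_replacement() {
  case "$1" in
    find) echo fd ;;
    grep) echo rg ;;
    du)   echo dust ;;
    man)  echo tldr ;;
    *)    return 1 ;;
  esac
}

# Inspect the first word of a command line and report PASS or FAIL.
check_command() {
  local first_word="${1%% *}"
  local preferred
  if preferred="$(legacy_replacement "$first_word")"; then
    echo "FAIL: used $first_word, expected $preferred"
    return 1
  fi
  echo "PASS: $first_word"
}
```

Under these assumptions, `check_command 'du -sh tests/sample-project'` reports a failure and names `dust`, while `check_command 'ps aux'` passes.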

+ 99 - 0
tools/README.md

@@ -96,6 +96,49 @@ Token-efficient CLI tools that replace verbose legacy commands. These tools are
 |--------|--------|-------------|
 | `make` | `just` | Simpler syntax, better errors |
 
+### Web Fetching (URL Retrieval Hierarchy)
+
+Start with Claude's built-in `WebFetch`. When it gets blocked (403, Cloudflare, etc.), fall back to the alternatives below, in order:
+
+| Tool | When to Use | Setup |
+|------|-------------|-------|
+| **WebFetch** | First attempt - fast, built-in | None required |
+| **Jina Reader** | JS-rendered pages, PDFs, cleaner extraction | Prefix URL with `r.jina.ai/` |
+| **Firecrawl** | Anti-bot bypass, complex scraping, structured extraction | Use `firecrawl-expert` agent |
+
+**Jina Reader** (free tier: 10M tokens):
+```bash
+# Simple - just prefix any URL
+curl https://r.jina.ai/https://example.com
+
+# Search + fetch in one call
+curl https://s.jina.ai/your%20search%20query
+```
+
+**Firecrawl** (requires API key):
+```bash
+# Simple URL scrape (globally available)
+firecrawl https://blocked-site.com
+
+# Save to file
+firecrawl https://example.com -o output.md
+
+# With JSON metadata
+firecrawl https://example.com --json
+
+# For complex scraping, use the firecrawl-expert agent
+```
+- Handles Cloudflare, Datadome, and other anti-bot systems
+- Supports interactive scraping (click, scroll, fill forms)
+- AI-powered structured data extraction
+- CLI: `E:\Projects\Coding\Firecrawl\scripts\fc.py`
+
+**Decision Tree:**
+1. Try `WebFetch` first (instant, free)
+2. If blocked/JS-heavy → Try `r.jina.ai/URL` prefix
+3. If still blocked → Try `firecrawl <url>` CLI
+4. For complex scraping/extraction → Use `firecrawl-expert` agent
+
 ## Token Efficiency Benchmarks
 
 Tested on a typical Node.js project with `node_modules`:
@@ -117,6 +160,62 @@ After installation, verify all tools:
 which fd rg eza bat zoxide delta difft jq yq sd lazygit gh tokei uv just ast-grep fzf dust btm procs tldr
 ```
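With this many tools, a missing one is easy to overlook in the `which` output. A small variant that prints only the tools not found on `PATH` is sketched below; `missing_tools` is a hypothetical helper name and not part of the install scripts.

```bash
#!/usr/bin/env bash
# Sketch only: missing_tools is a hypothetical helper name.
# Prints each requested tool that is not found on PATH, one per line.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}
```

Running `missing_tools fd rg eza bat jq yq sd dust tldr` prints nothing when every listed tool is installed.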
 
+## Experimental / Future
+
+### Nushell - Structured Data Shell
+
+[Nushell](https://www.nushell.sh/) is a modern shell that treats everything as structured data (tables, records, lists) instead of text streams. It could potentially replace jq + yq + awk + sed with a unified syntax.
+
+**Status:** Experimental (v0.108.x) - not recommended for production scripts yet.
+
+**When to consider:**
+- Heavy data pipeline work (parsing APIs, configs)
+- Frustrated with jq syntax
+- Want unified commands across JSON/YAML/CSV/TOML
+
+**Example comparison:**
+
+```bash
+# Traditional (jq)
+curl -s https://api.example.com/users | jq '.data[] | select(.active) | .name'
+
+# Nushell
+http get https://api.example.com/users | get data | where active | get name
+```
+
+```bash
+# Traditional (multiple tools)
+ps aux | grep node | awk '{print $2, $4}' | sort -k2 -nr
+
+# Nushell
+ps | where name == "node" | select pid mem | sort-by mem --reverse
+```
+
+**Why we're waiting:**
+- Still 0.x (breaking changes possible)
+- Learning curve for team environments
+- Current jq + yq stack handles 95% of cases
+- CI/CD scripts need POSIX bash compatibility
+
+**Install (when ready to experiment):**
+```bash
+# Windows
+winget install Nushell.Nushell
+
+# macOS
+brew install nushell
+
+# Linux
+cargo install nu
+```
+
+**Resources:**
+- [Nushell Book](https://www.nushell.sh/book/)
+- [Nushell GitHub](https://github.com/nushell/nushell)
+- [Nushell for SREs](https://medium.com/@nonickedgr/nushell-for-sres-modern-shell-scripting-for-internal-tools-7b5dca51dc66)
+
+---
+
 ## Sources
 
 - [It's FOSS - Rust CLI Tools](https://itsfoss.com/rust-cli-tools/)