Browse Source

chore: merge net-ops from origin/main into v2.5.0

Resolves 3-way conflict between local v2.5.0 (portless-ops,
process-compose-ops, canvas removal) and remote net-ops addition.
Final state: 75 skills, 2 commands, 6 rules. Deduped fleet-ops
entry in plugin.json.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
0xDarkMatter 4 weeks ago
parent
commit
c88c85db50

+ 1 - 0
.claude-plugin/plugin.json

@@ -80,6 +80,7 @@
       "skills/mcp-ops",
       "skills/migrate-ops",
       "skills/monitoring-ops",
+      "skills/net-ops",
       "skills/nginx-ops",
       "skills/perf-ops",
       "skills/portless-ops",

+ 1 - 1
AGENTS.md

@@ -5,7 +5,7 @@
 This is **claude-mods** - a collection of custom extensions for Claude Code:
 - **23 expert agents** for specialized domains (React, Python, Go, Rust, AWS, git, etc.)
 - **2 commands** for session management (/sync, /save)
-- **75 skills** for CLI tools, patterns, workflows, and development tasks
+- **75 skills** for CLI tools, patterns, workflows, and development tasks (incl. `net-ops` for cross-platform network troubleshooting)
 - **13 output styles** for response personality (Vesper, Spartan, Mentor, Executive, Pair, Atlas, Coach, Harbour, Meridian, Noir, Roast, Sage, Scout)
 - **4 hooks** for pre-commit linting, post-edit formatting, dangerous command warnings, and pmail notifications
 - **Pigeon** inter-session messaging (`pigeon send/read/reply`) - SQLite-backed pmail at `~/.claude/pmail.db`

+ 1 - 0
README.md

@@ -23,6 +23,7 @@ From Python async patterns to Rust ownership models, from AWS Fargate deployment
 ## Recent Updates
 
 **v2.5.0** (May 2026)
+- 🌐 **`net-ops` skill** - Cross-platform network troubleshooting (Windows / macOS / Linux) via local or remote SSH with a layered diagnostic ladder: link → ICMP → socket → DNS infrastructure → OS resolver → app. NDP-aware IPv6 classifier (disabled / ULA-only / no-route / path-broken / healthy), MTU/PMTU test, time-skew check, browser DoH detection (Chrome / Brave / Firefox), WSL2/container awareness. Modes: `--watch`, `--json` (NDJSON), `--redact` for opsec-clean dumps, `--quick` for skip-if-healthy. Per-OS probe + dns-audit + repair scripts, reverse-mode probe, 24-test self-suite.
 - 🌐 **`portless-ops` skill** - Local-dev HTTPS proxy operations for Vercel Labs' [portless](https://github.com/vercel-labs/portless). Wraps the canonical upstream `SKILL.md` and `oauth/SKILL.md` (vendored verbatim into `references/` since the npm package only ships `dist/`) and overlays operational patterns we've validated: the static-alias pattern for pairing portless with external supervisors (Process Compose, PM2, Docker), TLD selection decision tree (`.test`/`.dev`/`.localhost`/custom-owned), Windows-specific gotchas (`openssl` PATH from Git for Windows, `certutil` quirks, curl-vs-browser cert handling, PS 5.1 vs 7+ flag differences), the clean-reset procedure when changing TLDs (because `portless alias --remove` appends the active TLD), and three runnable scripts: `install-portless.ps1` (audits the npm tarball for known supply-chain IOCs *before* installing), `reset-state.ps1` (full state wipe + re-register), `sync-aliases-from-yaml.ps1` (derives portless aliases from a supervisor's YAML). Four `portless.json` asset templates cover single-app, monorepo, custom-TLD-documented, and `package.json`-inline patterns.
 - 🎛️ **`process-compose-ops` skill** - Comprehensive operations for [Process Compose](https://github.com/F1bonacc1/process-compose), the Go-binary supervisor replacing PM2/supervisord/Foreman for non-containerised local services. Six reference files: `schema-reference.md` (full YAML schema with field semantics, defaults, and command-quoting gotchas including Windows-PATH backslash handling), `probe-patterns.md` (readiness probe recipes per stack — Python/Go/Node/TCP-only/daemons), `dependency-patterns.md` (`depends_on` patterns: companion daemons, DB-before-app, tunnel-after-service, one-shot init), `tui-shortcuts.md` (TUI keybindings cheatsheet, status legend, search/sort), `boot-persistence-windows.md` (Task Scheduler with `S4U` logon and PATH-aware wrapper script), `supply-chain-verification.md` (SHA-256 verification procedure for the binary). Four runnable scripts: `install-process-compose.ps1` (verified download + extract + writes `VERIFICATION.md`), `verify-binary.ps1` (re-verifies committed binary hash), plus boot wrapper and Task Scheduler installer templates. Five YAML assets: Python service, Django+companions, Go binary, Cloudflare tunnel pattern, cron job. Material derived from a 3-hour production migration from PM2+Caddy+Dagu to Process Compose+portless, anonymised for general use.
 - 📦 **Plugin manifest catch-up** - `summon` (v2.4.11) and `fleet-ops` (post-v2.4.11) were committed and listed in README but never added to `.claude-plugin/plugin.json`'s `components.skills` array, so they weren't being indexed by the plugin system. Both registered correctly now alongside the new pair.

+ 8 - 2
scripts/install.sh

@@ -133,9 +133,15 @@ for skill_dir in "$PROJECT_ROOT/skills"/*/; do
     [ -d "$skill_dir" ] || continue
     skill_name=$(basename "$skill_dir")
 
-    # Remove existing and copy fresh
+    # Skip _lib (shared library used by multiple skills, not a skill itself)
+    [ "$skill_name" = "_lib" ] && continue
+
+    # Remove existing and copy fresh. Strip trailing slash from $skill_dir
+    # so cp creates a subdirectory rather than merging contents (the *.*/* glob
+    # always returns paths with trailing slashes, which makes cp behave as if
+    # asked to copy contents — that's a long-standing bug we just fixed).
     rm -rf "$CLAUDE_DIR/skills/$skill_name"
-    cp -r "$skill_dir" "$CLAUDE_DIR/skills/"
+    cp -r "${skill_dir%/}" "$CLAUDE_DIR/skills/"
     echo -e "  ${GREEN}$skill_name/${NC}"
 done
 echo ""

+ 152 - 0
skills/net-ops/SKILL.md

@@ -0,0 +1,152 @@
+---
+name: net-ops
+description: "Cross-platform network troubleshooting (Windows, macOS, Linux) via local or remote shell. Use for: DNS broken, can't resolve hostnames, nslookup/dig works but apps fail, NRPT, WFP, scutil, /etc/resolver, systemd-resolved, /etc/resolv.conf, NetworkManager, VPN DNS leak residue (ProtonVPN/Mullvad/WireGuard/AnyConnect), AV/firewall blocking DNS or DoH, Tailscale DNS interaction, intermittent connectivity, remote diagnostics over SSH."
+license: MIT
+allowed-tools: "Read Write Bash"
+metadata:
+  author: claude-mods
+  related-skills: debug-ops, network-tools
+---
+
+# Network Operations
+
+Diagnose network problems on Windows, macOS, or Linux with a layered ladder that isolates faults to the smallest possible scope, then pattern-match against OS-specific culprits. Designed for the common case: someone reports "internet broken" on a box you can shell into (locally or via SSH).
+
+## The Universal Insight
+
+**Bypass-tool succeeds while OS-resolver fails is a smoking gun on every platform.** It means DNS infrastructure is healthy but the operating system's name-resolution path is hooked or misconfigured. The bypass tool differs per OS but the discriminator is identical:
+
+| OS | Bypass tool | OS resolver tool | If bypass works but resolver fails |
+|---|---|---|---|
+| Windows | `nslookup` | `Resolve-DnsName`, browsers | NRPT, WFP, HOSTS, LSP, local 127.0.0.1:53 proxy |
+| macOS | `dig @1.1.1.1` | `dscacheutil -q host`, browsers | `/etc/resolver/*`, scutil DNS, profiles, mDNSResponder, kext |
+| Linux | `dig @1.1.1.1` | `getent hosts`, `resolvectl query` | systemd-resolved, `/etc/resolv.conf`, NetworkManager, dnsmasq, NSS |
+
+The bypass tool implements its own resolver and talks straight to UDP/53. The OS resolver tool goes through the full system name-service path including all hooks. Comparing the two narrows the suspect list dramatically.
+
+## The Diagnostic Ladder
+
+Walk down the layers in order. **Do not skip rungs.** Each rung has a binary outcome that eliminates everything above it. Per-OS tools are in `references/diagnostic-ladder.md`; the structure is universal.
+
+```
+1. Link layer        — interface up, valid IP, gateway present
+2. IP reachability   — ping public IPs over ICMP
+3. Socket reach.     — TCP/443 + UDP/53 to known destinations (raw socket DNS)
+4. DNS infrastructure — bypass tool: nslookup / dig @<server>
+5. OS resolver path  — the hook layer (most interesting on modern systems)
+6. Application       — real HTTP request to a real hostname
+```
+
+The most common mistake: jumping to rung 6 ("HTTPS doesn't work, must be a cert / proxy") when rung 5 is the actual problem (an orphan VPN DNS rule on Windows, a stale `/etc/resolver/` file on macOS, a misconfigured systemd-resolved on Linux). Discipline prevents this.
+
+## Workflow
+
+### 1. Identify the target OS
+
+If local: `uname -s` (Unix) or check shell environment. If remote over SSH, the bootstrap script auto-detects:
+
+```bash
+scripts/ssh-bootstrap.sh <user>@<host>
+```
+
+### 2. Run the OS-appropriate probe
+
+| OS | Script |
+|---|---|
+| Windows | `scripts/windows/probe.ps1` (via `-EncodedCommand` over SSH) |
+| macOS | `scripts/macos/probe.sh` |
+| Linux | `scripts/linux/probe.sh` |
+
+Each prints structured `[PASS]/[FAIL]` per rung. Scan for the first FAIL — that's where to drill in.
+
+### 3. Drill into the failing layer
+
+The interesting failures are almost always rung 5. Per-OS deep-dive scripts:
+
+| OS | Script | What it does |
+|---|---|---|
+| Windows | `scripts/windows/nrpt-audit.ps1` | Dump NRPT rules with attribution + registry forensics |
+| macOS | `scripts/macos/dns-audit.sh` | Dump scutil --dns, /etc/resolver/*, mDNSResponder state, profiles |
+| Linux | `scripts/linux/dns-audit.sh` | Dump systemd-resolved status, resolv.conf chain, NM config, NSS order |
+
+### 4. Apply the minimum reversible fix
+
+Repair scripts default to **dry-run** and protect known-good config (Tailscale MagicDNS, MDM-managed entries). Apply only when the dry-run output matches expectation.
+
+| OS | Repair script |
+|---|---|
+| Windows | `scripts/windows/nrpt-clean.ps1` (removes orphan NRPT catch-alls, protects Tailscale) |
+| macOS | `scripts/macos/resolver-clean.sh` (removes orphan `/etc/resolver/*` from disconnected VPNs) |
+| Linux | `scripts/linux/resolved-reset.sh` (resets systemd-resolved per-link config) |
+
+## Quick Reference: Smoking Guns
+
+| Platform | Symptom | Most likely cause | Quick test |
+|---|---|---|---|
+| Windows | `nslookup` works, browsers fail | Orphan NRPT catch-all (VPN residue) | `Get-DnsClientNrptRule \| Where Namespace -eq '.'` |
+| Windows | Public DoH resolver IPs blocked on 443, other 443 works | AV "Encrypted DNS Detection" | `Get-CimInstance -Ns root/SecurityCenter2 -Class AntiVirusProduct` |
+| macOS | `dig` works, browsers fail | Stale `/etc/resolver/*` from disconnected VPN | `ls /etc/resolver/ && scutil --dns \| head -40` |
+| macOS | All DNS fails post-VPN install | Configuration profile with DNS override | `profiles list -type configuration` |
+| Linux | `dig` works, `getent hosts` fails | systemd-resolved misconfigured | `resolvectl status` |
+| Linux | DNS works on some apps, not others | NSS order in `/etc/nsswitch.conf` excludes `resolve` | `grep ^hosts /etc/nsswitch.conf` |
+| All | DNS suddenly broken after sleep/wake | VPN client failed disconnect cleanup | OS-specific (see above) |
+
+## SSH Transport Patterns
+
+### Windows targets
+
+PowerShell-over-SSH has notorious escaping issues. Always pass scripts via `-EncodedCommand` with UTF-16LE base64:
+
+```bash
+B64=$(printf '%s' "$PS_SCRIPT" | iconv -t UTF-16LE | base64)
+ssh <target> "powershell -NoProfile -EncodedCommand $B64"
+```
+
+### Unix targets (macOS, Linux)
+
+Heredoc works cleanly; no special encoding needed:
+
+```bash
+ssh <target> 'bash -s' < scripts/linux/probe.sh
+# or, with arguments:
+ssh <target> "bash -s -- arg1 arg2" < scripts/linux/probe.sh
+```
+
+For consistency, `scripts/ssh-bootstrap.sh` handles both transports based on detected OS.
+
+## Pattern Recognition
+
+After a few sessions, certain symptom triplets become instantly diagnosable. See `references/case-studies.md` for worked examples. Hall-of-fame entries:
+
+**Windows:** `nslookup` works, `Resolve-DnsName` times out identically across all servers, `Invoke-WebRequest` says "remote name could not be resolved" → orphan NRPT catch-all from a disconnected VPN. Common gateway IP patterns are listed in `references/common-culprits.md`.
+
+**macOS:** `dig <host>` works, browsers say "cannot find server," `scutil --dns` shows extra "resolver #N" entries pointing at private-range gateways with `domain :` listed → leftover `/etc/resolver/<domain>` files from a disconnected VPN.
+
+**Linux:** `dig @<public-resolver> <host>` works, `getent hosts <host>` fails → `/etc/nsswitch.conf` may have an NSS chain that skips `resolve`, OR `/etc/resolv.conf` is no longer symlinked to the systemd-resolved stub.
+
+## Safety Notes
+
+- **Read before write.** Always dump current state before modifying a resolver config. The forensics may be load-bearing for explaining what happened.
+- **Don't disable security tools without consent.** AV / firewall hooks are intrusive but legitimate. Pause is preferred over uninstall.
+- **Tailscale's name-resolution config looks like junk but is essential.** Always filter on protected nameserver patterns (`100.100.100.100` on all OSes) before bulk-deleting.
+- **Resolver config persists across reboots.** Removing a rule is forever (until the VPN re-creates it). Confirm the source/comment before deletion.
+- **macOS profile DNS overrides may be MDM-managed.** Removing them may violate enterprise policy and may be re-applied automatically. Coordinate with IT.
+
+## References
+
+- `references/diagnostic-ladder.md` — full ladder methodology with per-OS commands per rung
+- `references/common-culprits.md` — detection + fix catalog for Windows / macOS / Linux
+- `references/case-studies.md` — worked examples and template for adding new ones
+
+## Scripts
+
+- `scripts/ssh-bootstrap.sh` — establish SSH session, auto-detect target OS, emit usable invocation
+- `scripts/windows/probe.ps1` — full layered diagnostic for Windows
+- `scripts/windows/nrpt-audit.ps1` — NRPT forensics with attribution
+- `scripts/windows/nrpt-clean.ps1` — safe NRPT cleanup (protects Tailscale)
+- `scripts/macos/probe.sh` — full layered diagnostic for macOS
+- `scripts/macos/dns-audit.sh` — scutil + /etc/resolver + profile + mDNSResponder dump
+- `scripts/macos/resolver-clean.sh` — remove orphan /etc/resolver/* files
+- `scripts/linux/probe.sh` — full layered diagnostic for Linux
+- `scripts/linux/dns-audit.sh` — systemd-resolved + NM + NSS + resolv.conf dump
+- `scripts/linux/resolved-reset.sh` — reset systemd-resolved per-link state

+ 0 - 0
skills/net-ops/assets/.gitkeep


+ 90 - 0
skills/net-ops/references/case-studies.md

@@ -0,0 +1,90 @@
+# Case Studies
+
+Worked examples of network diagnostics that motivated this skill. Each case includes the initial symptoms, the diagnostic path, the dead ends, and the final cause. Identifying details are scrubbed; technical details that reproduce the diagnostic value are preserved.
+
+## Case 1: The Proton VPN Ghost (Windows)
+
+### Initial Report
+
+> "Internet not working on my Windows desktop. It wasn't working on wifi earlier today so I switched to ethernet — but problems persisted."
+
+That last sentence is a load-bearing clue: switching physical interface didn't help. That rules out the NIC, driver, cable, and wifi association in a single observation. Whatever is broken lives at the OS layer or above.
+
+### Diagnostic Path
+
+**Rung 1 (link):** Ethernet `Up`, valid private IP, valid default gateway. ✓
+
+**Rung 2 (ICMP):** Ping `1.1.1.1`, `8.8.8.8`, gateway — all <5ms. ✓
+
+**Rung 3 (sockets):** First test misread — `Resolve-DnsName -Server 1.1.1.1` timed out, which felt like UDP/53 was blocked. **Mistake.** Should have gone straight to raw UDP to disambiguate. When raw UDP/53 was eventually tested, it returned a 124-byte DNS response in milliseconds. Lesson: `Resolve-DnsName` uses the Windows DNS Client API even when `-Server` is specified — it's not a clean probe of the network.
+
+**Rung 3 (sockets, second pass):**
+- TCP/53 to 1.1.1.1 → works
+- Raw UDP/53 to 1.1.1.1 → works (124-byte reply)
+- TCP/443 to 1.1.1.1 → **fails**
+- TCP/443 to 8.8.8.8 → **fails**
+- TCP/443 to 140.82.114.4 (github.com) → works
+- TCP/443 to 13.107.42.14 (microsoft.com) → works
+
+**Discriminator:** Destination-specific HTTPS block. Known public DoH resolver IPs are firewalled on 443; everything else works. **Smell of AV "Encrypted DNS Detection."** Confirmed by `Get-CimInstance -Namespace root/SecurityCenter2`: ESET Security + ESET Firewall both active, and `epfwwfp` WFP callout driver loaded.
+
+This was filed as a **secondary concern** — not the cause of the main symptom (general DNS failure for browsers). Important not to chase the first interesting finding when it doesn't match the headline symptom.
+
+**Rung 4 (nslookup):** `nslookup google.com` against router, 1.1.1.1, and 8.8.8.8 — all returned addresses immediately. ✓
+
+**Rung 5 (DNS Client API):** `Resolve-DnsName google.com -Type A` → timeout. `Invoke-WebRequest https://www.google.com` → "The remote name could not be resolved."
+
+**The smoking gun.** Rung 4 passed perfectly; rung 5 failed identically across all targets. Everything app-level fails because every app uses the DNS Client API. nslookup works because it has its own resolver.
+
+### The False Lead
+
+First suspicion: ICS. Port 53 was held by `svchost` PID `3928`, which turned out to be the `SharedAccess` service. Stopped it; the service bounced back on a new PID, and DNS resolution did not recover. ICS was a red herring — it was running but its sharing configuration was empty, meaning it wasn't actually doing anything harmful. Lesson: **don't disable a service just because it looks suspicious; verify it's actually causing the symptom first.**
+
+### The Second False Lead
+
+Next suspicion: ESET's WFP driver. The driver was present and active, and the destination-specific HTTPS block looked like classic AV protocol filtering. But: AV protocol filtering normally affects HTTPS, not DNS Client API calls. Before pausing ESET, ran `Get-DnsClientNrptRule`.
+
+### The Answer
+
+```
+Namespace                         NameServers
+---------                         -----------
+.                                 10.2.0.1
+```
+
+A catch-all NRPT rule routing every DNS query to `10.2.0.1`. The rule's `Comment` field: **"Force all DNS requests via Proton VPN"** — verbatim from Proton's source code. The IP `10.2.0.1` is Proton's in-tunnel DNS gateway, only reachable while connected to their VPN.
+
+Removed the single rule. Flushed DNS cache. Re-tested:
+- `Resolve-DnsName google.com` → instant success, returned A records
+- `Invoke-WebRequest https://www.google.com` → HTTP 200, full page body
+
+### Forensics
+
+Checked `C:\Program Files\Proton\VPN\Install.log.txt`: Proton VPN installation confirmed (current Inno Setup log entry showed the latest installed version). Service binaries present (`ProtonVPNService.exe`, `ProtonVPN.WireGuardService.exe`), all in `Stopped` state at time of diagnosis. The last active VPN session timestamp (per `ServiceData\WireGuard\log.bin`) predated the issue report by several days — DNS had been silently broken since the last disconnect, masked by occasional cache hits and apps that handle DNS failure gracefully.
+
+**Likely trigger:** Sleep or hibernate during an active Proton WireGuard session. Proton's disconnect cleanup hook didn't fire, and the NRPT rule outlived the tunnel.
+
+### Lessons
+
+1. **Always run `Get-DnsClientNrptRule` before suspecting WFP/AV.** It's a one-line check that resolves 90% of "DNS infrastructure works but apps fail" cases.
+2. **Don't conflate `Resolve-DnsName` with a network probe.** It uses the system DNS Client API and inherits every hook in the path. Use raw UDP for actual network-layer DNS testing.
+3. **Multiple anomalies don't mean multiple bugs.** ESET's DoH IP block was a real and separate finding, but it wasn't the cause of the headline symptom. Stay focused on what matches the user's actual complaint.
+4. **The `Comment` field on NRPT rules is gold.** VPN clients tend to write self-identifying strings. Read them before assuming malice.
+5. **Interface-switch ineffective = OS-layer cause.** When wifi → ethernet doesn't fix it, the diagnostic search space contracts dramatically.
+
+## Case 2: Template for Future Entries
+
+When you diagnose a new case worth remembering, add a section here with:
+- Initial report (verbatim if possible)
+- Diagnostic path (rung-by-rung)
+- False leads (the ones you chased before finding the real cause — these are the educational part)
+- The actual cause
+- Forensics (how/when/why it got into that state)
+- Lessons (1-3 reusable observations)
+
+Cases worth adding:
+- A Mullvad-residue case (different IP, otherwise structurally identical to Proton)
+- A corporate AnyConnect leak case
+- A genuine ESET "Encrypted DNS Detection" case where pausing AV was the fix
+- An IPv6-preference-with-broken-v6 slowness case
+- A Winsock LSP corruption case

+ 312 - 0
skills/net-ops/references/common-culprits.md

@@ -0,0 +1,312 @@
+# Common Culprits Catalog
+
+Field guide to known causes of network weirdness on Windows, macOS, and Linux. Ordered within each OS by frequency in observed cases. Each entry: detection command, signature in probe output, and the safe fix.
+
+---
+
+# WINDOWS
+
+## W1. Orphaned NRPT Catch-All (VPN residue)
+
+**Frequency:** Very common — most likely cause of "DNS works in nslookup but not browsers."
+
+**Mechanism:** VPN clients (Proton, Mullvad, Cisco AnyConnect, NordVPN, DirectAccess) set an NRPT rule with `Namespace = "."` pointing at their in-tunnel DNS gateway. Buggy disconnect cleanup → rule outlives the tunnel → every DNS query goes into a void.
+
+**Detection:** `Get-DnsClientNrptRule | Where Namespace -eq '.'`
+
+**Telltale IPs:**
+- `10.2.0.x` → Proton VPN
+- `10.64.0.x` → Mullvad
+- `10.211.x.x` → Cisco AnyConnect (varies by enterprise)
+- `10.5.0.x` → NordVPN
+
+**Fix:** `scripts/windows/nrpt-clean.ps1 -Apply` (preserves Tailscale rules).
+
+## W2. AV WFP Hooks (ESET / Kaspersky / Bitdefender / Norton)
+
+**Frequency:** Common on machines with full security suites.
+
+**Mechanism:** AV products install Windows Filtering Platform callout drivers. Features like "Encrypted DNS Detection," "SSL/TLS Protocol Filtering," "Web Access Protection" can block public DoH resolver IPs on 443 and intercept DNS Client API calls.
+
+**Detection:** `Get-CimInstance -Namespace root/SecurityCenter2 -ClassName AntiVirusProduct` and `Get-CimInstance Win32_SystemDriver | Where Name -match 'epfwwfp|wfpcap|symefa|mfewfpk|bdfwfpf'`
+
+**Signature in probe:** `TCP/443 -> 1.1.1.1` FAIL but `TCP/443 -> 140.82.114.4` (github.com) PASS.
+
+**Fix:** Pause AV via tray → re-probe → if confirmed, disable "Encrypted DNS Detection" in AV settings or add the public resolver IPs to allowed addresses.
+
+## W3. Internet Connection Sharing (ICS) Stuck
+
+**Frequency:** Occasional, often a side effect of Mobile Hotspot or Hyper-V.
+
+**Mechanism:** `SharedAccess` service binds DNS proxy on `0.0.0.0:53`. Usually a red herring — its presence doesn't cause failures alone, but it can mask the underlying culprit.
+
+**Detection:** `Get-Service SharedAccess` + `Get-NetUDPEndpoint -LocalPort 53`
+
+**Fix (if confirmed unwanted):** `Set-Service SharedAccess -StartupType Disabled; Stop-Service SharedAccess -Force`
+
+## W4. Local 127.0.0.1:53 Proxy (NextDNS / AdGuard / Pi-hole client / Cloudflare WARP)
+
+**Frequency:** Increasing as DoH-via-proxy adoption grows.
+
+**Detection:** `Get-NetUDPEndpoint -LocalPort 53 | Where LocalAddress -eq '127.0.0.1'`
+
+**Fix:** Restart the proxy service, or temporarily reconfigure DNS to a public resolver and verify:
+```powershell
+Set-DnsClientServerAddress -InterfaceAlias Ethernet -ServerAddresses 1.1.1.1,8.8.8.8
+```
+
+## W5. Consumer Router DoH IP Blocking (also affects macOS, Linux)
+
+**Frequency:** Growing rapidly. Most prosumer routers from 2023+ ship with this enabled by default.
+
+**Mechanism:** Consumer routers with "parental controls" / "safe browsing" / "advanced threat protection" features maintain blocklists of known public DoH (DNS-over-HTTPS) resolver IPs and silently drop TCP/443 to them. Goal: prevent DoH from bypassing the router's DNS filtering. Affects:
+
+- **Asus AiProtection / Trend Micro filtering**
+- **TP-Link HomeShield / HomeCare**
+- **Eero Secure / Secure+**
+- **Netgear Armor**
+- **Synology Safe Access**
+- **OPNsense/pfSense with custom blocklists**
+- **Pi-hole upstream config blocking DoH**
+
+**Detection (cross-platform — this is a network-level block, not OS-specific):**
+- `TCP/443 -> 1.1.1.1` FAIL **and** `TCP/443 -> 8.8.8.8` FAIL **and** `TCP/443 -> 9.9.9.9` FAIL
+- `TCP/443 -> <github.com IP>` PASS (control: any non-DoH 443 destination)
+- Failure pattern is **identical across multiple devices on the same LAN**
+
+Confirmed via this skill's dogfooding: a single Asus AiProtection-enabled LAN blocked 1.1.1.1:443 and 8.8.8.8:443 from both a Windows desktop and a macOS laptop, while github.com:443 worked from both. The discriminator (different destinations, same port) immediately localized the block to the router/LAN rather than per-device AV.
+
+**Fix:**
+1. Router admin UI → disable parental controls / safe browsing / threat protection for the affected client, OR for the entire LAN
+2. If you need DoH but can't change router config: use Cloudflare's `1.1.1.1` on port 853 (DoT, often unblocked) or via WARP client which uses non-standard ports
+3. Many routers allow per-device exemption — exclude the diagnostic machine if you don't want to disable network-wide
+
+**Important:** This is not a malicious block. It's a working-as-intended security feature, often beneficial. Only override if you have a specific reason to bypass it (e.g., legitimate use of DoH for privacy).
+
+## W6. HOSTS File Pollution / Winsock LSP Corruption
+
+**Frequency:** Rare but quick to check / nuclear to fix.
+
+**Detection:** `Get-Content $env:windir\System32\drivers\etc\hosts` + `netsh winsock show catalog`
+
+**Nuclear fix (requires reboot):** `netsh winsock reset; netsh int ip reset`
+
+---
+
+# macOS
+
+## M1. Orphan `/etc/resolver/<domain>` Files (VPN residue)
+
+**Frequency:** Very common — macOS equivalent of the Windows NRPT bug.
+
+**Mechanism:** Some VPN clients (especially Cisco AnyConnect / Secure Client, Proton VPN, occasional Mullvad) write per-domain resolver files to `/etc/resolver/`. Each file points a specific DNS suffix at the VPN gateway. On disconnect, cleanup is supposed to remove them. It often doesn't.
+
+**Detection:**
+```bash
+ls /etc/resolver/
+scutil --dns | head -40
+```
+
+**Telltale IPs in `/etc/resolver/<file>`:**
+- `10.2.0.x` → Proton VPN
+- `10.64.0.x` → Mullvad
+- `10.211.x.x` → Cisco AnyConnect
+- `127.0.0.1` → local DNS proxy (NextDNS, AdGuard)
+
+**Signature:** `dig @1.1.1.1 google.com` works but `dscacheutil -q host -a name google.com` fails OR returns wrong addresses. `scutil --dns` shows extra "resolver #N" entries with `domain :` lines naming corporate or VPN-specific zones.
+
+**Fix:** `scripts/macos/resolver-clean.sh --apply` (protects Tailscale's MagicDNS).
+
+## M2. Configuration Profile DNS Override (MDM-installed)
+
+**Frequency:** Common on managed Macs.
+
+**Mechanism:** MDM (Jamf, Intune, Kandji, Mosyle) can push DNS configuration via a `.mobileconfig` profile that overrides the resolver chain. If the profile points at an internal corporate DNS that's unreachable from your current network, all resolution dies.
+
+**Detection:**
+```bash
+profiles list -type configuration
+sudo profiles show -type configuration   # full payloads
+```
+
+**Fix:** Coordinate with IT — removing an MDM-managed profile may violate policy and may be re-applied automatically. The fix usually involves connecting to corporate VPN to make the internal DNS reachable, or asking IT to amend the profile.
+
+## M3. mDNSResponder Crashed / Hung
+
+**Frequency:** Rare but catastrophic when it happens.
+
+**Detection:** `pgrep -x mDNSResponder`
+
+**Fix:**
+```bash
+sudo killall -HUP mDNSResponder     # gentle nudge
+sudo killall mDNSResponder          # force restart (launchd auto-respawns)
+sudo dscacheutil -flushcache
+```
+
+## M4. Stale `scutil --dns` State After Network Change
+
+**Frequency:** Occasional, especially after sleep/wake or switching wifi networks.
+
+**Mechanism:** macOS's resolver cache can hang on to settings from a previous network. New connection has fresh DNS servers but the resolver chain still has entries from the previous network.
+
+**Detection:** `scutil --dns` shows multiple resolvers with different IPs that don't match the current network's actual DNS.
+
+**Fix:**
+```bash
+sudo killall -HUP mDNSResponder
+sudo dscacheutil -flushcache
+```
+
+If persistent: cycle the active network interface in System Settings → Network → Details → Renew DHCP Lease.
+
+## M5. Third-Party Network Kext / System Extension
+
+**Frequency:** Decreasing (Apple has deprecated kexts in favor of system extensions).
+
+**Mechanism:** Cisco AnyConnect, Little Snitch, Lulu, some legacy AV products. Can hook the network stack at kernel level.
+
+**Detection:** `kextstat | grep -iE 'cisco|anyconnect|proton|mullvad|nord|littlesnitch|lulu'`
+
+**Fix:** Disable via the app's GUI, not by force-unloading the kext.
+
+## M6. PAC File / Proxy Set System-Wide
+
+**Frequency:** Common in corporate environments.
+
+**Detection:** `scutil --proxy`
+
+**Fix:** System Settings → Network → Details → Proxies → toggle off the relevant proxy (or check that the PAC URL is reachable).
+
+---
+
+# LINUX
+
+## L1. `/etc/resolv.conf` No Longer Symlinked to systemd-resolved Stub
+
+**Frequency:** Common — happens when VPN clients or DHCP scripts overwrite the symlink.
+
+**Mechanism:** systemd-resolved expects `/etc/resolv.conf` to be a symlink to `/run/systemd/resolve/stub-resolv.conf`. If a VPN script (or an Old-School Sysadmin) replaces it with a plain file containing static nameservers, the file becomes a stale snapshot — apps using libc's resolver hit the static file while `resolvectl` operates independently.
+
+**Detection:**
+```bash
+readlink /etc/resolv.conf
+# Expected on systemd-resolved hosts:
+# /run/systemd/resolve/stub-resolv.conf
+# Or:
+# ../run/systemd/resolve/stub-resolv.conf
+```
+
+**Fix:**
+```bash
+sudo ln -sf /run/systemd/resolve/stub-resolv.conf /etc/resolv.conf
+sudo systemctl restart systemd-resolved
+```
+
+## L2. systemd-resolved Per-Link DNS Stuck After VPN Disconnect
+
+**Frequency:** Very common — Linux equivalent of the Windows NRPT bug.
+
+**Mechanism:** VPN clients (OpenVPN, WireGuard, Mullvad, Proton CLI) push per-link DNS via `resolvectl dns <iface> <servers>` when connecting. Cleanup on disconnect should revert the link's DNS, but many scripts forget. Result: queries route to a dead per-link DNS server.
+
+**Detection:** `resolvectl status` shows DNS servers configured on a VPN interface that's no longer routing, OR a global fallback that no longer applies.
+
+**Fix:** `scripts/linux/resolved-reset.sh --apply`
+
+## L3. `/etc/nsswitch.conf` hosts Line Excludes Resolver
+
+**Frequency:** Rare but devastating.
+
+**Mechanism:** If `/etc/nsswitch.conf` has `hosts: files dns` on a systemd-resolved system, glibc bypasses `resolve` (the systemd-resolved NSS module) and goes straight to whatever `/etc/resolv.conf` says. If that's broken, all libc-based name resolution fails — even though `resolvectl query` may still work.
+
+**Detection:**
+```bash
+grep "^hosts:" /etc/nsswitch.conf
+# Healthy on systemd-resolved system:
+# hosts: files mymachines resolve [!UNAVAIL=return] dns myhostname
+```
+
+**Fix:** Restore the canonical line per your distro's defaults. On Ubuntu/Debian:
+```bash
+sudo sed -i 's/^hosts:.*/hosts: files mymachines resolve [!UNAVAIL=return] dns myhostname/' /etc/nsswitch.conf
+```
+
+## L4. NetworkManager `dns=` Mode Conflicts With systemd-resolved
+
+**Frequency:** Occasional on desktop Linux.
+
+**Mechanism:** NetworkManager has its own opinions about DNS. Settings include `none` (NM doesn't touch DNS), `dnsmasq` (NM starts a local dnsmasq), `systemd-resolved` (NM hands off to systemd-resolved). Mismatch between NM's mode and what's actually running creates a fight.
+
+**Detection:**
+```bash
+awk '/\[main\]/,/\[/{if(/^dns/)print}' /etc/NetworkManager/NetworkManager.conf
+ls /etc/NetworkManager/conf.d/
+```
+
+**Fix:** Pick one strategy and stick with it. The modern recommended setup is `dns=systemd-resolved` on systemd distros.
+
+## L5. dnsmasq Local Instance Bound to 127.0.0.1:53
+
+**Frequency:** Occasional, especially with old NetworkManager configs or libvirt installs.
+
+**Mechanism:** A local dnsmasq listens on 127.0.0.1:53 and `/etc/resolv.conf` points at 127.0.0.1. If dnsmasq's upstream config is broken or stale, all DNS fails despite the infrastructure being fine.
+
+**Detection:** `ss -tulnp | grep ':53'`
+
+**Fix:** Check dnsmasq's actual upstream config (`/etc/dnsmasq.d/*`, `/etc/NetworkManager/dnsmasq.d/*`) and restart: `sudo systemctl restart dnsmasq` or `sudo systemctl restart NetworkManager`.
+
+## L6. WireGuard / OpenVPN PostUp DNS Hook Failure
+
+**Frequency:** Common with hand-rolled VPN configs.
+
+**Mechanism:** WireGuard configs often have `PostUp = resolvectl dns %i 10.0.0.1` and `PostDown = resolvectl revert %i`. If `wg-quick down` is killed before `PostDown` runs (sleep, SIGKILL, crash), the DNS state is never reverted.
+
+**Detection:** `resolvectl status` shows DNS on a `wg*` interface that no longer exists, or `ip link` shows no `wg*` interface but `resolvectl` still has DNS configured for one.
+
+**Fix:** `scripts/linux/resolved-reset.sh --apply` cleans most of this. For lingering interface entries: `sudo resolvectl revert <ifname>`.
+
+## L7. Container / WSL2 Special Cases
+
+**Frequency:** Occasional.
+
+**Mechanism:**
+- **Docker containers** inherit DNS from the host. If host DNS is broken, containers inherit the breakage. Containers using `--network=host` follow host config exactly.
+- **WSL2** has its own resolver chain. `/etc/resolv.conf` inside WSL2 is auto-generated by `wsl.conf`. Windows-side DNS hooks (NRPT, AV) don't affect WSL2 unless `wsl.conf` is configured to share them.
+
+**Detection:**
+- Docker: `docker exec <container> cat /etc/resolv.conf`
+- WSL2: `cat /etc/resolv.conf` inside WSL + `cat /etc/wsl.conf`
+
+**Fix:**
+- Docker: fix host DNS first, then `docker restart`
+- WSL2: configure `/etc/wsl.conf` with `[network]\ngenerateResolvConf = false` and write a custom `/etc/resolv.conf`
+
+---
+
+## Cross-OS Process-of-Elimination Summary
+
+```
+Apps fail, bypass tool (nslookup / dig) works
+    ↓
+Check OS-specific catch-all DNS hook:
+    Windows → Get-DnsClientNrptRule | Where Namespace -eq '.'
+    macOS   → ls /etc/resolver/ + scutil --dns
+    Linux   → resolvectl status (per-link DNS) + readlink /etc/resolv.conf
+    ↓ clean
+Check HOSTS / nsswitch:
+    Windows → C:\Windows\System32\drivers\etc\hosts
+    macOS   → /etc/hosts
+    Linux   → /etc/hosts + /etc/nsswitch.conf hosts line
+    ↓ clean
+Check local 127.0.0.x:53 listener (DNS proxy):
+    Windows → Get-NetUDPEndpoint -LocalPort 53
+    macOS   → lsof -i UDP:53
+    Linux   → ss -tulnp | grep :53
+    ↓ clean
+Check security software / kernel hooks:
+    Windows → WFP drivers (epfwwfp et al.)
+    macOS   → kextstat / system extensions
+    Linux   → iptables -L OUTPUT / nft list ruleset
+    ↓ clean
+Welcome to the long tail — start reading per-OS resolver logs.
+```

+ 138 - 0
skills/net-ops/references/diagnostic-ladder.md

@@ -0,0 +1,138 @@
+# The Diagnostic Ladder
+
+A layered methodology for isolating network faults from the wire up, applicable to Windows, macOS, and Linux. Each rung has a binary outcome that eliminates everything above it — walk in order, do not skip.
+
+## Why Layered Probing Beats Pattern Matching
+
+When a user says "internet is broken," there are roughly 30 plausible causes spanning seven OSI-ish layers. Guessing wastes time. The ladder is a binary-search through the stack: each test eliminates roughly half the remaining suspects.
+
+The most common mistake is jumping straight to layer 6 ("HTTPS doesn't work, must be a cert / proxy / SNI thing") when the real issue is layer 5 (the OS resolver is being hijacked by an orphaned VPN config from a tunnel that hasn't been connected in four days). Discipline prevents this.
+
+## Per-OS Tool Reference
+
+| Rung | Windows | macOS | Linux |
+|---|---|---|---|
+| 1. Link | `Get-NetAdapter` / `Get-NetIPConfiguration` | `ifconfig` / `networksetup -listallhardwareports` | `ip -br link` / `ip -br addr` |
+| 2. ICMP | `Test-Connection 1.1.1.1` | `ping -c 2 1.1.1.1` | `ping -c 2 1.1.1.1` |
+| 3. TCP/UDP | `Test-NetConnection -Port 443` + raw UDP via .NET | `nc -zv` + `dig @<ip>` | `bash </dev/tcp/<ip>/443` + `dig @<ip>` |
+| 4. DNS infra | `nslookup google.com 1.1.1.1` | `dig @1.1.1.1 google.com` | `dig @1.1.1.1 google.com` |
+| 5. OS resolver | `Resolve-DnsName` | `dscacheutil -q host -a name google.com` | `getent hosts google.com` / `resolvectl query` |
+| 6. App layer | `Invoke-WebRequest` | `curl -v` | `curl -v` |
+
+## Rung 1 — Link Layer
+
+**Question:** Is there a physical / wireless connection with a valid IP and gateway?
+
+**Pass criteria:** At least one adapter `Up` / `active` / `UP`, has an IPv4 address, has a default gateway.
+
+**Fail → check:** Driver state, cable, wifi association, DHCP lease, static config typo.
+
+**Common gotchas across all OSes:**
+- A `169.254.x.x` address (Windows/Linux) or `self-assigned` (macOS) means DHCP failed silently
+- Multiple `Up` adapters can have competing default routes; check route metric / priority
+
+## Rung 2 — IP / ICMP Reachability
+
+**Question:** Can packets leave the box and reach the public internet?
+
+**Pass criteria:** Replies in single-digit to low-double-digit milliseconds for at least one public anycast IP.
+
+**Fail → check:** Routing table, firewall rules blocking ICMP outbound, ISP outage, captive portal.
+
+**Watch for:** Some ISPs and corporate firewalls block ICMP entirely while allowing TCP/UDP. If ICMP fails but TCP socket tests pass on rung 3, ICMP is the *only* thing blocked — rare but real, especially on enterprise networks.
+
+## Rung 3 — TCP/UDP Socket Reachability
+
+**Question:** Can specific transport-layer connections complete?
+
+**Critical discriminator:** Test multiple destinations on the same port. If `1.1.1.1:443` fails but `140.82.114.4:443` (github.com) succeeds, the block is **destination-specific**, not a general firewall. Strongly suggests AV with "Encrypted DNS Detection" or per-IP blocklist.
+
+**Raw UDP/53 test is essential.** Most OS-level DNS probes (`Resolve-DnsName`, `dscacheutil`, `getent`) go through the system resolver and inherit every hook in the path. To test UDP/53 itself, use:
+- Windows: `dig` (if installed) or a custom `UdpClient` (see `probe.ps1`)
+- macOS / Linux: `dig +tries=1 @<server> <host>`
+
+`dig` explicitly bypasses the OS resolver chain. This is what makes it the killer discriminator on Unix systems — same role as `nslookup` on Windows.
+
+## Rung 4 — DNS Infrastructure
+
+**Question:** Does a DNS server actually answer queries?
+
+**Pass criteria:** All three resolvers (default + two public) return a name and address. The IPs may differ (different anycast points) — that's fine.
+
+**Fail → check:** UDP/53 outbound blocked (back to rung 3 raw test), router's DNS forwarder broken, ISP DNS hijack misconfigured.
+
+**Subtle bugs:**
+- If the resolver returns only IPv6 (AAAA) records for a site that should have IPv4, the resolver may be misconfigured for record-type ordering — apps preferring A records will hang
+- If different resolvers return wildly different IPs (different from anycast variation), you may be facing DNS poisoning or split-horizon weirdness
+
+## Rung 5 — OS Resolver Path (THE INTERESTING LAYER)
+
+**Question:** Does the operating system's name-resolution chain actually return correct addresses?
+
+**THE SMOKING GUN:** Rung 4 passes (bypass tool works) but rung 5 fails (OS resolver times out). The DNS infrastructure is healthy but **something is hooking the system resolver path.**
+
+### Windows suspects
+
+| Hook | Detection |
+|---|---|
+| **NRPT (Name Resolution Policy Table)** | `Get-DnsClientNrptRule \| Where Namespace -eq '.'` |
+| **HOSTS file** | `Get-Content $env:windir\System32\drivers\etc\hosts` |
+| **WFP callout driver** | `Get-CimInstance Win32_SystemDriver \| Where Name -match 'wfp\|epfw'` |
+| **DNS Client service hooked** | Third-party LSP catalog entries, dependent services |
+| **Local 127.0.0.1:53 proxy** | `Get-NetUDPEndpoint -LocalPort 53` |
+
+### macOS suspects
+
+| Hook | Detection |
+|---|---|
+| **`/etc/resolver/<domain>` files** | `ls /etc/resolver/` — per-domain overrides, classic VPN residue |
+| **scutil DNS state** | `scutil --dns` — shows "resolver #N" entries; extras = potential hook |
+| **Configuration profiles (MDM)** | `profiles list -type configuration` — can install DNS overrides |
+| **mDNSResponder state** | `pgrep -x mDNSResponder` — if dead, all DNS dies |
+| **Third-party kext** | `kextstat \| grep -iE 'cisco\|anyconnect\|proton\|mullvad'` |
+| **PAC file / proxy** | `scutil --proxy` |
+
+### Linux suspects
+
+| Hook | Detection |
+|---|---|
+| **`/etc/nsswitch.conf` hosts line** | NSS order excludes `resolve` or `dns` → bypass entirely |
+| **systemd-resolved state** | `resolvectl status` — per-link DNS / search domains |
+| **`/etc/resolv.conf` symlink** | `readlink /etc/resolv.conf` — should point at the stub on systemd systems |
+| **NetworkManager DNS mode** | `/etc/NetworkManager/NetworkManager.conf` `[main] dns=` |
+| **dnsmasq instance** | `pgrep -x dnsmasq` + `/etc/dnsmasq.d/` |
+| **Local 127.x:53 listener** | `ss -tulnp \| grep :53` |
+
+## Rung 6 — Application Layer
+
+**Question:** Can a real application make a real HTTP request to a real hostname?
+
+**Fail BUT rung 5 passed → check:**
+
+| OS | Most common causes |
+|---|---|
+| Windows | WinHTTP proxy (`netsh winhttp show proxy`), cert store, TLS, IPv6 preference, app-specific config |
+| macOS | System proxy (`scutil --proxy`), keychain cert issues, IPv6 preference, app-specific config |
+| Linux | `http_proxy` / `https_proxy` env vars, CA bundle path, IPv6 preference, app-specific config |
+
+## Discriminator Cheat Sheet
+
+| Symptoms | Diagnosis |
+|---|---|
+| Rung 1 fails | Hardware / driver / wifi association |
+| Rungs 1 pass, 2 fails | Routing or ISP |
+| Rungs 1-2 pass, 3 fails for all dests | Outbound firewall blocking the port |
+| Rungs 1-2 pass, 3 fails for specific dests | Destination-specific filter (AV "Encrypted DNS Detection") |
+| Rungs 1-3 pass, 4 fails | DNS server / forwarder broken upstream |
+| Rungs 1-4 pass, 5 fails | **OS resolver hook — go to per-OS dns-audit script** |
+| Rungs 1-5 pass, 6 fails | Proxy, cert store, TLS, IPv6 preference, app-specific |
+
+## When the Ladder Doesn't Help
+
+Some failures are stateful or intermittent and won't show on a single probe pass:
+
+- **Time-based:** DNS works for 30s then breaks. Loop the probe; watch for transition timestamps.
+- **Per-network:** Fails on wifi, works on ethernet. Compare per-interface resolver config on each OS.
+- **Per-application:** Browsers fail, system tools work. Look at app-specific resolvers — Chrome / Firefox have their own DoH paths, curl has its own resolver, etc.
+
+For these, augment the ladder with continuous probing and per-interface comparison.

+ 53 - 0
skills/net-ops/scripts/_lib/cache.sh

@@ -0,0 +1,53 @@
+# net-ops :: _lib/cache.sh
+# Lightweight state cache for --quick mode.
+# Stores the last probe's summary so we can decide whether to run the full
+# ladder or just the most failure-prone layers (rungs 5+6).
+
+CACHE_DIR="${NETOPS_CACHE_DIR:-${TMPDIR:-/tmp}/net-ops}"
+CACHE_FILE="$CACHE_DIR/last-state.json"
+CACHE_MAX_AGE_SECONDS="${NETOPS_CACHE_MAX_AGE:-600}"  # 10 minutes default
+
+QUICK_MODE=0
+
+parse_quick_flag() {
+    for a in "$@"; do
+        [[ "$a" == "--quick" ]] && QUICK_MODE=1
+    done
+}
+
+# Returns 0 if the last cached state is "all healthy and recent" — caller
+# should then skip rungs 1-4 and only run 5+6. Returns 1 otherwise.
+cache_indicates_healthy() {
+    [[ "$QUICK_MODE" -eq 1 ]] || return 1
+    [[ -f "$CACHE_FILE" ]] || return 1
+    # File age check (BSD vs GNU stat compat)
+    local mtime now age
+    mtime=$(stat -f %m "$CACHE_FILE" 2>/dev/null || stat -c %Y "$CACHE_FILE" 2>/dev/null)
+    [[ -n "$mtime" ]] || return 1
+    now=$(date +%s)
+    age=$(( now - mtime ))
+    (( age > CACHE_MAX_AGE_SECONDS )) && return 1
+    # Parse: only return healthy if fail count was zero
+    grep -q '"fail":0' "$CACHE_FILE" 2>/dev/null
+}
+
+# Save current state at end of probe.
+cache_save_state() {
+    local pass="$1" fail="$2" first_fail="$3"
+    mkdir -p "$CACHE_DIR" 2>/dev/null || return 0
+    printf '{"ts":%d,"pass":%d,"fail":%d,"first_fail":%q}\n' \
+        "$(date +%s)" "$pass" "$fail" "$first_fail" > "$CACHE_FILE" 2>/dev/null || true
+}
+
+# Predicate the probe can call to decide whether to run a given rung.
+# Args: rung_number (1..7). In quick mode, only 5 and 6 run.
+should_run_rung() {
+    local rung="$1"
+    if cache_indicates_healthy; then
+        case "$rung" in
+            5|6) return 0 ;;
+            *)   return 1 ;;
+        esac
+    fi
+    return 0
+}

+ 92 - 0
skills/net-ops/scripts/_lib/output.sh

@@ -0,0 +1,92 @@
+# net-ops :: _lib/output.sh
+# Output mode handling for probe scripts. Supports two modes:
+#   - text (default): human-readable [PASS]/[FAIL] lines + summary block
+#   - json: newline-delimited JSON, one record per check + a summary record
+#
+# Usage in a probe script:
+#   source "$(dirname "$0")/../_lib/output.sh"
+#   parse_output_flags "$@"
+#   # then use section / pass / fail as before — they auto-route to the right mode
+
+JSON_MODE="${JSON_MODE:-0}"
+
+parse_output_flags() {
+    for a in "$@"; do
+        [[ "$a" == "--json" ]] && JSON_MODE=1
+    done
+}
+
+# JSON-safe string escaper. Handles backslash, double-quote, and control chars.
+_json_escape() {
+    local s="$1"
+    s="${s//\\/\\\\}"
+    s="${s//\"/\\\"}"
+    s="${s//$'\n'/\\n}"
+    s="${s//$'\r'/\\r}"
+    s="${s//$'\t'/\\t}"
+    printf '%s' "$s"
+}
+
+# These three are the public API. They write either text or JSON depending on mode.
+PASS_COUNT=0
+FAIL_COUNT=0
+FIRST_FAIL=""
+CURRENT_SECTION=""
+
+section() {
+    CURRENT_SECTION="$1"
+    if [[ "$JSON_MODE" -eq 1 ]]; then
+        printf '{"type":"section","name":"%s"}\n' "$(_json_escape "$1")"
+    else
+        echo
+        echo "=== $1 ==="
+    fi
+}
+
+pass() {
+    PASS_COUNT=$((PASS_COUNT + 1))
+    if [[ "$JSON_MODE" -eq 1 ]]; then
+        printf '{"type":"check","section":"%s","label":"%s","status":"pass","detail":"%s"}\n' \
+            "$(_json_escape "$CURRENT_SECTION")" "$(_json_escape "$1")" "$(_json_escape "${2:-}")"
+    else
+        echo "[PASS] $1${2:+ :: $2}"
+    fi
+}
+
+fail() {
+    FAIL_COUNT=$((FAIL_COUNT + 1))
+    [[ -z "$FIRST_FAIL" ]] && FIRST_FAIL="[$CURRENT_SECTION] $1"
+    if [[ "$JSON_MODE" -eq 1 ]]; then
+        printf '{"type":"check","section":"%s","label":"%s","status":"fail","detail":"%s"}\n' \
+            "$(_json_escape "$CURRENT_SECTION")" "$(_json_escape "$1")" "$(_json_escape "${2:-}")"
+    else
+        echo "[FAIL] $1${2:+ :: $2}"
+    fi
+}
+
+# Call from end of probe to emit summary record / block.
+emit_summary() {
+    if [[ "$JSON_MODE" -eq 1 ]]; then
+        printf '{"type":"summary","pass":%d,"fail":%d,"first_fail":"%s"}\n' \
+            "$PASS_COUNT" "$FAIL_COUNT" "$(_json_escape "$FIRST_FAIL")"
+    else
+        echo
+        echo "=== SUMMARY ==="
+        echo "  PASS: $PASS_COUNT    FAIL: $FAIL_COUNT"
+        if [[ -n "$FIRST_FAIL" ]]; then
+            echo "  First failure: $FIRST_FAIL"
+        else
+            echo "  No failures."
+        fi
+    fi
+}
+
+# Helper for scripts that want to suppress informational/diagnostic output
+# (the non-PASS/FAIL annotations like scutil dumps) in JSON mode.
+info() {
+    if [[ "$JSON_MODE" -eq 1 ]]; then
+        # Optional: emit info records. Keep silent for cleaner JSON parsing.
+        return 0
+    fi
+    echo "$@"
+}

+ 64 - 0
skills/net-ops/scripts/_lib/redact.sh

@@ -0,0 +1,64 @@
+# net-ops :: _lib/redact.sh
+# Shared opsec redaction for diagnostic output. Source from any bash script:
+#
+#   source "$(dirname "$0")/../_lib/redact.sh"
+#   parse_redact_flag "$@"
+#   maybe_redact_self "$@"    # re-invokes self without --redact if flag set
+#
+# Public IPs (1.1.1.1, 8.8.8.8, Tailscale 100.100.100.100 anchor) are
+# preserved — they're diagnostic landmarks. Private/CGNAT/link-local
+# ranges, MACs, and *.ts.net tailnet names are masked.
+
+REDACT="${REDACT:-0}"
+
+parse_redact_flag() {
+    for a in "$@"; do
+        [[ "$a" == "--redact" ]] && REDACT=1
+    done
+}
+
+redact_filter() {
+    if [[ "${REDACT:-0}" -eq 0 ]]; then cat; return; fi
+    perl -pe '
+        # Preserve well-known anchors first
+        s/100\.100\.100\.100/__TS_MAGIC__/g;
+        # Redact private / CGNAT / link-local IPv4
+        s/\b10\.\d{1,3}\.\d{1,3}\.\d{1,3}\b/10.X.X.X/g;
+        s/\b172\.(1[6-9]|2[0-9]|3[01])\.\d{1,3}\.\d{1,3}\b/172.X.X.X/g;
+        s/\b192\.168\.\d{1,3}\.\d{1,3}\b/192.168.X.X/g;
+        s/\b100\.(6[4-9]|[7-9]\d|1[01]\d|12[0-7])\.\d{1,3}\.\d{1,3}\b/100.X.X.X/g;
+        s/\b169\.254\.\d{1,3}\.\d{1,3}\b/169.254.X.X/g;
+        # MAC addresses (both : and - separators)
+        s/\b[0-9a-fA-F]{2}([:-])[0-9a-fA-F]{2}\1[0-9a-fA-F]{2}\1[0-9a-fA-F]{2}\1[0-9a-fA-F]{2}\1[0-9a-fA-F]{2}\b/XX:XX:XX:XX:XX:XX/g;
+        # Tailscale tailnet names
+        s/\b[a-z0-9-]+\.ts\.net\b/REDACTED.ts.net/g;
+        # Restore anchors
+        s/__TS_MAGIC__/100.100.100.100/g;
+    '
+}
+
+# Helper: self-reinvoke and pipe through post-processing filters when needed.
+# Handles both --redact (mask private addrs) and --json (drop non-JSON chatter).
+# Avoids bash 3.2 exec-redirect quirks via single-level subprocess.
+maybe_redact_self() {
+    # Only reinvoke if at least one filter is active
+    [[ "${REDACT:-0}" -eq 1 ]] || [[ "${JSON_MODE:-0}" -eq 1 ]] || return 0
+    # Prevent infinite recursion
+    [[ "${_NETOPS_POSTPROCESSED:-0}" -eq 1 ]] && return 0
+    export _NETOPS_POSTPROCESSED=1
+
+    # Strip --redact from args (child runs without it to avoid double-recursion).
+    # --json is preserved so JSON_MODE stays set in the child for any code that
+    # changes behavior in JSON mode (e.g. info() suppression in output.sh).
+    local cleaned_args=()
+    for a in "$@"; do [[ "$a" != "--redact" ]] && cleaned_args+=("$a"); done
+
+    if [[ "${JSON_MODE:-0}" -eq 1 ]] && [[ "${REDACT:-0}" -eq 1 ]]; then
+        "$0" ${cleaned_args[@]+"${cleaned_args[@]}"} | grep '^{' | redact_filter
+    elif [[ "${JSON_MODE:-0}" -eq 1 ]]; then
+        "$0" ${cleaned_args[@]+"${cleaned_args[@]}"} | grep '^{'
+    else
+        "$0" ${cleaned_args[@]+"${cleaned_args[@]}"} | redact_filter
+    fi
+    exit "${PIPESTATUS[0]}"
+}

+ 98 - 0
skills/net-ops/scripts/linux/dns-audit.sh

@@ -0,0 +1,98 @@
+#!/usr/bin/env bash
+# net-ops :: linux/dns-audit.sh
+# Deep DNS forensics for Linux. Use when probe.sh shows rung 4 (dig) PASS
+# but rung 5 (getent / resolvectl) FAIL.
+
+set -u
+
+# shellcheck source=../_lib/redact.sh
+source "$(dirname "$0")/../_lib/redact.sh"
+parse_redact_flag "$@"
+maybe_redact_self "$@"
+
+echo "=== /etc/nsswitch.conf (hosts line) ==="
+grep "^hosts:" /etc/nsswitch.conf 2>/dev/null || echo "  (no hosts entry)"
+
+echo
+echo "=== /etc/resolv.conf ==="
+if [[ -L /etc/resolv.conf ]]; then
+    echo "  Type: symlink -> $(readlink /etc/resolv.conf)"
+else
+    echo "  Type: regular file"
+fi
+echo "  Modified: $(stat -c '%y' /etc/resolv.conf 2>/dev/null || stat -f '%Sm' /etc/resolv.conf 2>/dev/null)"
+echo "  --- contents ---"
+cat /etc/resolv.conf 2>/dev/null | sed 's/^/  /'
+
+echo
+echo "=== systemd-resolved ==="
+if systemctl is-active systemd-resolved >/dev/null 2>&1; then
+    echo "  Service: active"
+    echo "  --- resolvectl status ---"
+    resolvectl status 2>/dev/null | sed 's/^/  /'
+else
+    echo "  Service: inactive or not installed"
+fi
+
+echo
+echo "=== NetworkManager DNS config ==="
+if command -v nmcli >/dev/null 2>&1; then
+    echo "  --- nmcli dev show (DNS lines) ---"
+    nmcli dev show 2>/dev/null | grep -E 'DEVICE|IP4.DNS|IP6.DNS|DOMAIN' | sed 's/^/  /'
+    echo
+    echo "  --- NetworkManager dns mode ---"
+    awk '/\[main\]/,/\[/{if(/^dns/) print}' /etc/NetworkManager/NetworkManager.conf 2>/dev/null | sed 's/^/  /' || true
+    ls -la /etc/NetworkManager/conf.d/ 2>/dev/null | sed 's/^/  /' || true
+else
+    echo "  nmcli not installed"
+fi
+
+echo
+echo "=== dnsmasq ==="
+if pgrep -x dnsmasq >/dev/null; then
+    pid=$(pgrep -x dnsmasq | head -1)
+    echo "  Running, PID $pid"
+    ps -o command -p "$pid" 2>/dev/null | sed 's/^/  /'
+else
+    echo "  not running"
+fi
+for d in /etc/dnsmasq.d /etc/NetworkManager/dnsmasq.d; do
+    [[ -d "$d" ]] && { echo "  $d contents:"; ls "$d" 2>/dev/null | sed 's/^/    /'; }
+done
+
+echo
+echo "=== Local DNS listeners ==="
+ss -tulnp 2>/dev/null | awk 'NR==1 || $5 ~ /:53$/' | sed 's/^/  /'
+
+echo
+echo "=== /etc/hosts (non-comment) ==="
+grep -vE '^\s*(#|$)' /etc/hosts 2>/dev/null | sed 's/^/  /' || echo "  (no custom entries)"
+
+echo
+echo "=== VPN / WireGuard interfaces ==="
+ip -br link 2>/dev/null | awk '/^(wg|tun|tap|nordlynx|proton|mullvad|nextdns)/' | sed 's/^/  /' || true
+if command -v wg >/dev/null 2>&1; then
+    echo "  --- wg show ---"
+    wg show 2>/dev/null | sed 's/^/  /' | head -30 || true
+fi
+
+echo
+echo "=== ATTRIBUTION HINTS ==="
+# Inspect nameservers visible across the stack for known patterns
+ns_list=$( {
+    awk '/^nameserver/{print $2}' /etc/resolv.conf 2>/dev/null
+    resolvectl status 2>/dev/null | awk '/Current DNS Server:|DNS Servers:/{for(i=4;i<=NF;i++)print $i}'
+    nmcli -t -f IP4.DNS,IP6.DNS dev show 2>/dev/null | awk -F: '{print $2}'
+} | sort -u | grep -v '^$' )
+
+while read -r n; do
+    [[ -z "$n" ]] && continue
+    case "$n" in
+        10.2.0.*)              echo "  $n :: likely Proton VPN gateway" ;;
+        10.64.0.*)             echo "  $n :: likely Mullvad gateway" ;;
+        10.211.*|10.212.*)     echo "  $n :: likely Cisco AnyConnect" ;;
+        100.100.100.100)       echo "  $n :: Tailscale MagicDNS (expected)" ;;
+        127.0.0.53)            echo "  $n :: systemd-resolved stub (expected on most systems)" ;;
+        127.0.0.1|127.0.0.2)   echo "  $n :: local DNS proxy (dnsmasq, NextDNS, AdGuard, etc.)" ;;
+    esac
+done <<< "$ns_list"

+ 326 - 0
skills/net-ops/scripts/linux/probe.sh

@@ -0,0 +1,326 @@
+#!/usr/bin/env bash
+# net-ops :: linux/probe.sh
+# Full layered diagnostic ladder for Linux network troubleshooting.
+# Outputs structured [PASS]/[FAIL] lines so a human or LLM can scan for
+# the first FAIL and drill in.
+
+set -u
+
+TEST_HOST="${TEST_HOST:-google.com}"
+TEST_IPS=("1.1.1.1" "8.8.8.8")
+TIMEOUT="${TIMEOUT:-5}"
+
+# shellcheck source=../_lib/redact.sh
+source "$(dirname "$0")/../_lib/redact.sh"
+# shellcheck source=../_lib/output.sh
+source "$(dirname "$0")/../_lib/output.sh"
+parse_redact_flag "$@"
+parse_output_flags "$@"
+maybe_redact_self "$@"
+
+# ---------------------------------------------------------------------------
+section "1. LINK LAYER"
+# ---------------------------------------------------------------------------
+ip -br link 2>/dev/null | awk '$2=="UP"{print $1}' | while read -r dev; do
+    [[ "$dev" == "lo" ]] && continue
+    addr=$(ip -br -4 addr show "$dev" 2>/dev/null | awk '{print $3}')
+    pass "Interface $dev UP" "${addr:-no IPv4}"
+done
+
+GATEWAY=$(ip route show default 2>/dev/null | awk '/default/{print $3; exit}')
+DEFAULT_IF=$(ip route show default 2>/dev/null | awk '/default/{print $5; exit}')
+[[ -n "$GATEWAY" ]] && pass "Default gateway" "$GATEWAY via $DEFAULT_IF" || fail "Default gateway" "none configured"
+
+# ---------------------------------------------------------------------------
+section "2. IP / ICMP REACHABILITY"
+# ---------------------------------------------------------------------------
+[[ -n "${GATEWAY:-}" ]] && {
+    if ping -c 2 -W "$TIMEOUT" "$GATEWAY" >/dev/null 2>&1; then pass "Ping gateway $GATEWAY"; else fail "Ping gateway $GATEWAY"; fi
+}
+for ip in "${TEST_IPS[@]}"; do
+    if ping -c 2 -W "$TIMEOUT" "$ip" >/dev/null 2>&1; then pass "Ping $ip"; else fail "Ping $ip"; fi
+done
+
+# ---------------------------------------------------------------------------
+section "3. TCP/UDP SOCKET REACHABILITY"
+# ---------------------------------------------------------------------------
+for ip in "${TEST_IPS[@]}"; do
+    if timeout "$TIMEOUT" bash -c "</dev/tcp/$ip/443" 2>/dev/null; then pass "TCP/443 -> $ip"; else fail "TCP/443 -> $ip"; fi
+    if timeout "$TIMEOUT" bash -c "</dev/tcp/$ip/53"  2>/dev/null; then pass "TCP/53  -> $ip"; else fail "TCP/53  -> $ip"; fi
+done
+
+# Raw UDP/53 via dig with explicit server — bypasses /etc/resolv.conf
+for ip in "${TEST_IPS[@]}"; do
+    if result=$(dig +short +time="$TIMEOUT" +tries=1 @"$ip" "$TEST_HOST" 2>&1) && [[ -n "$result" ]] && [[ ! "$result" =~ "timed out"|"connection refused" ]]; then
+        pass "UDP/53 -> $ip (dig)" "$(echo "$result" | head -1)"
+    else
+        fail "UDP/53 -> $ip (dig)" "$result"
+    fi
+done
+
+# ---------------------------------------------------------------------------
+section "4. DNS INFRASTRUCTURE (bypass tools)"
+# ---------------------------------------------------------------------------
+# dig uses its own resolver — does NOT touch glibc NSS chain
+for srv in "" "${TEST_IPS[@]}"; do
+    if [[ -z "$srv" ]]; then
+        out=$(dig +short +time="$TIMEOUT" +tries=1 "$TEST_HOST" 2>&1)
+        label="default"
+    else
+        out=$(dig +short +time="$TIMEOUT" +tries=1 @"$srv" "$TEST_HOST" 2>&1)
+        label="$srv"
+    fi
+    if [[ -n "$out" && ! "$out" =~ "timed out"|"connection refused" ]]; then
+        pass "dig via $label" "$(echo "$out" | head -1)"
+    else
+        fail "dig via $label" "$out"
+    fi
+done
+
+# ---------------------------------------------------------------------------
+section "5. LINUX RESOLVER PATH (the hook layer)"
+# ---------------------------------------------------------------------------
+# getent uses glibc NSS — goes through the whole system resolver chain
+if out=$(getent hosts "$TEST_HOST" 2>&1) && [[ -n "$out" ]]; then
+    addr=$(echo "$out" | awk '{print $1; exit}')
+    pass "getent hosts (NSS path)" "$addr"
+else
+    fail "getent hosts (NSS path)" "$out"
+fi
+
+# resolvectl query if systemd-resolved present
+if command -v resolvectl >/dev/null 2>&1; then
+    if out=$(resolvectl query "$TEST_HOST" 2>&1) && echo "$out" | grep -q "^$TEST_HOST:"; then
+        addr=$(echo "$out" | awk '/^[^:]+:.+[0-9]+\./{print $2; exit}')
+        pass "resolvectl query" "$addr"
+    else
+        fail "resolvectl query" "$(echo "$out" | head -2)"
+    fi
+fi
+
+# nsswitch.conf — name resolution order
+echo "  /etc/nsswitch.conf hosts line:"
+grep "^hosts:" /etc/nsswitch.conf 2>/dev/null | sed 's/^/    /'
+
+# /etc/resolv.conf — is it the systemd-resolved stub, NetworkManager's, or static?
+echo "  /etc/resolv.conf:"
+if [[ -L /etc/resolv.conf ]]; then
+    target=$(readlink /etc/resolv.conf)
+    echo "    symlink -> $target"
+fi
+head -5 /etc/resolv.conf 2>/dev/null | sed 's/^/    /'
+
+# Active resolver listeners on 127.x:53
+echo "  Local DNS listeners on 127.0.0.x:53:"
+ss -tulnp 2>/dev/null | awk '$5 ~ /^127\./ && $5 ~ /:53$/' | sed 's/^/    /' || true
+
+# systemd-resolved status (if present)
+if systemctl is-active systemd-resolved >/dev/null 2>&1; then
+    echo "  systemd-resolved active. Per-link DNS:"
+    resolvectl status 2>/dev/null | awk '
+        /^Link [0-9]+/{link=$0; show=0; printed=0}
+        /Current DNS Server:|DNS Servers:|DNS Domain:/{
+            if(!printed){print "    "link; printed=1}
+            print "      "$0
+        }
+    ' | head -40
+fi
+
+# ---------------------------------------------------------------------------
+# Time-sync deep-dive: HTTP Date drift + check timedatectl/chrony/ntpd status
+remote_date=$(curl -sIA 'net-ops-probe' --max-time 5 https://www.google.com 2>/dev/null | awk -F': ' 'tolower($1)=="date"{print $2; exit}' | tr -d '\r')
+drift_ok=1
+drift_detail=""
+if [[ -n "$remote_date" ]]; then
+    remote_epoch=$(date -d "$remote_date" +%s 2>/dev/null)
+    if [[ -n "$remote_epoch" ]]; then
+        local_epoch=$(date +%s)
+        drift=$(( local_epoch - remote_epoch ))
+        abs_drift=${drift#-}
+        if [[ "$abs_drift" -lt 300 ]]; then
+            drift_detail="${drift}s vs HTTP Date (within ±5min)"
+        else
+            drift_ok=0
+            drift_detail="${drift}s drift — will break TLS cert validation"
+        fi
+    fi
+fi
+
+# Detect which time daemon and its sync state
+sync_detail=""
+if command -v timedatectl >/dev/null 2>&1; then
+    sync_state=$(timedatectl show 2>/dev/null | awk -F= '/^NTPSynchronized=/{print $2}')
+    sync_detail="systemd-timesyncd NTPSynchronized=$sync_state"
+elif command -v chronyc >/dev/null 2>&1; then
+    stratum=$(chronyc tracking 2>/dev/null | awk -F': ' '/Stratum/{print $2}')
+    sync_detail="chronyd stratum=$stratum"
+    [[ "$stratum" == "16" ]] && drift_ok=0
+elif command -v ntpq >/dev/null 2>&1; then
+    sync_detail="ntpd present (run 'ntpq -p' for peer status)"
+fi
+
+combined="$drift_detail${sync_detail:+; $sync_detail}"
+if [[ "$drift_ok" -eq 1 ]]; then
+    pass "Time sync" "$combined"
+else
+    fail "Time sync" "$combined"
+fi
+
+# MTU / path-MTU discovery. Linux uses -M do (don't fragment).
+if ping -M do -s 1472 -c 1 -W 3 1.1.1.1 >/dev/null 2>&1; then
+    pass "Path MTU 1500 (1472-byte DF payload)" "to 1.1.1.1"
+else
+    if ping -M do -s 1400 -c 1 -W 3 1.1.1.1 >/dev/null 2>&1; then
+        fail "Path MTU 1500 (1472-byte DF payload)" "1500 fails, 1428+ works — path MTU < 1500 (VPN/PPPoE?)"
+    else
+        pass "Path MTU test inconclusive" "ICMP DF blocked or destination unreachable"
+    fi
+fi
+
+# IPv6 deep-dive — classifies v6 stack state across four meaningful tiers.
+v6_state=""
+v6_detail=""
+
+v6_addrs=$(ip -6 -br addr show scope global 2>/dev/null | awk '{for(i=3;i<=NF;i++) print $1" "$i}' | grep -v '^lo ')
+v6_global=$(printf '%s\n' "$v6_addrs" | awk '$2 !~ /^fd/ && $2 !~ /^fc/{print; exit}')
+v6_default=$(ip -6 route show default 2>/dev/null | head -1)
+
+if [[ -z "$v6_addrs" ]]; then
+    v6_state="disabled"
+    v6_detail="no global v6 addresses — IPv6 disabled or unconfigured (check sysctl net.ipv6.conf.all.disable_ipv6)"
+elif [[ -z "$v6_global" ]]; then
+    v6_state="ula_only"
+    v6_detail="only ULA (fc00::/7) addresses present — router not delegating public v6 prefix"
+elif [[ -z "$v6_default" ]]; then
+    v6_state="no_route"
+    v6_detail="global v6 address present but no default route — RA not received (check accept_ra sysctl)"
+else
+    aaaa=$(dig +short +time=2 +tries=1 AAAA "$TEST_HOST" 2>/dev/null | head -1)
+    if [[ -n "$aaaa" ]] && curl -6 -sS -o /dev/null --max-time 4 "https://$TEST_HOST" 2>/dev/null; then
+        v6_state="healthy"
+        v6_detail="global addr + default route + curl -6 works"
+    else
+        v6_state="path_broken"
+        v6_detail="addr present, default route present, but curl -6 fails — firewall or ISP black-holing"
+    fi
+fi
+
+case "$v6_state" in
+    disabled|healthy) pass "IPv6 stack ($v6_state)" "$v6_detail" ;;
+    *) fail "IPv6 stack ($v6_state)" "$v6_detail" ;;
+esac
+
+# ---------------------------------------------------------------------------
+section "6. APPLICATION LAYER (real HTTP request)"
+# ---------------------------------------------------------------------------
+for url in "https://www.google.com" "https://github.com"; do
+    if out=$(curl -sS -o /dev/null -w "%{http_code} %{size_download}b" --max-time "$TIMEOUT" "$url" 2>&1); then
+        pass "GET $url" "$out"
+    else
+        fail "GET $url" "$out"
+    fi
+done
+
+# ---------------------------------------------------------------------------
+section "7. KNOWN VPN / DNS CLIENT FOOTPRINT"
+# ---------------------------------------------------------------------------
+# Browser DoH state — Chrome / Brave / Edge / Firefox bypass system DNS when DoH set.
+browser_findings=""
+for label_prefs in \
+    "Chrome:$HOME/.config/google-chrome/Default/Preferences" \
+    "Chromium:$HOME/.config/chromium/Default/Preferences" \
+    "Brave:$HOME/.config/BraveSoftware/Brave-Browser/Default/Preferences" \
+    "Edge:$HOME/.config/microsoft-edge/Default/Preferences"; do
+    label="${label_prefs%%:*}"
+    prefs="${label_prefs#*:}"
+    [[ -f "$prefs" ]] || continue
+    mode=$(perl -ne 'if (/"dns_over_https"\s*:\s*\{[^}]*"mode"\s*:\s*"([^"]+)"/) { print "$1\n"; exit }' "$prefs" 2>/dev/null)
+    templates=$(perl -ne 'if (/"dns_over_https"\s*:\s*\{[^}]*"templates"\s*:\s*"([^"]+)"/) { print "$1\n"; exit }' "$prefs" 2>/dev/null)
+    if [[ -n "$mode" ]]; then
+        browser_findings+="    $label DoH: mode=$mode${templates:+, server=$templates}\n"
+    else
+        browser_findings+="    $label installed, DoH: not configured (system DNS)\n"
+    fi
+done
+for fx_prefs in "$HOME/.mozilla/firefox"/*.default*/prefs.js; do
+    [[ -f "$fx_prefs" ]] || continue
+    trr_mode=$(awk -F'"' '/"network.trr.mode"/{print $4; exit}' "$fx_prefs" 2>/dev/null)
+    trr_uri=$(awk -F'"' '/"network.trr.uri"/{print $4; exit}' "$fx_prefs" 2>/dev/null)
+    case "${trr_mode:-0}" in
+        2) state="enabled (with system fallback)" ;;
+        3) state="enabled (no fallback)" ;;
+        5) state="disabled by policy" ;;
+        *) state="off (system DNS)" ;;
+    esac
+    browser_findings+="    Firefox DoH: $state${trr_uri:+, server=$trr_uri}\n"
+    break
+done
+if [[ -n "$browser_findings" ]]; then
+    info "  Browser DoH state (browsers may bypass system DNS):"
+    printf '%b' "$browser_findings"
+fi
+
+KNOWN=(
+    /etc/openvpn /etc/wireguard /opt/cisco /etc/proton-vpn /etc/mullvad-vpn
+    /opt/nordvpn /etc/NetworkManager/dnsmasq.d /etc/dnsmasq.d
+    /etc/cloudflared /etc/nextdns.conf
+)
+for p in "${KNOWN[@]}"; do
+    [[ -e "$p" ]] && echo "  Found: $p"
+done
+
+# Running VPN / DNS proxy processes
+echo "  VPN / DNS proxy processes:"
+pgrep -af 'openvpn|wireguard|wg-quick|mullvad|proton|nordvpn|cloudflared|nextdns|dnsmasq|stubby|dnscrypt' 2>/dev/null | head -10 | sed 's/^/    /' || true
+
+# ---------------------------------------------------------------------------
+section "8. ENVIRONMENT (WSL / container detection)"
+# ---------------------------------------------------------------------------
+env_type=""
+if [[ -f /proc/sys/fs/binfmt_misc/WSLInterop ]] || grep -qi microsoft /proc/version 2>/dev/null; then
+    env_type="WSL2"
+elif [[ -f /.dockerenv ]]; then
+    env_type="Docker container"
+elif grep -qE 'docker|containerd|kubepods' /proc/1/cgroup 2>/dev/null; then
+    env_type="container (cgroup signature)"
+fi
+
+if [[ -z "$env_type" ]]; then
+    info "  Bare-metal / VM Linux (no WSL/container signature)"
+else
+    info "  Detected environment: $env_type"
+    case "$env_type" in
+        WSL2*)
+            info "  WSL2 has bespoke DNS handling. Key files if DNS misbehaves:"
+            info "    /etc/wsl.conf       — controls generateResolvConf"
+            info "    /etc/resolv.conf    — auto-generated by WSL unless wsl.conf opts out"
+            info "    Host Windows DNS    — affects WSL DNS via mirrored mode"
+            info "  Fix pattern: edit /etc/wsl.conf, set [network] generateResolvConf=false, write static /etc/resolv.conf"
+            [[ -f /etc/wsl.conf ]] && { info "    --- /etc/wsl.conf ---"; sed 's/^/      /' /etc/wsl.conf; }
+            info "    --- /etc/resolv.conf head ---"
+            head -5 /etc/resolv.conf 2>/dev/null | sed 's/^/      /'
+            ;;
+        Docker*|container*)
+            info "  Container DNS inherits from host or --dns flag at run time."
+            info "    /etc/resolv.conf here is set by runtime, not user."
+            info "  If broken inside container but fine on host: check 'docker network inspect' / runtime config."
+            ;;
+    esac
+fi
+
+emit_summary
+if [[ "$JSON_MODE" -eq 0 ]]; then
+    if [[ -n "$FIRST_FAIL" ]]; then
+        case "$FIRST_FAIL" in
+            *"LINK LAYER"*)    echo "  Next: check ip link / ip addr, DHCP, NetworkManager state" ;;
+            *"SOCKET"*)        echo "  Next: check iptables/nftables OUTPUT chain; AV protocol filtering; consumer router DoH IP blocking" ;;
+            *"ICMP"*|*"IP /"*) echo "  Next: check ip route, ISP/upstream connectivity" ;;
+            *"DNS INFRASTRUCTURE"*) echo "  Next: check UDP/53 outbound, /etc/resolv.conf upstream" ;;
+            *"RESOLVER PATH"*) echo "  Next: bash scripts/linux/dns-audit.sh   # drill rung 5 (the hook layer)" ;;
+            *"APPLICATION"*)   echo "  Next: check http_proxy/https_proxy env, CA bundle, IPv6 preference" ;;
+            *) echo "  Next: re-run with --verbose; check references/common-culprits.md" ;;
+        esac
+    fi
+    echo
+    echo "=== END PROBE ==="
+fi

+ 100 - 0
skills/net-ops/scripts/linux/resolved-reset.sh

@@ -0,0 +1,100 @@
+#!/usr/bin/env bash
+# net-ops :: linux/resolved-reset.sh
+# Reset systemd-resolved state when per-link DNS gets stuck (typical after
+# VPN disconnect leaves stale per-link DNS / domain settings).
+#
+# Defaults to DRY RUN — pass --apply to actually act.
+# Requires sudo for the apply path.
+
+set -eu
+
+APPLY=0
+for arg in "$@"; do
+    case "$arg" in
+        --apply) APPLY=1 ;;
+        --help|-h)
+            cat <<EOF
+Usage: $0 [--apply]
+
+Diagnoses and (with --apply) resets systemd-resolved per-link DNS state.
+
+  --apply    Flush caches and revert each link's DNS to NetworkManager/networkd defaults
+             (default: dry-run, prints what would happen)
+EOF
+            exit 0 ;;
+    esac
+done
+
+if ! systemctl is-active systemd-resolved >/dev/null 2>&1; then
+    echo "systemd-resolved is not active. This script only applies when it is."
+    echo "On non-systemd-resolved systems, edit /etc/resolv.conf or NetworkManager config directly."
+    exit 0
+fi
+
+echo "=== BEFORE ==="
+resolvectl status 2>/dev/null | head -60
+
+# Find links with non-empty per-link DNS (potential stale state)
+LINKS_WITH_DNS=$(resolvectl status 2>/dev/null | awk '
+    /^Link [0-9]+ \(/{ split($0,a," \\("); split(a[2],b,")"); link=b[1]; ifn=a[1]; sub("Link ","",ifn); has=0 }
+    /Current DNS Server:|DNS Servers:/{ if(NF>3){print ifn"|"link} }
+' | sort -u)
+
+if [[ -z "$LINKS_WITH_DNS" ]]; then
+    echo
+    echo "No links have explicit DNS set. Nothing to reset."
+    exit 0
+fi
+
+echo
+echo "=== LINKS WITH EXPLICIT DNS ==="
+echo "$LINKS_WITH_DNS" | while IFS='|' read -r idx name; do
+    echo "  Link $idx ($name)"
+done
+
+if [[ "$APPLY" -eq 0 ]]; then
+    echo
+    echo "DRY RUN — pass --apply to actually reset these links and flush caches."
+    exit 0
+fi
+
+if [[ "$EUID" -ne 0 ]]; then
+    echo "Need root. Re-running with sudo..."
+    exec sudo "$0" --apply
+fi
+
+echo
+echo "=== RESETTING ==="
+echo "$LINKS_WITH_DNS" | while IFS='|' read -r idx name; do
+    if resolvectl revert "$name" 2>/dev/null; then
+        echo "[OK]   reverted $name"
+    else
+        echo "[WARN] revert failed for $name (may be a VPN tunnel — manual cleanup may be needed)"
+    fi
+done
+
+echo
+echo "=== FLUSHING CACHE ==="
+resolvectl flush-caches && echo "  cache flushed"
+
+# Restart for good measure if user really wanted a reset
+systemctl restart systemd-resolved
+echo "  systemd-resolved restarted"
+
+echo
+echo "=== VERIFICATION ==="
+if out=$(getent hosts google.com 2>&1) && [[ -n "$out" ]]; then
+    echo "[PASS] getent hosts google.com -> $(echo "$out" | awk '{print $1}')"
+else
+    echo "[FAIL] getent still failing. Check /etc/nsswitch.conf and /etc/resolv.conf."
+fi
+
+if curl -sS -o /dev/null -w "[PASS] HTTPS google.com -> HTTP %{http_code}\n" --max-time 8 https://www.google.com 2>&1; then
+    :
+else
+    echo "[FAIL] HTTPS still broken."
+fi
+
+echo
+echo "=== AFTER ==="
+resolvectl status 2>/dev/null | head -40

+ 110 - 0
skills/net-ops/scripts/macos/dns-audit.sh

@@ -0,0 +1,110 @@
+#!/usr/bin/env bash
+# net-ops :: macos/dns-audit.sh
+# Deep DNS forensics for macOS. Use when probe.sh shows rung 4 (dig) PASS
+# but rung 5 (dscacheutil) FAIL — that signature points at a hook in the
+# macOS resolver chain.
+
+set -u
+
+# shellcheck source=../_lib/redact.sh
+source "$(dirname "$0")/../_lib/redact.sh"
+parse_redact_flag "$@"
+maybe_redact_self "$@"
+
+echo "=== scutil --dns (FULL) ==="
+scutil --dns 2>/dev/null
+
+echo
+echo "=== /etc/resolver/* (per-domain DNS overrides — VPN clients use these) ==="
+if [[ -d /etc/resolver ]] && [[ -n "$(ls -A /etc/resolver 2>/dev/null)" ]]; then
+    for f in /etc/resolver/*; do
+        [[ -f "$f" ]] || continue
+        echo "--- $f ---"
+        echo "  modified: $(stat -f '%Sm' "$f" 2>/dev/null || stat -c '%y' "$f" 2>/dev/null)"
+        cat "$f" | sed 's/^/  /'
+    done
+else
+    echo "/etc/resolver/ empty or missing — no per-domain overrides."
+fi
+
+echo
+echo "=== Configuration profiles with DNS settings ==="
+profiles list -type configuration 2>/dev/null | head -40
+echo
+echo "  (run 'sudo profiles show -type configuration' for full payloads)"
+
+echo
+echo "=== /etc/hosts (non-comment lines) ==="
+grep -vE '^\s*(#|$)' /etc/hosts 2>/dev/null || echo "  (no custom entries)"
+
+echo
+echo "=== /etc/resolv.conf (legacy, usually a stub on macOS) ==="
+if [[ -f /etc/resolv.conf ]]; then
+    cat /etc/resolv.conf
+else
+    echo "  not present"
+fi
+
+echo
+echo "=== mDNSResponder state ==="
+if pgrep -x mDNSResponder >/dev/null; then
+    pid=$(pgrep -x mDNSResponder | head -1)
+    echo "PID: $pid"
+    ps -o pid,etime,command -p "$pid" 2>/dev/null
+fi
+
+echo
+echo "=== Network services priority order ==="
+networksetup -listnetworkserviceorder 2>/dev/null | head -30
+
+echo
+echo "=== DNS servers per active service ==="
+networksetup -listallnetworkservices 2>/dev/null | tail -n +2 | while read -r svc; do
+    [[ "$svc" == \** ]] && continue  # disabled
+    dns=$(networksetup -getdnsservers "$svc" 2>/dev/null)
+    echo "  $svc: $dns"
+done
+
+echo
+echo "=== Search domains per active service ==="
+networksetup -listallnetworkservices 2>/dev/null | tail -n +2 | while read -r svc; do
+    [[ "$svc" == \** ]] && continue
+    sd=$(networksetup -getsearchdomains "$svc" 2>/dev/null)
+    echo "  $svc: $sd"
+done
+
+echo
+echo "=== Third-party network kexts loaded ==="
+kextstat 2>/dev/null | grep -iE 'cisco|anyconnect|proton|mullvad|nord|littlesnitch|lulu|nextdns|warp' || echo "  (none detected)"
+
+echo
+echo "=== ATTRIBUTION HINTS ==="
+# Aggregate every nameserver we can see across all resolver surfaces, then
+# pattern-match each unique entry to a known VPN/DNS client signature.
+ns_list=$( {
+    [[ -d /etc/resolver ]] && grep -h '^nameserver' /etc/resolver/* 2>/dev/null | awk '{print $2}'
+    scutil --dns 2>/dev/null | awk '/nameserver\[[0-9]+\]/{print $3}'
+    networksetup -listallnetworkservices 2>/dev/null | tail -n +2 | while read -r svc; do
+        [[ "$svc" == \** ]] && continue
+        networksetup -getdnsservers "$svc" 2>/dev/null | grep -E '^[0-9a-f:.]+$' || true
+    done
+} | sort -u | grep -v '^$' )
+
+if [[ -z "$ns_list" ]]; then
+    echo "  (no nameservers found)"
+fi
+
+while read -r n; do
+    [[ -z "$n" ]] && continue
+    case "$n" in
+        10.2.0.*)        echo "  $n :: likely Proton VPN gateway" ;;
+        10.64.0.*)       echo "  $n :: likely Mullvad gateway" ;;
+        10.211.*|10.212.*) echo "  $n :: likely Cisco AnyConnect" ;;
+        10.5.0.*)        echo "  $n :: likely NordVPN gateway" ;;
+        100.100.100.100) echo "  $n :: Tailscale MagicDNS (expected)" ;;
+        127.0.0.1|127.0.0.2|::1) echo "  $n :: local DNS proxy (NextDNS, AdGuard, dnsmasq, etc.)" ;;
+        1.1.1.1|1.0.0.1) echo "  $n :: Cloudflare public DNS" ;;
+        8.8.8.8|8.8.4.4) echo "  $n :: Google public DNS" ;;
+        9.9.9.9|149.112.112.112) echo "  $n :: Quad9 public DNS" ;;
+    esac
+done <<< "$ns_list"

+ 405 - 0
skills/net-ops/scripts/macos/probe.sh

@@ -0,0 +1,405 @@
+#!/usr/bin/env bash
+# net-ops :: macos/probe.sh
+# Full layered diagnostic ladder for macOS network troubleshooting.
+# Outputs structured [PASS]/[FAIL] lines so a human or LLM can scan for
+# the first FAIL and drill in.
+
+set -u
+
+TEST_HOST="${TEST_HOST:-google.com}"
+TEST_IPS=("1.1.1.1" "8.8.8.8")
+TIMEOUT="${TIMEOUT:-5}"
+
+VERBOSE=0
+for arg in "$@"; do
+    case "$arg" in
+        --verbose|-v) VERBOSE=1 ;;
+        --help|-h)
+            cat <<EOF
+Usage: $0 [--redact] [--verbose] [--json] [--quick]
+
+  --redact   Mask private IPs, MAC addresses, and *.ts.net tailnet names
+  --verbose  Full scutil --dns dump (default: condensed one-line-per-resolver)
+  --json     Newline-delimited JSON output (for piping to jq, dashboards)
+  --quick    Skip rungs 1-4 and 7 if the last full run cached as healthy
+             (cache: \${TMPDIR}/net-ops/last-state.json, TTL 10min)
+
+Compose freely: --json + --redact emits sanitized NDJSON.
+EOF
+            exit 0 ;;
+    esac
+done
+
+# shellcheck source=../_lib/redact.sh
+source "$(dirname "$0")/../_lib/redact.sh"
+# shellcheck source=../_lib/output.sh
+source "$(dirname "$0")/../_lib/output.sh"
+# shellcheck source=../_lib/cache.sh
+source "$(dirname "$0")/../_lib/cache.sh"
+parse_redact_flag "$@"
+parse_output_flags "$@"
+parse_quick_flag "$@"
+maybe_redact_self "$@"
+
+if cache_indicates_healthy; then
+    info "  [--quick: last full run was healthy, skipping rungs 1-4 and 7]"
+fi
+
+# ---------------------------------------------------------------------------
+if should_run_rung 1; then
+section "1. LINK LAYER"
+# ---------------------------------------------------------------------------
+ACTIVE_IFS=$(networksetup -listallhardwareports 2>/dev/null | awk '/Hardware Port/{port=$3} /Device/{print port" "$2}' || true)
+echo "$ACTIVE_IFS" | while read -r line; do
+    [[ -z "$line" ]] && continue
+    name="${line% *}"; dev="${line##* }"
+    status=$(ifconfig "$dev" 2>/dev/null | awk '/status:/{print $2; exit}')
+    if [[ "$status" == "active" ]]; then
+        ip=$(ifconfig "$dev" 2>/dev/null | awk '/inet /{print $2; exit}')
+        pass "Interface $name ($dev) active" "$ip"
+    fi
+done
+
+GATEWAY=$(route -n get default 2>/dev/null | awk '/gateway:/{print $2}')
+DEFAULT_IF=$(route -n get default 2>/dev/null | awk '/interface:/{print $2}')
+[[ -n "$GATEWAY" ]] && pass "Default gateway" "$GATEWAY via $DEFAULT_IF" || fail "Default gateway" "none configured"
+
+fi  # end rung 1
+
+# ---------------------------------------------------------------------------
+if should_run_rung 2; then
+section "2. IP / ICMP REACHABILITY"
+# ---------------------------------------------------------------------------
+[[ -n "${GATEWAY:-}" ]] && {
+    if ping -c 2 -W "${TIMEOUT}000" "$GATEWAY" >/dev/null 2>&1; then pass "Ping gateway $GATEWAY"; else fail "Ping gateway $GATEWAY"; fi
+}
+for ip in "${TEST_IPS[@]}"; do
+    if ping -c 2 -W "${TIMEOUT}000" "$ip" >/dev/null 2>&1; then pass "Ping $ip"; else fail "Ping $ip"; fi
+done
+
+fi  # end rung 2
+
+# ---------------------------------------------------------------------------
+if should_run_rung 3; then
+section "3. TCP/UDP SOCKET REACHABILITY"
+# ---------------------------------------------------------------------------
+for ip in "${TEST_IPS[@]}"; do
+    if nc -zv -G "$TIMEOUT" "$ip" 443 >/dev/null 2>&1; then pass "TCP/443 -> $ip"; else fail "TCP/443 -> $ip"; fi
+    if nc -zv -G "$TIMEOUT" "$ip" 53 >/dev/null 2>&1; then pass "TCP/53  -> $ip"; else fail "TCP/53  -> $ip"; fi
+done
+
+# Raw UDP/53 — uses dig with explicit server, bypasses /etc/resolv.conf
+for ip in "${TEST_IPS[@]}"; do
+    if dig +short +time="$TIMEOUT" +tries=1 @"$ip" "$TEST_HOST" >/dev/null 2>&1; then
+        result=$(dig +short +time="$TIMEOUT" +tries=1 @"$ip" "$TEST_HOST" | head -1)
+        pass "UDP/53 -> $ip (dig)" "$result"
+    else
+        fail "UDP/53 -> $ip (dig)"
+    fi
+done
+
+fi  # end rung 3
+
+# ---------------------------------------------------------------------------
+if should_run_rung 4; then
+section "4. DNS INFRASTRUCTURE (bypass tools)"
+# ---------------------------------------------------------------------------
+# dig uses its own resolver — does NOT touch macOS DNS resolution chain
+for srv in "" "${TEST_IPS[@]}"; do
+    if [[ -z "$srv" ]]; then
+        out=$(dig +short +time="$TIMEOUT" +tries=1 "$TEST_HOST" 2>&1)
+        label="default"
+    else
+        out=$(dig +short +time="$TIMEOUT" +tries=1 @"$srv" "$TEST_HOST" 2>&1)
+        label="$srv"
+    fi
+    if [[ -n "$out" && ! "$out" =~ "timed out"|"connection refused" ]]; then
+        pass "dig via $label" "$(echo "$out" | head -1)"
+    else
+        fail "dig via $label" "$out"
+    fi
+done
+
+fi  # end rung 4
+
+# ---------------------------------------------------------------------------
+section "5. macOS RESOLVER PATH (the hook layer)"
+# ---------------------------------------------------------------------------
+# dscacheutil uses the macOS resolver chain — goes through everything
+out=$(dscacheutil -q host -a name "$TEST_HOST" 2>&1)
+if echo "$out" | grep -q "ip_address:"; then
+    addr=$(echo "$out" | awk '/ip_address:/{print $2; exit}')
+    pass "dscacheutil (system resolver)" "$addr"
+else
+    fail "dscacheutil (system resolver)" "$(echo "$out" | head -3)"
+fi
+
+# /etc/resolver/* — per-domain overrides, classic VPN residue
+if [[ -d /etc/resolver ]]; then
+    resolver_files=$(ls /etc/resolver/ 2>/dev/null)
+    if [[ -n "$resolver_files" ]]; then
+        echo "  /etc/resolver/ contents (per-domain DNS overrides):"
+        for f in /etc/resolver/*; do
+            [[ -f "$f" ]] || continue
+            domain="${f##*/}"
+            ns=$(awk '/^nameserver/{print $2}' "$f" | tr '\n' ' ')
+            echo "    $domain -> $ns"
+        done
+    fi
+fi
+
+# scutil DNS state — the authoritative view of macOS resolver config
+if [[ "$VERBOSE" -eq 1 ]]; then
+    echo "  scutil --dns (full):"
+    scutil --dns 2>/dev/null | sed 's/^/    /'
+else
+    # Condensed: one line per resolver — scope (via domain or search), nameservers, order
+    echo "  scutil --dns (condensed, --verbose for full):"
+    scutil --dns 2>/dev/null | awk '
+        /^resolver #/{ if(num){flush()} num=$2; sub(/#/,"",num); scope=""; ns=""; ord="" }
+        /search domain\[0\]/{ scope="search="$NF }
+        /domain[[:space:]]*:/{ scope="domain="$NF }
+        /options/{ if($NF~/mdns/) scope="mdns" }
+        /nameserver\[[0-9]+\]/{ ns=ns?ns","$NF:$NF }
+        /order[[:space:]]*:/{ ord=$NF }
+        function flush() {
+            if (!scope) scope="default"
+            print "    #"num"  scope="scope"  via="ns"  order="ord
+        }
+        END{ if(num) flush() }
+    '
+fi
+
+# Configuration profiles (MDM / VPN-installed). Without sudo we only see user-scope.
+profile_count=$(profiles list -type configuration 2>/dev/null | grep -c "attribute:" 2>/dev/null)
+profile_count="${profile_count:-0}"
+if [[ "$profile_count" =~ ^[0-9]+$ ]] && (( profile_count > 0 )); then
+    echo "  Configuration profiles installed (user scope): $profile_count"
+    echo "    For full detail incl. system profiles: sudo profiles list -type configuration"
+fi
+
+# Local DNS proxy detection — derived from scutil (works unprivileged).
+# Common with NextDNS, AdGuard, dnsmasq, Pi-hole client, Cloudflare WARP.
+if scutil --dns 2>/dev/null | awk '/nameserver\[[0-9]+\]/{print $3}' | grep -qE '^(127\.|::1$)'; then
+    echo "  !! Local DNS proxy detected in resolver chain (127.x or ::1 nameserver)"
+    echo "     Apps using the system resolver may route DNS through it."
+    echo "     For PID/process: sudo lsof -nP -iUDP:53"
+fi
+
+# mDNSResponder state
+if pgrep -x mDNSResponder >/dev/null; then
+    pid=$(pgrep -x mDNSResponder | head -1)
+    pass "mDNSResponder running" "PID $pid"
+else
+    fail "mDNSResponder" "not running — system DNS will be broken"
+fi
+
+# ---------------------------------------------------------------------------
+# Time-sync deep-dive: compare local clock to HTTP Date, AND check whether
+# macOS network time sync itself is enabled + which server it's pointing at.
+# Stratum-16 (unsynced) clocks are the silent killer of TLS validation.
+ntp_enabled=$(systemsetup -getusingnetworktime 2>/dev/null | awk -F': ' '{print $2}')
+ntp_server=$(systemsetup -getnetworktimeserver 2>/dev/null | awk -F': ' '{print $2}')
+
+# HTTP Date drift (works without elevated privs, no NTP infra needed)
+remote_date=$(curl -sIA 'net-ops-probe' --max-time 5 https://www.google.com 2>/dev/null | awk -F': ' 'tolower($1)=="date"{print $2; exit}' | tr -d '\r')
+drift_ok=1
+drift_detail=""
+if [[ -n "$remote_date" ]]; then
+    remote_epoch=$(date -j -f '%a, %d %b %Y %H:%M:%S %Z' "$remote_date" +%s 2>/dev/null)
+    if [[ -n "$remote_epoch" ]]; then
+        local_epoch=$(date +%s)
+        drift=$(( local_epoch - remote_epoch ))
+        abs_drift=${drift#-}
+        if [[ "$abs_drift" -lt 300 ]]; then
+            drift_detail="${drift}s vs HTTP Date (within ±5min)"
+        else
+            drift_ok=0
+            drift_detail="${drift}s drift — will break TLS cert validation"
+        fi
+    fi
+fi
+
+# Optional: query the configured NTP server for actual stratum / offset.
+# sntp is built-in on macOS; suppress its noisy output.
+ntp_offset=""
+if [[ -n "$ntp_server" ]] && command -v sntp >/dev/null 2>&1; then
+    ntp_offset=$(sntp -t 3 "$ntp_server" 2>/dev/null | awk '/[+-][0-9]+\.[0-9]+/{print $1; exit}')
+fi
+
+combined="$drift_detail"
+[[ -n "$ntp_enabled" ]] && combined="$combined; NTP sync=$ntp_enabled"
+[[ -n "$ntp_server" ]] && combined="$combined; server=$ntp_server"
+[[ -n "$ntp_offset" ]] && combined="$combined; sntp offset=${ntp_offset}s"
+
+if [[ "$drift_ok" -eq 1 ]] && { [[ "$ntp_enabled" == "On" ]] || [[ -z "$ntp_enabled" ]]; }; then
+    pass "Time sync" "$combined"
+else
+    fail "Time sync" "$combined"
+fi
+
+# MTU / path-MTU discovery test. Standard Ethernet MTU is 1500.
+# We send a 1472-byte payload (1472 + 20 IP + 8 ICMP = 1500) with DF set.
+# If this fails but a smaller size works, there's a path-MTU issue
+# (PPPoE, weird tunnel, broken ICMP "fragmentation needed" delivery).
+if ping -D -s 1472 -c 1 -t 3 1.1.1.1 >/dev/null 2>&1; then
+    pass "Path MTU 1500 (1472-byte DF payload)" "to 1.1.1.1"
+else
+    if ping -D -s 1400 -c 1 -t 3 1.1.1.1 >/dev/null 2>&1; then
+        fail "Path MTU 1500 (1472-byte DF payload)" "1500 fails, 1428+ works — path MTU < 1500 (VPN/PPPoE?)"
+    else
+        # Both fail — DF blocking entirely; don't flag as MTU
+        pass "Path MTU test inconclusive" "ICMP DF blocked or destination unreachable"
+    fi
+fi
+
+# IPv6 deep-dive — classifies v6 stack state across four meaningful tiers
+# instead of a binary works/broken. Each tier maps to a distinct fix path.
+v6_state=""
+v6_detail=""
+
+# 1. Any v6 address on a non-loopback interface?
+v6_addrs=$(ifconfig 2>/dev/null | awk '/^[a-z]/{ifn=$1} /inet6 /{print ifn" "$2}' | grep -v "::1\|fe80::" | grep -v "^utun\|^awdl\|^llw\|^bridge")
+# 2. Any GLOBAL v6 address (not ULA fd00::/8)?
+v6_global=$(printf '%s\n' "$v6_addrs" | awk '$2 !~ /^fd/ && $2 !~ /^fc/{print; exit}')
+# 3. Is there an actual global default route?
+v6_default=$(route -n get -inet6 default 2>&1 | awk '/gateway:/{print $2; exit}')
+[[ "$v6_default" =~ ^fe80 ]] && v6_default=""  # link-local doesn't count
+
+if [[ -z "$v6_addrs" ]]; then
+    v6_state="disabled"
+    v6_detail="no v6 addresses on physical interfaces — IPv6 disabled or unconfigured"
+elif [[ -z "$v6_global" ]]; then
+    v6_state="ula_only"
+    v6_detail="only ULA (fd00::/8) addresses present — ISP/router not delegating public v6 prefix"
+elif [[ -z "$v6_default" ]]; then
+    v6_state="no_route"
+    v6_detail="global v6 address present but no default route — RA not received or NDP broken"
+else
+    # We have a v6 address and a route — test actual connectivity
+    aaaa=$(dig +short +time=2 +tries=1 AAAA "$TEST_HOST" 2>/dev/null | head -1)
+    if [[ -n "$aaaa" ]] && curl -6 -sS -o /dev/null --max-time 4 "https://$TEST_HOST" 2>/dev/null; then
+        v6_state="healthy"
+        v6_detail="global addr + default route + curl -6 works"
+    else
+        v6_state="path_broken"
+        v6_detail="addr=$v6_global, route via $v6_default, but curl -6 fails — upstream v6 path dead"
+    fi
+fi
+
+case "$v6_state" in
+    disabled|healthy)
+        pass "IPv6 stack ($v6_state)" "$v6_detail" ;;
+    ula_only)
+        fail "IPv6 stack ($v6_state)" "$v6_detail — apps may try v6 first, hit 'no route', fall back to v4 (slow). Fix: sudo networksetup -setv6off <service>" ;;
+    no_route)
+        fail "IPv6 stack ($v6_state)" "$v6_detail — check ndp -an for RA receipt; restart interface or check router RA config" ;;
+    path_broken)
+        fail "IPv6 stack ($v6_state)" "$v6_detail — VPN/firewall blocking v6, or ISP black-holing v6 traffic" ;;
+esac
+
+# ---------------------------------------------------------------------------
+section "6. APPLICATION LAYER (real HTTP request)"
+# ---------------------------------------------------------------------------
+for url in "https://www.google.com" "https://github.com"; do
+    if out=$(curl -sS -o /dev/null -w "%{http_code} %{size_download}b" --max-time "$TIMEOUT" "$url" 2>&1); then
+        pass "GET $url" "$out"
+    else
+        fail "GET $url" "$out"
+    fi
+done
+
+# ---------------------------------------------------------------------------
+if should_run_rung 7; then
+section "7. KNOWN VPN / DNS CLIENT FOOTPRINT"
+# ---------------------------------------------------------------------------
+KNOWN_PATHS=(
+    "/Applications/Proton VPN.app"
+    "/Applications/Mullvad VPN.app"
+    "/Applications/Tailscale.app"
+    "/Applications/Cisco/Cisco Secure Client.app"
+    "/Applications/Cisco/Cisco AnyConnect Secure Mobility Client.app"
+    "/Applications/NordVPN.app"
+    "/Applications/NextDNS.app"
+    "/Applications/Little Snitch.app"
+    "/Applications/Lulu.app"
+    "/Library/Application Support/NextDNS"
+)
+for p in "${KNOWN_PATHS[@]}"; do
+    [[ -e "$p" ]] && echo "  Installed: $p"
+done
+
+# Browser DoH state — Chrome / Brave / Edge / Firefox have their own resolvers
+# that bypass system DNS entirely when DoH is configured. Useful for explaining
+# "Chrome works but Safari doesn't" type asymmetries.
+browser_findings=""
+chrome_prefs="$HOME/Library/Application Support/Google/Chrome/Default/Preferences"
+brave_prefs="$HOME/Library/Application Support/BraveSoftware/Brave-Browser/Default/Preferences"
+edge_prefs="$HOME/Library/Application Support/Microsoft Edge/Default/Preferences"
+for label_prefs in "Chrome:$chrome_prefs" "Brave:$brave_prefs" "Edge:$edge_prefs"; do
+    label="${label_prefs%%:*}"
+    prefs="${label_prefs#*:}"
+    if [[ -f "$prefs" ]]; then
+        # Chromium stores DoH mode under dns_over_https.mode: "off" | "automatic" | "secure"
+        mode=$(perl -ne 'if (/"dns_over_https"\s*:\s*\{[^}]*"mode"\s*:\s*"([^"]+)"/) { print "$1\n"; exit }' "$prefs" 2>/dev/null)
+        templates=$(perl -ne 'if (/"dns_over_https"\s*:\s*\{[^}]*"templates"\s*:\s*"([^"]+)"/) { print "$1\n"; exit }' "$prefs" 2>/dev/null)
+        if [[ -n "$mode" ]]; then
+            browser_findings+="    $label DoH: mode=$mode${templates:+, server=$templates}\n"
+        else
+            browser_findings+="    $label installed, DoH: not configured (system DNS)\n"
+        fi
+    fi
+done
+# Firefox: per-profile prefs.js, network.trr.mode (0=off, 2=enabled w/fallback, 3=enabled only, 5=disabled)
+for fx_prefs in "$HOME/Library/Application Support/Firefox/Profiles"/*.default*/prefs.js; do
+    [[ -f "$fx_prefs" ]] || continue
+    trr_mode=$(awk -F'"' '/"network.trr.mode"/{print $4; exit}' "$fx_prefs" 2>/dev/null)
+    trr_uri=$(awk -F'"' '/"network.trr.uri"/{print $4; exit}' "$fx_prefs" 2>/dev/null)
+    case "${trr_mode:-0}" in
+        2) state="enabled (with system fallback)" ;;
+        3) state="enabled (no fallback)" ;;
+        5) state="disabled by policy" ;;
+        *) state="off (system DNS)" ;;
+    esac
+    browser_findings+="    Firefox DoH: $state${trr_uri:+, server=$trr_uri}\n"
+    break  # only check one profile
+done
+if [[ -n "$browser_findings" ]]; then
+    info "  Browser DoH state (browsers may bypass system DNS):"
+    printf '%b' "$browser_findings"
+fi
+
+# Network services often reveal VPN/DNS clients that don't install at /Applications
+# (e.g. CLI-only NextDNS, kernel/system extensions, virtual interfaces)
+ns_pattern='Proton|Mullvad|NextDNS|Cisco|NordVPN|Tailscale|WireGuard|OpenVPN|Cloudflare|WARP|AdGuard'
+ns_found=$(networksetup -listallnetworkservices 2>/dev/null | grep -iE "$ns_pattern" || true)
+if [[ -n "$ns_found" ]]; then
+    echo "  Network services:"
+    echo "$ns_found" | sed 's/^/    /'
+fi
+
+fi  # end rung 7
+
+# Persist state for future --quick runs (only when we ran the FULL ladder).
+if [[ "$QUICK_MODE" -eq 0 ]]; then
+    cache_save_state "$PASS_COUNT" "$FAIL_COUNT" "$FIRST_FAIL"
+fi
+
+emit_summary
+if [[ "$JSON_MODE" -eq 0 ]]; then
+    if [[ -n "$FIRST_FAIL" ]]; then
+        case "$FIRST_FAIL" in
+            *"LINK LAYER"*)    echo "  Next: check ifconfig / networksetup, fix interface / DHCP" ;;
+            *"SOCKET"*)        echo "  Next: check Little Snitch / Lulu / pfctl rules; AV protocol filtering; consumer router DoH IP blocking" ;;
+            *"ICMP"*|*"IP /"*) echo "  Next: check route table, ISP/upstream connectivity" ;;
+            *"DNS INFRASTRUCTURE"*) echo "  Next: check UDP/53 outbound, router DNS forwarder" ;;
+            *"RESOLVER PATH"*) echo "  Next: bash scripts/macos/dns-audit.sh   # drill rung 5 (the hook layer)" ;;
+            *"APPLICATION"*)   echo "  Next: check proxy (scutil --proxy), keychain certs, IPv6 preference" ;;
+            *) echo "  Next: re-run with --verbose; check references/common-culprits.md" ;;
+        esac
+    else
+        echo "  (No failures. If user still reports issues, see rung 7 footprint and time-based notes in references/diagnostic-ladder.md.)"
+    fi
+    echo
+    echo "=== END PROBE ==="
+fi

+ 120 - 0
skills/net-ops/scripts/macos/resolver-clean.sh

@@ -0,0 +1,120 @@
+#!/usr/bin/env bash
+# net-ops :: macos/resolver-clean.sh
+# Safely remove orphaned /etc/resolver/* files left behind by disconnected VPNs.
+# NEVER removes Tailscale or current-VPN-tunnel entries.
+#
+# Defaults to DRY RUN — pass --apply to actually delete.
+# Requires sudo.
+
+set -eu
+
+APPLY=0
+PROTECT_PATTERNS="${PROTECT_PATTERNS:-100\.100\.100\.100}"
+
+for arg in "$@"; do
+    case "$arg" in
+        --apply) APPLY=1 ;;
+        --protect=*) PROTECT_PATTERNS="${arg#--protect=}" ;;
+        --help|-h)
+            cat <<EOF
+Usage: $0 [--apply] [--protect=REGEX]
+
+  --apply              Actually delete (default: dry-run only)
+  --protect=REGEX      Nameserver pattern to protect (default: Tailscale's 100.100.100.100)
+
+Examples:
+  $0                                  # show what would be removed
+  $0 --apply                          # remove orphan resolvers, protecting Tailscale
+  $0 --apply --protect='100\\.\\.|192\\.168\\.1\\.'  # also protect 192.168.1.x
+EOF
+            exit 0 ;;
+    esac
+done
+
+if [[ ! -d /etc/resolver ]] || [[ -z "$(ls -A /etc/resolver 2>/dev/null)" ]]; then
+    echo "/etc/resolver/ is empty. Nothing to do."
+    exit 0
+fi
+
+echo "=== BEFORE ==="
+for f in /etc/resolver/*; do
+    [[ -f "$f" ]] || continue
+    ns=$(awk '/^nameserver/{print $2}' "$f" | tr '\n' ',')
+    echo "  $f -> ${ns%,}"
+done
+
+TARGETS=()
+for f in /etc/resolver/*; do
+    [[ -f "$f" ]] || continue
+    if awk '/^nameserver/{print $2}' "$f" | grep -qE "$PROTECT_PATTERNS"; then
+        continue
+    fi
+    TARGETS+=("$f")
+done
+
+if [[ "${#TARGETS[@]}" -eq 0 ]]; then
+    echo
+    echo "No orphan resolver files (all match protected nameserver pattern). Nothing to clean."
+    exit 0
+fi
+
+echo
+echo "=== TARGETS FOR REMOVAL ==="
+for f in "${TARGETS[@]}"; do
+    echo "  $f"
+done
+
+if [[ "$APPLY" -eq 0 ]]; then
+    echo
+    echo "DRY RUN — pass --apply to actually remove the files above."
+    exit 0
+fi
+
+# Apply
+if [[ "$EUID" -ne 0 ]]; then
+    echo "Need root. Re-running with sudo..."
+    exec sudo "$0" --apply --protect="$PROTECT_PATTERNS"
+fi
+
+echo
+echo "=== REMOVING ==="
+for f in "${TARGETS[@]}"; do
+    if rm -f "$f"; then
+        echo "[OK]   $f"
+    else
+        echo "[FAIL] $f"
+    fi
+done
+
+echo
+echo "=== FLUSHING DNS CACHE ==="
+dscacheutil -flushcache
+killall -HUP mDNSResponder 2>/dev/null || true
+echo "  done."
+
+echo
+echo "=== VERIFICATION ==="
+if out=$(dscacheutil -q host -a name google.com 2>&1) && echo "$out" | grep -q "ip_address:"; then
+    addr=$(echo "$out" | awk '/ip_address:/{print $2; exit}')
+    echo "[PASS] dscacheutil google.com -> $addr"
+else
+    echo "[FAIL] dscacheutil still broken. Drill into scutil --dns and configuration profiles."
+fi
+
+if curl -sS -o /dev/null -w "[PASS] HTTPS google.com -> HTTP %{http_code}\n" --max-time 8 https://www.google.com 2>&1; then
+    :
+else
+    echo "[FAIL] HTTPS still broken."
+fi
+
+echo
+echo "=== AFTER ==="
+if [[ -n "$(ls -A /etc/resolver 2>/dev/null)" ]]; then
+    for f in /etc/resolver/*; do
+        [[ -f "$f" ]] || continue
+        ns=$(awk '/^nameserver/{print $2}' "$f" | tr '\n' ',')
+        echo "  $f -> ${ns%,}"
+    done
+else
+    echo "  /etc/resolver/ is now empty."
+fi

+ 64 - 0
skills/net-ops/scripts/probe

@@ -0,0 +1,64 @@
+#!/usr/bin/env bash
+# net-ops :: probe (dispatcher)
+# Detects the local OS and runs the matching per-OS probe with the same args.
+# Supports --watch=N for continuous monitoring (prints state on change only).
+# For Windows targets you reach over SSH, invoke scripts/windows/probe.ps1
+# directly via PowerShell (see scripts/ssh-bootstrap.sh for the pattern).
+
+set -eu
+
+here="$(cd "$(dirname "$0")" && pwd)"
+
+# Parse --watch (with optional =N seconds; default 30)
+watch_interval=0
+clean_args=()
+for arg in "$@"; do
+    case "$arg" in
+        --watch)        watch_interval=30 ;;
+        --watch=*)      watch_interval="${arg#--watch=}" ;;
+        *)              clean_args+=("$arg") ;;
+    esac
+done
+
+# Resolve the per-OS probe script
+case "$(uname -s 2>/dev/null)" in
+    Darwin) target="$here/macos/probe.sh" ;;
+    Linux)  target="$here/linux/probe.sh" ;;
+    CYGWIN*|MINGW*|MSYS*)
+        cat >&2 <<EOF
+Detected Windows shell environment ($(uname -s)).
+Run the PowerShell probe directly:
+  powershell -NoProfile -File "$here/windows/probe.ps1" $*
+EOF
+        exit 2 ;;
+    *)
+        echo "net-ops: unsupported OS '$(uname -s 2>/dev/null)'" >&2
+        exit 2 ;;
+esac
+
+# Continuous watch mode: re-run every N seconds, but only print on state change.
+# A "state" is the (pass_count, fail_count, first_fail) tuple.
+if [[ "$watch_interval" -gt 0 ]]; then
+    echo "Watching every ${watch_interval}s — prints on state change only. Ctrl-C to stop." >&2
+    last_state=""
+    while true; do
+        # Run in JSON mode to extract summary cleanly; suppress streaming output
+        json=$(bash "$target" --json ${clean_args[@]+"${clean_args[@]}"} 2>/dev/null | grep '^{"type":"summary"' | head -1)
+        state=$(printf '%s' "$json" | tr -d ' ')
+        ts=$(date '+%Y-%m-%d %H:%M:%S')
+        if [[ "$state" != "$last_state" ]]; then
+            if [[ -z "$last_state" ]]; then
+                printf '[%s] initial state: %s\n' "$ts" "$json"
+            else
+                printf '[%s] CHANGED: %s\n' "$ts" "$json"
+            fi
+            last_state="$state"
+        else
+            printf '[%s] (no change)\n' "$ts"
+        fi
+        sleep "$watch_interval"
+    done
+fi
+
+# Default: one-shot execution
+exec "$target" ${clean_args[@]+"${clean_args[@]}"}

+ 107 - 0
skills/net-ops/scripts/reverse-probe.sh

@@ -0,0 +1,107 @@
+#!/usr/bin/env bash
+# net-ops :: reverse-probe.sh
+# Diagnose a TARGET host from OUTSIDE — useful when the local probe on the
+# target says "all good" but external services / users still report problems.
+# Runs from this machine against a target host you can reach (LAN, tailnet,
+# public IP, etc).
+#
+# Usage:
+#   scripts/reverse-probe.sh <host>           # use default ports/checks
+#   scripts/reverse-probe.sh <host> [port...] # add custom TCP ports to probe
+#
+# Examples:
+#   scripts/reverse-probe.sh example.local
+#   scripts/reverse-probe.sh 100.84.X.X 8080 5432
+#   scripts/reverse-probe.sh api.mycompany.com 443
+
+set -u
+
+TARGET="${1:-}"
+if [[ -z "$TARGET" ]]; then
+    echo "Usage: $0 <host> [extra_tcp_port ...]" >&2
+    exit 1
+fi
+shift
+EXTRA_PORTS=("$@")
+DEFAULT_PORTS=(22 80 443)
+TIMEOUT=4
+
+# shellcheck source=_lib/redact.sh
+source "$(dirname "$0")/_lib/redact.sh"
+# shellcheck source=_lib/output.sh
+source "$(dirname "$0")/_lib/output.sh"
+parse_redact_flag "$@"
+parse_output_flags "$@"
+maybe_redact_self "$TARGET" "$@"
+
+# Resolve target — separates DNS issues from reachability issues
+section "1. NAME RESOLUTION FROM HERE"
+if [[ "$TARGET" =~ ^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$ ]]; then
+    pass "Target is literal IP" "$TARGET"
+    TARGET_IP="$TARGET"
+else
+    resolved=$(dig +short +time=3 +tries=1 "$TARGET" 2>/dev/null | head -1)
+    if [[ -n "$resolved" ]]; then
+        pass "Resolved $TARGET (dig, bypass resolver)" "$resolved"
+        TARGET_IP="$resolved"
+    else
+        fail "Resolved $TARGET" "no answer from local DNS — can't proceed past name layer"
+        emit_summary
+        exit 1
+    fi
+fi
+
+section "2. ICMP REACHABILITY"
+if ping -c 2 -W $((TIMEOUT * 1000)) "$TARGET_IP" >/dev/null 2>&1; then
+    pass "Ping $TARGET_IP"
+else
+    fail "Ping $TARGET_IP" "no ICMP response (or ICMP filtered)"
+fi
+
+section "3. TCP PORT REACHABILITY"
+# De-duplicate ports — extras may overlap defaults
+all_ports=$(printf '%s\n' "${DEFAULT_PORTS[@]}" ${EXTRA_PORTS[@]+"${EXTRA_PORTS[@]}"} | awk '!seen[$0]++')
+while read -r port; do
+    [[ -z "$port" ]] && continue
+    if nc -zv -G "$TIMEOUT" "$TARGET_IP" "$port" >/dev/null 2>&1; then
+        pass "TCP/$port -> $TARGET_IP" "open"
+    else
+        fail "TCP/$port -> $TARGET_IP" "closed or filtered"
+    fi
+done <<< "$all_ports"
+
+section "4. TLS / HTTPS HEALTH (if 443 open)"
+if nc -zv -G "$TIMEOUT" "$TARGET_IP" 443 >/dev/null 2>&1; then
+    if [[ "$TARGET" =~ ^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$ ]]; then
+        # Connect by IP; cert check will fail SNI but we can still probe
+        out=$(curl -sS -o /dev/null -w "%{http_code}|%{time_total}" --max-time "$TIMEOUT" -k "https://$TARGET_IP" 2>&1)
+        pass "HTTPS to IP (cert SNI may not match)" "$out"
+    else
+        out=$(curl -sS -o /dev/null -w "%{http_code}|%{time_total}" --max-time "$TIMEOUT" "https://$TARGET" 2>&1)
+        if [[ "$out" =~ ^[0-9]+\|[0-9.]+$ ]]; then
+            pass "HTTPS to $TARGET" "$out"
+        else
+            fail "HTTPS to $TARGET" "$out"
+        fi
+    fi
+fi
+
+section "5. PATH / ROUTING"
+case "$(uname -s)" in
+    Darwin)
+        # macOS traceroute: -w timeout (sec), -m max hops, -q probes per hop
+        info "  traceroute (first 8 hops):"
+        traceroute -n -w 2 -q 1 -m 8 "$TARGET_IP" 2>/dev/null | head -10 | sed 's/^/    /' || true
+        ;;
+    Linux)
+        if command -v traceroute >/dev/null 2>&1; then
+            info "  traceroute (first 8 hops):"
+            traceroute -n -w 2 -q 1 -m 8 "$TARGET_IP" 2>/dev/null | head -10 | sed 's/^/    /' || true
+        elif command -v mtr >/dev/null 2>&1; then
+            info "  mtr report (5 cycles):"
+            mtr -nrc 5 "$TARGET_IP" 2>/dev/null | tail -10 | sed 's/^/    /' || true
+        fi
+        ;;
+esac
+
+emit_summary

+ 124 - 0
skills/net-ops/scripts/ssh-bootstrap.sh

@@ -0,0 +1,124 @@
+#!/usr/bin/env bash
+# net-ops :: ssh-bootstrap.sh
+# Establish an SSH session to any target (Windows / macOS / Linux) using
+# password auth via sshpass. Reads password from stdin so it never appears
+# in argv / shell history. Auto-detects target OS and emits the right
+# invocation pattern for follow-up commands.
+#
+# Usage:
+#   echo 'password' | scripts/ssh-bootstrap.sh user@host
+#   scripts/ssh-bootstrap.sh user@host    # interactive prompt
+
+set -euo pipefail
+
+TARGET="${1:-}"
+if [[ -z "$TARGET" ]]; then
+    echo "Usage: $0 user@host" >&2
+    exit 1
+fi
+
+if ! command -v sshpass >/dev/null 2>&1; then
+    echo "sshpass not found. Install:" >&2
+    echo "  macOS:   brew install hudochenkov/sshpass/sshpass" >&2
+    echo "  Linux:   apt install sshpass / dnf install sshpass" >&2
+    exit 1
+fi
+
+# Read password — from stdin if piped, else prompt
+if [[ -t 0 ]]; then
+    read -rsp "Password for $TARGET: " PASSWORD
+    echo
+else
+    read -r PASSWORD
+fi
+export SSHPASS="$PASSWORD"
+
+# Quick connectivity check (also accepts host key on first contact).
+# Use a probe that works on all three: `uname -s` on Unix, fails on cmd.exe
+# but succeeds on Windows OpenSSH default shell when it's pwsh/powershell.
+echo "Probing $TARGET ..."
+PROBE=$(sshpass -e ssh \
+    -o StrictHostKeyChecking=accept-new \
+    -o ConnectTimeout=10 \
+    "$TARGET" \
+    'uname -s 2>/dev/null || cmd /c ver 2>nul || ver' 2>&1 | tr -d '\r')
+
+echo "  Response: $(echo "$PROBE" | head -3 | tr '\n' ' | ')"
+
+# Detect OS family from probe output
+OS=""
+case "$PROBE" in
+    *Darwin*)         OS="macos" ;;
+    *Linux*)          OS="linux" ;;
+    *Microsoft*|*Windows*) OS="windows" ;;
+esac
+
+if [[ -z "$OS" ]]; then
+    echo
+    echo "Could not auto-detect OS. Treating as unknown — defaulting to bash transport."
+    OS="unknown"
+fi
+
+echo "Detected OS family: $OS"
+
+# Per-OS smoke test
+case "$OS" in
+    windows)
+        echo
+        echo "Testing PowerShell -EncodedCommand transport ..."
+        TEST_PS='Write-Output ("PS ready :: " + $PSVersionTable.PSVersion.ToString())'
+        B64=$(printf '%s' "$TEST_PS" | iconv -t UTF-16LE | base64)
+        sshpass -e ssh "$TARGET" "powershell -NoProfile -EncodedCommand $B64" 2>&1 | tail -3
+        ;;
+    macos|linux|unknown)
+        echo
+        echo "Testing bash transport ..."
+        sshpass -e ssh "$TARGET" 'bash -c "echo BASH_OK :: \$(bash --version | head -1)"' 2>&1 | tail -2
+        ;;
+esac
+
+# Per-OS invocation hints
+echo
+echo "---"
+case "$OS" in
+    windows)
+        cat <<EOF
+Ready (Windows target). Run a PowerShell script via:
+
+  PS_SCRIPT=\$(cat skills/net-ops/scripts/windows/probe.ps1)
+  B64=\$(printf '%s' "\$PS_SCRIPT" | iconv -t UTF-16LE | base64)
+  SSHPASS='<password>' sshpass -e ssh $TARGET "powershell -NoProfile -EncodedCommand \$B64"
+
+Drilldown scripts: nrpt-audit.ps1, nrpt-clean.ps1
+
+For zero-friction follow-up, install your pubkey on the target:
+  Windows admin path: %ProgramData%\\ssh\\administrators_authorized_keys
+  Windows user path:  %USERPROFILE%\\.ssh\\authorized_keys
+EOF
+        ;;
+    macos)
+        cat <<EOF
+Ready (macOS target). Run a bash script via:
+
+  SSHPASS='<password>' sshpass -e ssh $TARGET 'bash -s' < skills/net-ops/scripts/macos/probe.sh
+
+Drilldown scripts: macos/dns-audit.sh, macos/resolver-clean.sh
+
+Persistent access: ssh-copy-id $TARGET
+EOF
+        ;;
+    linux)
+        cat <<EOF
+Ready (Linux target). Run a bash script via:
+
+  SSHPASS='<password>' sshpass -e ssh $TARGET 'bash -s' < skills/net-ops/scripts/linux/probe.sh
+
+Drilldown scripts: linux/dns-audit.sh, linux/resolved-reset.sh
+
+Persistent access: ssh-copy-id $TARGET
+EOF
+        ;;
+    *)
+        echo "Generic SSH ready. Run commands directly."
+        ;;
+esac

+ 67 - 0
skills/net-ops/scripts/windows/nrpt-audit.ps1

@@ -0,0 +1,67 @@
+# winnet-ops :: nrpt-audit.ps1
+# Dump every NRPT rule with full forensics. Use when probe.ps1 shows
+# layer 4 (nslookup) PASS but layer 5 (Resolve-DnsName) FAIL — that's the
+# textbook signature of a rogue NRPT entry.
+
+Write-Output "=== NRPT RULES (PowerShell view) ==="
+$rules = Get-DnsClientNrptRule
+if (!$rules) {
+    Write-Output "No NRPT rules configured."
+    return
+}
+$rules | Format-Table Name, Namespace, NameServers, Comment -AutoSize -Wrap | Out-String | Write-Output
+
+Write-Output ""
+Write-Output "=== SUSPICIOUS RULES (catch-all or non-Tailscale) ==="
+$bad = $rules | Where-Object {
+    $_.Namespace -eq "." -or
+    ($_.NameServers -and ($_.NameServers | Where-Object { $_ -notmatch "^100\.100\.100\.100$|^fd7a:115c:a1e0::53$" }))
+}
+if ($bad) {
+    $bad | Format-List Name, Namespace, NameServers, Comment, DohTemplate
+} else {
+    Write-Output "None — only Tailscale MagicDNS rules present (normal)."
+}
+
+Write-Output ""
+Write-Output "=== REGISTRY FORENSICS (creation source, timestamps) ==="
+# NRPT lives in two possible locations:
+#   1. HKLM:\SOFTWARE\Policies\Microsoft\Windows NT\DNSClient\DnsPolicyConfig (Group Policy)
+#   2. HKLM:\SYSTEM\CurrentControlSet\Services\Dnscache\Parameters\DnsPolicyConfig (local, set by apps)
+$paths = @(
+    @{Path="HKLM:\SOFTWARE\Policies\Microsoft\Windows NT\DNSClient\DnsPolicyConfig"; Source="Group Policy"},
+    @{Path="HKLM:\SYSTEM\CurrentControlSet\Services\Dnscache\Parameters\DnsPolicyConfig"; Source="Local (app-set)"}
+)
+foreach ($p in $paths) {
+    if (!(Test-Path $p.Path)) { continue }
+    Write-Output ("--- " + $p.Source + " :: " + $p.Path + " ---")
+    Get-ChildItem $p.Path -ErrorAction SilentlyContinue | ForEach-Object {
+        $values = Get-ItemProperty $_.PSPath
+        # LastWriteTime via reg.exe (.NET doesn't expose it for subkeys in older PS)
+        $regOut = reg query $($_.Name -replace "HKEY_LOCAL_MACHINE","HKLM") 2>&1
+        Write-Output ("Rule:    " + $_.PSChildName)
+        Write-Output ("  Comment:   " + $values.Comment)
+        Write-Output ("  DNSServers:" + ($values.GenericDNSServers -join "; "))
+        Write-Output ("  Namespaces:" + (($values.Name | Select-Object -First 3) -join "; ") + $(if ($values.Name.Count -gt 3) { " ... (+" + ($values.Name.Count - 3) + " more)" } else { "" }))
+        Write-Output ""
+    }
+}
+
+Write-Output "=== ATTRIBUTION HINTS ==="
+# Match comments / DNS IPs to known VPN clients
+$rules | ForEach-Object {
+    $hint = switch -Regex ($_.Comment + " " + ($_.NameServers -join " ")) {
+        "Proton"          { "Proton VPN" }
+        "Mullvad"         { "Mullvad" }
+        "AnyConnect|Cisco" { "Cisco AnyConnect" }
+        "Nord"            { "NordVPN" }
+        "DirectAccess"    { "Windows DirectAccess (corporate)" }
+        "10\.2\.0\."      { "Proton VPN (default DNS gateway)" }
+        "10\.64\.0\."     { "Mullvad (default DNS gateway)" }
+        "100\.100\.100\.100" { "Tailscale MagicDNS (expected)" }
+        default { "" }
+    }
+    if ($hint) {
+        Write-Output ("  " + $_.Name + " :: likely " + $hint)
+    }
+}

+ 82 - 0
skills/net-ops/scripts/windows/nrpt-clean.ps1

@@ -0,0 +1,82 @@
+# winnet-ops :: nrpt-clean.ps1
+# Safely remove orphaned NRPT catch-all rules left behind by disconnected VPNs.
+# NEVER removes Tailscale MagicDNS rules.
+#
+# Defaults to DRY RUN — pass -Apply to actually delete.
+# Requires elevated PowerShell (Administrator).
+
+param(
+    [switch]$Apply,
+    [string[]]$ProtectNameServers = @("100.100.100.100","fd7a:115c:a1e0::53")
+)
+
+# Admin check
+$isAdmin = ([Security.Principal.WindowsPrincipal][Security.Principal.WindowsIdentity]::GetCurrent()).IsInRole([Security.Principal.WindowsBuiltInRole]::Administrator)
+if (!$isAdmin) {
+    Write-Error "Must run as Administrator. Open elevated PowerShell."
+    exit 1
+}
+
+Write-Output "=== BEFORE ==="
+$all = Get-DnsClientNrptRule
+$all | Format-Table Name, Namespace, NameServers, Comment -AutoSize -Wrap | Out-String | Write-Output
+
+# Find catch-all rules NOT pointing at protected (Tailscale) servers
+$targets = $all | Where-Object {
+    $_.Namespace -eq "." -and
+    (-not ($_.NameServers | Where-Object { $ProtectNameServers -contains $_ }))
+}
+
+if (!$targets) {
+    Write-Output ""
+    Write-Output "No orphaned catch-all rules found. Nothing to clean."
+    exit 0
+}
+
+Write-Output ""
+Write-Output "=== TARGETS FOR REMOVAL ==="
+$targets | Format-List Name, Namespace, NameServers, Comment
+
+if (!$Apply) {
+    Write-Output ""
+    Write-Output "DRY RUN — pass -Apply to actually remove the rules above."
+    exit 0
+}
+
+Write-Output ""
+Write-Output "=== REMOVING ==="
+foreach ($t in $targets) {
+    try {
+        Remove-DnsClientNrptRule -Name $t.Name -Force -ErrorAction Stop
+        Write-Output ("[OK] Removed " + $t.Name)
+    } catch {
+        Write-Output ("[FAIL] " + $t.Name + " :: " + $_.Exception.Message)
+    }
+}
+
+Write-Output ""
+Write-Output "=== FLUSHING DNS CACHE ==="
+Clear-DnsClientCache
+ipconfig /flushdns | Select-String "Successfully|Could" | Select-Object -First 1
+
+Write-Output ""
+Write-Output "=== VERIFICATION ==="
+try {
+    $r = Resolve-DnsName google.com -Type A -QuickTimeout -ErrorAction Stop
+    $ips = ($r | Where-Object { $_.Type -eq "A" } | Select-Object -ExpandProperty IPAddress) -join ", "
+    Write-Output ("[PASS] Resolve-DnsName google.com -> " + $ips)
+} catch {
+    Write-Output ("[FAIL] Resolve-DnsName still broken: " + $_.Exception.Message)
+    Write-Output "       Drill into WFP filters / hosts file / DNS Client service."
+}
+
+try {
+    $r = Invoke-WebRequest -Uri "https://www.google.com" -TimeoutSec 8 -UseBasicParsing
+    Write-Output ("[PASS] HTTPS google.com -> HTTP " + $r.StatusCode)
+} catch {
+    Write-Output ("[FAIL] HTTPS still broken: " + $_.Exception.Message)
+}
+
+Write-Output ""
+Write-Output "=== AFTER ==="
+Get-DnsClientNrptRule | Format-Table Name, Namespace, NameServers, Comment -AutoSize -Wrap | Out-String | Write-Output

+ 230 - 0
skills/net-ops/scripts/windows/probe.ps1

@@ -0,0 +1,230 @@
+# net-ops :: windows/probe.ps1
+# Full layered diagnostic ladder for Windows network troubleshooting.
+# Designed to be invoked over SSH via -EncodedCommand. Outputs structured
+# sections so a human or LLM can scan for the first FAIL and drill in.
+
+param(
+    [string]$TestHost = "google.com",
+    [string[]]$TestIPs = @("1.1.1.1","8.8.8.8"),
+    [int]$Timeout = 5,
+    [switch]$Redact,
+    [switch]$JsonOutput
+)
+
+# If -Redact, self-reinvoke without the switch and pipe output through a
+# regex-driven redactor. Preserves Tailscale's well-known 100.100.100.100
+# anchor and public DoH IPs as diagnostic landmarks.
+if ($Redact) {
+    $cleanArgs = @(
+        '-TestHost', $TestHost,
+        '-TestIPs', ($TestIPs -join ',')
+        '-Timeout', $Timeout
+    )
+    if ($JsonOutput) { $cleanArgs += '-JsonOutput' }
+    & powershell -NoProfile -File $PSCommandPath @cleanArgs |
+        ForEach-Object {
+            $line = $_
+            $line = $line -replace '100\.100\.100\.100','__TS_MAGIC__'
+            $line = $line -replace '\b10\.\d{1,3}\.\d{1,3}\.\d{1,3}\b','10.X.X.X'
+            $line = $line -replace '\b172\.(1[6-9]|2[0-9]|3[01])\.\d{1,3}\.\d{1,3}\b','172.X.X.X'
+            $line = $line -replace '\b192\.168\.\d{1,3}\.\d{1,3}\b','192.168.X.X'
+            $line = $line -replace '\b100\.(6[4-9]|[7-9]\d|1[01]\d|12[0-7])\.\d{1,3}\.\d{1,3}\b','100.X.X.X'
+            $line = $line -replace '\b169\.254\.\d{1,3}\.\d{1,3}\b','169.254.X.X'
+            $line = $line -replace '\b[0-9a-fA-F]{2}([:-])[0-9a-fA-F]{2}\1[0-9a-fA-F]{2}\1[0-9a-fA-F]{2}\1[0-9a-fA-F]{2}\1[0-9a-fA-F]{2}\b','XX:XX:XX:XX:XX:XX'
+            $line = $line -replace '\b[a-z0-9-]+\.ts\.net\b','REDACTED.ts.net'
+            $line = $line -replace '__TS_MAGIC__','100.100.100.100'
+            Write-Output $line
+        }
+    exit $LASTEXITCODE
+}
+
+$script:PASS_COUNT = 0
+$script:FAIL_COUNT = 0
+$script:FIRST_FAIL = ""
+$script:CURRENT_SECTION = ""
+
+function Section($name) {
+    $script:CURRENT_SECTION = $name
+    Write-Output ""
+    Write-Output ("=== " + $name + " ===")
+}
+function Result($label, $ok, $detail = "") {
+    if ($ok) {
+        $script:PASS_COUNT++
+        $tag = "PASS"
+    } else {
+        $script:FAIL_COUNT++
+        if (-not $script:FIRST_FAIL) {
+            $script:FIRST_FAIL = "[" + $script:CURRENT_SECTION + "] " + $label
+        }
+        $tag = "FAIL"
+    }
+    Write-Output ("[" + $tag + "] " + $label + $(if ($detail) { " :: " + $detail } else { "" }))
+}
+
+# ---------------------------------------------------------------------------
+Section "1. LINK LAYER"
+# ---------------------------------------------------------------------------
+$adapters = Get-NetAdapter | Where-Object { $_.Status -eq "Up" }
+if (!$adapters) {
+    Result "Any interface up" $false "No interfaces in Up state"
+} else {
+    $adapters | ForEach-Object {
+        Result ("Interface " + $_.Name) $true ($_.LinkSpeed + ", MAC " + $_.MacAddress)
+    }
+}
+$cfg = Get-NetIPConfiguration | Where-Object { $_.NetAdapter.Status -eq "Up" }
+$cfg | Format-Table InterfaceAlias, IPv4Address, IPv4DefaultGateway -AutoSize | Out-String | Write-Output
+
+# ---------------------------------------------------------------------------
+Section "2. IP / ICMP REACHABILITY"
+# ---------------------------------------------------------------------------
+$gateway = ($cfg | Where-Object { $_.IPv4DefaultGateway } | Select-Object -First 1).IPv4DefaultGateway.NextHop
+if ($gateway) {
+    $r = Test-Connection $gateway -Count 2 -Quiet -ErrorAction SilentlyContinue
+    Result ("Ping gateway $gateway") $r
+}
+foreach ($ip in $TestIPs) {
+    $r = Test-Connection $ip -Count 2 -Quiet -ErrorAction SilentlyContinue
+    Result ("Ping $ip") $r
+}
+
+# ---------------------------------------------------------------------------
+Section "3. TCP/UDP SOCKET REACHABILITY"
+# ---------------------------------------------------------------------------
+foreach ($ip in $TestIPs) {
+    $tcp53 = Test-NetConnection $ip -Port 53 -InformationLevel Quiet -WarningAction SilentlyContinue
+    $tcp443 = Test-NetConnection $ip -Port 443 -InformationLevel Quiet -WarningAction SilentlyContinue
+    Result ("TCP/53 -> $ip") $tcp53
+    Result ("TCP/443 -> $ip") $tcp443
+}
+
+# Raw UDP/53 — bypasses DNS Client API, proves whether DNS protocol itself works.
+foreach ($ip in $TestIPs) {
+    try {
+        $u = New-Object System.Net.Sockets.UdpClient
+        $u.Client.ReceiveTimeout = ($Timeout * 1000)
+        $u.Client.SendTimeout = ($Timeout * 1000)
+        $u.Connect($ip, 53)
+        # Minimal DNS query for google.com A record
+        $q = [byte[]](0x12,0x34,0x01,0x00,0x00,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x06,0x67,0x6f,0x6f,0x67,0x6c,0x65,0x03,0x63,0x6f,0x6d,0x00,0x00,0x01,0x00,0x01)
+        [void]$u.Send($q, $q.Length)
+        $ep = New-Object System.Net.IPEndPoint([System.Net.IPAddress]::Any, 0)
+        $resp = $u.Receive([ref]$ep)
+        Result ("Raw UDP/53 -> $ip") $true ($resp.Length.ToString() + " bytes")
+        $u.Close()
+    } catch {
+        Result ("Raw UDP/53 -> $ip") $false $_.Exception.Message
+    }
+}
+
+# ---------------------------------------------------------------------------
+Section "4. DNS INFRASTRUCTURE (bypass tools)"
+# ---------------------------------------------------------------------------
+foreach ($srv in @("default") + $TestIPs) {
+    $cmd = if ($srv -eq "default") { "nslookup $TestHost" } else { "nslookup $TestHost $srv" }
+    $out = Invoke-Expression $cmd 2>&1 | Out-String
+    $resolved = $out -match "Addresses?:\s+(\d+\.\d+\.\d+\.\d+|[0-9a-f:]+)"
+    Result ("nslookup via $srv") $resolved
+    if (!$resolved) { Write-Output "  --- output ---"; $out | Select-String -Pattern "." | Select-Object -First 6 | ForEach-Object { Write-Output ("  " + $_) } }
+}
+
+# ---------------------------------------------------------------------------
+Section "5. WINDOWS DNS CLIENT API (the hook layer)"
+# ---------------------------------------------------------------------------
+try {
+    $r = Resolve-DnsName $TestHost -Type A -QuickTimeout -ErrorAction Stop
+    $ips = ($r | Where-Object { $_.Type -eq "A" } | Select-Object -ExpandProperty IPAddress) -join ", "
+    Result "Resolve-DnsName (system API)" $true $ips
+} catch {
+    Result "Resolve-DnsName (system API)" $false $_.Exception.Message
+}
+
+# If layer 4 passed but layer 5 failed, dump the NRPT — that's the prime suspect.
+$nrptRules = Get-DnsClientNrptRule -ErrorAction SilentlyContinue
+$catchAll = $nrptRules | Where-Object { $_.Namespace -eq "." }
+if ($catchAll) {
+    Write-Output "  !! Catch-all NRPT rule(s) detected (likely culprit):"
+    $catchAll | Format-Table Name, NameServers, Comment -AutoSize | Out-String | Write-Output
+}
+
+# DNS Client service status
+$dnsClient = Get-Service Dnscache -ErrorAction SilentlyContinue
+Result "DNS Client (Dnscache) service running" ($dnsClient.Status -eq "Running")
+
+# Port 53 listeners on the box itself
+$listeners = Get-NetUDPEndpoint -LocalPort 53 -ErrorAction SilentlyContinue
+if ($listeners) {
+    Write-Output "  Port 53 listeners on localhost:"
+    $listeners | ForEach-Object {
+        $p = Get-Process -Id $_.OwningProcess -ErrorAction SilentlyContinue
+        $svc = Get-CimInstance Win32_Service -ErrorAction SilentlyContinue | Where-Object { $_.ProcessId -eq $_.OwningProcess } | Select-Object -First 1
+        Write-Output ("    " + $_.LocalAddress + ":53  PID=" + $_.OwningProcess + "  " + $p.ProcessName + $(if ($svc) { " (" + $svc.Name + ")" } else { "" }))
+    }
+}
+
+# ---------------------------------------------------------------------------
+Section "6. APPLICATION LAYER (real HTTP request)"
+# ---------------------------------------------------------------------------
+foreach ($url in @("https://www.google.com","https://github.com")) {
+    try {
+        $r = Invoke-WebRequest -Uri $url -TimeoutSec $Timeout -UseBasicParsing -ErrorAction Stop
+        Result ("GET $url") $true ("HTTP " + $r.StatusCode + ", " + $r.RawContentLength + " bytes")
+    } catch {
+        Result ("GET $url") $false $_.Exception.Message
+    }
+}
+
+# ---------------------------------------------------------------------------
+Section "7. KNOWN VPN / DNS CLIENT FOOTPRINT"
+# ---------------------------------------------------------------------------
+# AV products (drives "Encrypted DNS Detection" type blocks)
+$av = Get-CimInstance -Namespace "root/SecurityCenter2" -ClassName AntiVirusProduct -ErrorAction SilentlyContinue
+if ($av) { $av | Select-Object displayName, productState | Format-Table -AutoSize | Out-String | Write-Output }
+
+# WFP-callout drivers (third-party kernel hooks on network stack)
+$wfp = Get-CimInstance Win32_SystemDriver | Where-Object {
+    $_.State -eq "Running" -and ($_.Name -match "epfwwfp|wfpcap|netbtsmb|pctcore|symefa|mfewfpk|kvfwwfp|bdfwfpf|cbfsfilter")
+}
+if ($wfp) {
+    Write-Output "  Third-party WFP/network drivers active:"
+    $wfp | Format-Table Name, State, PathName -AutoSize | Out-String | Write-Output
+}
+
+# Known VPN clients (common NRPT rule creators)
+$vpnPaths = @(
+    "C:\Program Files\Proton\VPN",
+    "C:\Program Files\Mullvad VPN",
+    "C:\Program Files (x86)\OpenVPN",
+    "C:\Program Files\WireGuard",
+    "C:\Program Files (x86)\Cisco\Cisco AnyConnect Secure Mobility Client",
+    "C:\Program Files\NordVPN",
+    "C:\Program Files (x86)\NextDNS"
+)
+$found = $vpnPaths | Where-Object { Test-Path $_ }
+if ($found) {
+    Write-Output "  VPN / DNS clients installed:"
+    $found | ForEach-Object { Write-Output ("    " + $_) }
+}
+
+Write-Output ""
+Write-Output "=== SUMMARY ==="
+Write-Output ("  PASS: " + $script:PASS_COUNT + "    FAIL: " + $script:FAIL_COUNT)
+if ($script:FIRST_FAIL) {
+    Write-Output ("  First failure: " + $script:FIRST_FAIL)
+    $next = switch -Wildcard ($script:FIRST_FAIL) {
+        "*LINK LAYER*"    { "check Get-NetAdapter, Get-NetIPConfiguration, DHCP state" }
+        "*SOCKET*"        { "check Windows Firewall outbound rules; AV protocol filtering; consumer router DoH IP blocking" }
+        "*ICMP*"          { "check Get-NetRoute, ISP/upstream connectivity" }
+        "*DNS INFRASTRUCTURE*" { "check UDP/53 outbound, router DNS forwarder" }
+        "*DNS CLIENT API*" { "scripts\\windows\\nrpt-audit.ps1   # drill rung 5 (the hook layer)" }
+        "*RESOLVER PATH*"  { "scripts\\windows\\nrpt-audit.ps1   # drill rung 5 (the hook layer)" }
+        "*APPLICATION*"   { "check netsh winhttp show proxy, cert store, IPv6 preference" }
+        default { "re-run with -Verbose; check references/common-culprits.md" }
+    }
+    Write-Output ("  Next: " + $next)
+} else {
+    Write-Output "  No failures. If user still reports issues, see rung 7 footprint and time-based notes in references/diagnostic-ladder.md."
+}
+Write-Output ""
+Write-Output "=== END PROBE ==="

+ 176 - 0
skills/net-ops/tests/run.sh

@@ -0,0 +1,176 @@
+#!/usr/bin/env bash
+# net-ops :: tests/run.sh
+# Lightweight self-tests. Run from the repo root:
+#   bash skills/net-ops/tests/run.sh
+#
+# These verify structural and output invariants of the probe scripts WITHOUT
+# trying to simulate broken network state. They catch regressions in:
+#  - bash syntax / unbound vars / set -u trips
+#  - section labels and ordering
+#  - --redact actually masking private addrs / tailnet names
+#  - --json producing parseable NDJSON
+#  - summary block format
+#  - dispatcher routing to the right per-OS script
+
+set -u
+
+PASS=0
+FAIL=0
+FAILED_TESTS=()
+
+assert() {
+    local name="$1"; shift
+    if "$@"; then
+        PASS=$((PASS+1))
+        printf "  [PASS] %s\n" "$name"
+    else
+        FAIL=$((FAIL+1))
+        FAILED_TESTS+=("$name")
+        printf "  [FAIL] %s\n" "$name"
+    fi
+}
+
+contains() { local hay="$1" needle="$2"; [[ "$hay" == *"$needle"* ]]; }
+not_contains() { local hay="$1" needle="$2"; [[ "$hay" != *"$needle"* ]]; }
+
+# Locate skill root regardless of invocation dir
+here="$(cd "$(dirname "$0")" && pwd)"
+root="$(cd "$here/.." && pwd)"
+
+echo "=== net-ops self-tests ==="
+echo "Root: $root"
+
+# Determine the local OS probe for testing
+case "$(uname -s)" in
+    Darwin) probe="$root/scripts/macos/probe.sh"; audit="$root/scripts/macos/dns-audit.sh" ;;
+    Linux)  probe="$root/scripts/linux/probe.sh"; audit="$root/scripts/linux/dns-audit.sh" ;;
+    *) echo "Skipping: unsupported OS for local probe tests." ; exit 0 ;;
+esac
+
+# ---------------------------------------------------------------------------
+echo
+echo "--- Probe structural tests ---"
+# ---------------------------------------------------------------------------
+
+out=$(bash "$probe" 2>&1)
+
+assert "probe runs without bash error" \
+    not_contains "$out" "syntax error"
+assert "probe runs without unbound variable error" \
+    not_contains "$out" "unbound variable"
+assert "probe emits summary block" \
+    contains "$out" "=== SUMMARY ==="
+assert "probe emits PASS/FAIL counts" \
+    contains "$out" "PASS:"
+check_all_sections() {
+    local out="$1"
+    for s in "1. LINK LAYER" "2. IP / ICMP" "3. TCP/UDP SOCKET" "4. DNS INFRASTRUCTURE" "6. APPLICATION" "7. KNOWN VPN"; do
+        contains "$out" "=== $s" || return 1
+    done
+    # Section 5 has OS-specific naming; match on the common anchor.
+    contains "$out" "(the hook layer)" || return 1
+    return 0
+}
+assert "probe contains all 7 sections" check_all_sections "$out"
+
+# ---------------------------------------------------------------------------
+echo
+echo "--- --redact tests ---"
+# ---------------------------------------------------------------------------
+
+redacted=$(bash "$probe" --redact 2>&1)
+
+# Common private patterns that should NEVER appear in redacted output.
+# (We use specific octets that are unlikely to appear in unrelated contexts.)
+assert "--redact masks 192.168.x.x" \
+    bash -c '! grep -E "\b192\.168\.[0-9]+\.[0-9]+\b" <<< "$0" | grep -v "192.168.X.X" >/dev/null' "$redacted"
+assert "--redact masks .ts.net tailnet names" \
+    bash -c '! grep -E "\b[a-z0-9-]+\.ts\.net\b" <<< "$0" | grep -v "REDACTED.ts.net" >/dev/null' "$redacted"
+assert "--redact preserves 100.100.100.100 anchor" \
+    bash -c '[[ "$0" != *"100.X.X.X"* ]] || grep -q "100.100.100.100" <<< "$0"' "$redacted"
+assert "--redact preserves 1.1.1.1 public anchor" \
+    contains "$redacted" "1.1.1.1"
+
+# ---------------------------------------------------------------------------
+echo
+echo "--- --json tests ---"
+# ---------------------------------------------------------------------------
+
+json_out=$(bash "$probe" --json 2>&1)
+
+assert "--json emits at least one section record" \
+    contains "$json_out" '"type":"section"'
+assert "--json emits at least one check record" \
+    contains "$json_out" '"type":"check"'
+assert "--json emits a summary record" \
+    contains "$json_out" '"type":"summary"'
+assert "--json summary contains pass count" \
+    bash -c 'grep -q "\"type\":\"summary\".*\"pass\":[0-9]" <<< "$0"' "$json_out"
+
+# ---------------------------------------------------------------------------
+echo
+echo "--- Dispatcher test ---"
+# ---------------------------------------------------------------------------
+
+disp_out=$("$root/scripts/probe" 2>&1 | tail -5)
+assert "dispatcher routes to per-OS probe (summary present)" \
+    contains "$disp_out" "PASS:"
+
+# ---------------------------------------------------------------------------
+echo
+echo "--- dns-audit smoke test ---"
+# ---------------------------------------------------------------------------
+
+audit_out=$(bash "$audit" 2>&1)
+assert "dns-audit runs without error" \
+    not_contains "$audit_out" "syntax error"
+assert "dns-audit emits attribution hints section" \
+    contains "$audit_out" "ATTRIBUTION HINTS"
+
+# ---------------------------------------------------------------------------
+echo
+echo "--- Edge cases ---"
+# ---------------------------------------------------------------------------
+
+# --json should emit ONLY JSON (no chatter leaking through)
+json_pure=$(bash "$probe" --json 2>&1)
+non_json=$(echo "$json_pure" | grep -vc '^{')
+assert "--json produces pure NDJSON (no non-JSON chatter)" \
+    bash -c '[[ "$0" -eq 0 ]]' "$non_json"
+
+# --json + --redact: redacted private addrs AND only JSON
+combo=$(bash "$probe" --json --redact 2>&1)
+combo_non_json=$(echo "$combo" | grep -vc '^{')
+combo_leaks=$(echo "$combo" | grep -E "\b192\.168\.[0-9]+\.[0-9]+\b" | grep -v "192.168.X.X")
+assert "--json + --redact produces pure NDJSON" \
+    bash -c '[[ "$0" -eq 0 ]]' "$combo_non_json"
+assert "--json + --redact has no private-IP leaks" \
+    bash -c '[[ -z "$0" ]]' "$combo_leaks"
+
+# Unknown flag should not crash
+assert "unknown --frobnicate flag does not crash" \
+    bash -c 'bash "$0" --frobnicate 2>&1 | grep -q "PASS\\|FAIL"' "$probe"
+
+# Help flag prints usage and exits cleanly
+help_out=$(bash "$probe" --help 2>&1)
+assert "--help mentions --redact" \
+    contains "$help_out" "--redact"
+assert "--help mentions --json" \
+    contains "$help_out" "--json"
+assert "--help mentions --quick" \
+    contains "$help_out" "--quick"
+
+# Dispatcher works from a different cwd
+disp_remote=$(cd /tmp && "$root/scripts/probe" 2>&1 | tail -5)
+assert "dispatcher works from /tmp (cwd-independent)" \
+    contains "$disp_remote" "PASS:"
+
+# ---------------------------------------------------------------------------
+echo
+echo "=== TOTAL: $PASS pass, $FAIL fail ==="
+if [[ "$FAIL" -gt 0 ]]; then
+    echo "Failed tests:"
+    for t in "${FAILED_TESTS[@]}"; do echo "  - $t"; done
+    exit 1
+fi
+exit 0