Load this when responding to "my drive is dying, what do I do RIGHT NOW", filesystem-level corruption, boot configuration damage, or system file integrity issues. These are the procedures that have to be right the first time — getting them wrong destroys data.
These never bend:
Image first, repair second. When a drive is failing, your priority is getting data OFF it before doing anything that writes TO it. Repair operations write to bad sectors; that finishes a marginal drive faster than any other action.
Never chkdsk /f a failing drive. The /f flag writes fixes back to disk. If the drive is throwing hardware errors, every write is potentially the one that kills it. Read-only chkdsk (chkdsk with no flags, or explicitly chkdsk /scan /forceofflinefix) is OK; anything that writes is not.
Never run format or convert on a drive you want data from. Obvious, but it gets done in a panic.
Don't trust SMART "Healthy" when event logs are screaming. Windows reports SMART status based on a small handful of attributes; meanwhile the System log can have thousands of Event 7 / 154 hardware errors. The events are the truth.
Don't pound on a failing drive with retries. Robocopy default is /R:1000000. Use /R:0. Every retry on a bad sector causes the drive's internal retry-and-relocate logic to run, which stresses both the failing sector and the spare-sector pool.
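To put a number on the retry problem: Robocopy's defaults are /R:1000000 with /W:30 (30 seconds between retries), so a single unreadable file can in principle hold the copy for months. The arithmetic below is a minimal sketch; the function name is illustrative, and the figure counts only the waits between retries, not the time each failed read takes.

```python
# Worst-case time Robocopy spends waiting on ONE unreadable file.
# Defaults: /R:1000000 retries, /W:30 seconds between retries.

def worst_case_wait_days(retries: int, wait_seconds: int) -> float:
    """Total wait between retries, in days (read time itself not included)."""
    return retries * wait_seconds / 86_400  # 86,400 seconds per day

default_days = worst_case_wait_days(1_000_000, 30)
rescue_days = worst_case_wait_days(0, 0)

print(f"defaults (/R:1000000 /W:30): {default_days:.0f} days per bad file")
print(f"rescue   (/R:0 /W:0):        {rescue_days:.0f} days per bad file")
```

With /R:0 /W:0 a bad file costs one failed read and the copy moves on.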
When the drive is still mostly readable and you can mount it:
robocopy "Y:\important-data" "Z:\rescue\important-data" `
/MIR /XJ /COPY:DAT /DCOPY:T `
/R:0 /W:0 `
/MT:8 `
/V /BYTES /NP `
/LOG:"$env:TEMP\clone.log" /TEE
Flag breakdown:
| Flag | Effect |
|---|---|
| /MIR | Mirror: recursive copy AND delete files at destination that don't exist at source. Use /E instead if destination has other content to preserve. |
| /XJ | Skip junction points. Prevents infinite recursion if a junction loops back. |
| /COPY:DAT | Copy Data, Attributes, Timestamps. Skip ACL/Owner (faster, usually unwanted on a recovery target anyway). |
| /DCOPY:T | Also copy directory timestamps. |
| /R:0 /W:0 | Zero retries. Critical: skip bad sectors fast instead of retrying. |
| /MT:8 | 8 threads (default; explicit for clarity). |
| /V | Verbose log includes which files were skipped; needed for the failed-files list. |
| /BYTES /NP | Cleaner log output for parsing. |
| /LOG:path /TEE | Log to file + console. |
Robocopy exits with a bitmask (>=8 means errors). The skill's scripts/recover-clone.ps1 wraps this with proper exit-code translation and failed-files extraction.
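As an illustration of what exit-code translation and failed-file extraction involve, here is a minimal Python sketch. It is not the skill's scripts/recover-clone.ps1. The bit meanings are Robocopy's documented ones; the log line shape in the comment is an assumption based on typical Robocopy output and may differ with /MT or on localized systems, so check it against a real log before relying on it.

```python
import re

# Robocopy's documented exit-code bits.
EXIT_BITS = {
    1: "files copied",
    2: "extra files or directories at destination",
    4: "mismatched files or directories",
    8: "some files or directories could not be copied",
    16: "fatal error",
}

def decode_exit_code(code: int) -> list[str]:
    """Return the meaning of every bit set in a Robocopy exit code."""
    return [text for bit, text in EXIT_BITS.items() if code & bit]

def has_errors(code: int) -> bool:
    """Bits 8 and 16 signal failures, so any code >= 8 means errors."""
    return code >= 8

# ASSUMED log line shape (verify against your own log):
#   2024/05/01 10:15:32 ERROR 1117 (0x0000045D) Copying File Y:\data\a.bin
ERROR_LINE = re.compile(
    r"ERROR\s+(?P<code>\d+)\s+\(0x[0-9A-Fa-f]+\)\s+Copying File\s+(?P<path>.+)$"
)

def failed_files(log_text: str) -> list[tuple[int, str]]:
    """Extract (win32_error, path) pairs from Robocopy log text."""
    hits = []
    for line in log_text.splitlines():
        match = ERROR_LINE.search(line)
        if match:
            hits.append((int(match["code"]), match["path"].strip()))
    return hits

# Made-up sample line for demonstration only.
sample = (
    "2024/05/01 10:15:32 ERROR 1117 (0x0000045D) Copying File Y:\\important-data\\a.bin\n"
    "The request could not be performed because of an I/O device error.\n"
)
print(decode_exit_code(9))
print(failed_files(sample))
```

An exit code of 9, for example, means some files were copied and some failed, which is the expected outcome when cloning a drive with bad sectors.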
When the drive has many bad sectors or the filesystem itself is unreliable:
ddrescue (GNU ddrescue) reads the raw block device, skips errors, comes back later to retry just the failed regions. Two-pass recovery with a map file makes it resumable across crashes/cable yanks.
Install via WSL:
wsl sudo apt install gddrescue
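Or boot a live Linux USB, which sees the disk directly. Under WSL 2 the physical disk is not visible as /dev/sdX until it is attached from an elevated Windows prompt (wsl --mount \\.\PHYSICALDRIVEN --bare, where N is the disk number).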
First pass — read everything that's easy:
ddrescue -d -r0 /dev/sdX recovery.img mapfile
- -d direct (skip OS buffering)
- -r0 zero retries on first pass

Second pass: retry the failed regions, this time aggressively:
ddrescue -d -r3 -R /dev/sdX recovery.img mapfile
- -r3 three retries on remaining bad blocks
- -R reverse direction (sometimes recovers what forward couldn't)

Then mount the image and copy files out:
sudo losetup -P -f recovery.img # Linux
# (Windows: mount via tools like OSFMount; the image is the raw device)
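Before deciding whether a further ddrescue pass is worthwhile, it helps to know how much of the device the map file records as rescued. The sketch below sums block sizes per status from a map file's text. It assumes the layout described in the GNU ddrescue manual (comment lines starting with #, one status line, then pos/size/status rows) and hexadecimal or decimal numbers; the function names and the sample data are illustrative.

```python
from collections import Counter

# Status characters used in ddrescue map file block rows.
STATUS_NAMES = {
    "?": "non-tried",
    "*": "failed, non-trimmed",
    "/": "failed, non-scraped",
    "-": "bad sector(s)",
    "+": "rescued",
}

def summarize_mapfile(text: str) -> dict[str, int]:
    """Sum block sizes (bytes) per status from ddrescue map file text."""
    totals = Counter()
    data_lines = [
        line for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]
    # The first data line is the status line (current pos/status/pass); skip it.
    for line in data_lines[1:]:
        _pos, size, status = line.split()[:3]
        totals[STATUS_NAMES.get(status, status)] += int(size, 0)
    return dict(totals)

def rescued_percent(totals: dict[str, int]) -> float:
    """Share of the mapped area marked as rescued, in percent."""
    whole = sum(totals.values())
    return 100.0 * totals.get("rescued", 0) / whole if whole else 0.0

# Made-up map file contents for demonstration only.
sample = """\
# Mapfile. Created by GNU ddrescue
# current_pos  current_status  current_pass
0x00000E00     +               1
#      pos        size  status
0x00000000  0x00000C00  +
0x00000C00  0x00000200  -
0x00000E00  0x00000200  +
"""

totals = summarize_mapfile(sample)
print(totals)
print(f"{rescued_percent(totals):.1f}% rescued")
```

In practice, read the real file with `open("mapfile").read()` and pass the text in; ddrescue itself remains the authority on what still needs retrying.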
When the drive has mechanical failure (clicking, not spinning, drive ID lost) — stop touching it. Every power cycle risks more damage. Professional cleanroom recovery (Ontrack, DriveSavers, local equivalents) costs $300-3000 AUD depending on damage, but is the only option for physical-fault drives.
| Situation | Command | What it does |
|---|---|---|
| Failing drive | Do NOT run chkdsk /f | Writes to disk; can finish off a marginal drive. |
| Drive healthy, suspicious files | chkdsk D: | Read-only check. Reports problems. No writes. Safe. |
| Drive healthy, repair-OK | chkdsk D: /f | Fixes filesystem errors. Locks volume. |
| Drive healthy, also fix bad sectors | chkdsk D: /r | Implies /f + scans every sector + recovers what it can from bad ones. Can take days on large drives. |
| Drive healthy, faster repair | chkdsk D: /spotfix | Fixes only the issues already logged by an earlier /scan. Takes the volume offline briefly rather than for a full pass. |
| System drive, schedule for next boot | chkdsk C: /f | Can't lock C: live; answer Y at the prompt to schedule the check at next boot. |
| Just scan, don't fix | chkdsk D: /scan | Online scan; the volume stays mounted. It may repair minor issues online; add /forceofflinefix to queue every finding for later offline repair instead. |
NTFS will throw Ntfs Event 55 ("A corruption was discovered in the file system structure on volume X") when it spots metadata issues mid-operation. If you see this:
- chkdsk /scan (read-only) to assess

If chkdsk reports MFT or $LogFile damage, the drive is in a precarious state. Options:
- chkdsk /f on the clone
- ntfsfix from Linux (lighter touch than Windows chkdsk; doesn't try to recover bad sectors)
- TestDisk to reconstruct partition tables and PhotoRec to extract files by signature

Symptoms: blue screens during boot, services failing to start, Windows Update broken, winver crashes.
Run in this order:
# 1. System File Checker - replaces corrupt protected files from cache
sfc /scannow
# 2. If sfc reports unfixable corruption, repair its own source (component store)
DISM /Online /Cleanup-Image /CheckHealth # quick check
DISM /Online /Cleanup-Image /ScanHealth # deeper scan
DISM /Online /Cleanup-Image /RestoreHealth # actually repair (uses Windows Update)
# 3. Then re-run sfc
sfc /scannow
DISM /RestoreHealth downloads replacement files from Windows Update, so the machine needs internet and a working WU stack. If WU itself is broken, supply a known-good install.wim via /Source:WIM:D:\sources\install.wim:1.
Over years the WinSxS component store grows. Reset/cleanup:
DISM /Online /Cleanup-Image /StartComponentCleanup # standard cleanup
DISM /Online /Cleanup-Image /StartComponentCleanup /ResetBase # plus drops update rollback data (saves more but irreversible)
Boot to Windows Recovery Environment (Windows RE):
bootrec /fixmbr :: Repair MBR (legacy BIOS only)
bootrec /fixboot :: Write new boot sector to system partition
bootrec /scanos :: Scan for Windows installs
bootrec /rebuildbcd :: Rebuild BCD store from scratch
If /fixboot returns "Access denied" (common on UEFI):
:: Find the EFI partition and rebuild bootloader
diskpart
list volume
select volume <EFI partition number> :: Usually ~100 MB, FAT32
assign letter=Z
exit
bcdboot C:\Windows /s Z: /f UEFI :: Recreate UEFI boot files
Symptom: BSOD 0x7B (INACCESSIBLE_BOOT_DEVICE) after hardware change. The BCD references the system drive by device path; if SATA ports rearranged or you added an NVMe, the path may be stale.
bcdedit /enum :: Show current BCD entries
bcdedit /set {default} device boot :: Reset to logical "boot"
bcdedit /set {default} osdevice boot
If a failing drive hosts (part of) the pagefile, Windows will continue to read/write to it under memory pressure — accelerating drive failure and risking BSOD 0x50 PAGE_FAULT_IN_NONPAGED_AREA.
# Find current pagefile location(s)
Get-CimInstance Win32_PageFileSetting
# If nothing is listed, pagefiles are probably system-managed; turn that off first (requires admin)
Get-CimInstance Win32_ComputerSystem | Set-CimInstance -Property @{ AutomaticManagedPagefile = $false }
# Remove pagefile from a specific drive (requires admin + reboot)
$pf = Get-CimInstance Win32_PageFileSetting | Where-Object { $_.Name -like 'Y:*' }
$pf | Remove-CimInstance
# Or relocate: set on a healthy drive first, then remove from failing
$newPf = New-CimInstance -ClassName Win32_PageFileSetting -Property @{
Name = 'C:\pagefile.sys'
InitialSize = 0 # 0 = system managed
MaximumSize = 0
}
Changes apply at next reboot. If the failing drive can't be cleanly removed (it's needed at boot for some reason), at minimum reduce its pagefile to 16 MB minimum, 16 MB maximum to limit damage.
For a complete memory dump on Win11, the pagefile on the system drive must be ≥ RAM size (or a DedicatedDumpFile must be configured). For minidumps, ≥256 MB is enough. System-managed sizing handles this automatically.
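The sizing rule above can be written as a quick check to run before shrinking or moving a pagefile. This is a sketch of that rule of thumb only; the function names and the dump-type labels ("complete", "mini") are illustrative, not Windows settings.

```python
def min_pagefile_mb(ram_mb: int, dump_type: str) -> int:
    """Minimum system-drive pagefile (MB) per the rule of thumb above."""
    if dump_type == "complete":
        return ram_mb   # pagefile must be at least as large as RAM
    if dump_type == "mini":
        return 256      # minidumps need only a small pagefile
    raise ValueError(f"unknown dump type: {dump_type!r}")

def pagefile_ok(pagefile_mb: int, ram_mb: int, dump_type: str) -> bool:
    """True if the pagefile is large enough to capture the given dump type."""
    return pagefile_mb >= min_pagefile_mb(ram_mb, dump_type)

# Example: 32 GB of RAM (32768 MB).
print(pagefile_ok(16, 32768, "mini"))         # a 16 MB pagefile alone
print(pagefile_ok(32768, 32768, "complete"))  # pagefile equal to RAM
```

The first call shows why a 16 MB pagefile left on the failing drive only makes sense when the system drive carries an adequately sized pagefile of its own.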
When you've decided to take a drive offline (failing, replacing, decommissioning), there's a hierarchy from least to most invasive:
# Take drive offline — Windows won't try to use it until next reboot or manual online
diskpart
DISKPART> select disk N
DISKPART> offline disk
DISKPART> exit
Useful when:
- online disk brings it back

Reboot, enter BIOS, find storage configuration, disable the specific SATA port or NVMe slot. Use when:
- offline doesn't help boot time)

The complete solution. SATA: unplug data cable (power cable can stay). NVMe: unscrew the standoff and lift the drive out of the slot. Use when:
Don't trust format or even cipher /w: on a failing drive — bad sectors may retain readable data. For sensitive data on a drive being decommissioned:
- cipher /w:Y:\ for a healthy SSD (forces wear-leveling to overwrite); for failing SSDs, physical destruction is the only reliable path

ATA Secure Erase (hdparm --security-erase from Linux, or vendor tools like Samsung Magician) works on healthy SSDs but may hang on failing drives.
When Windows won't boot, work the layers:
| Symptom | Where it failed | First step |
|---|---|---|
| No POST, no fans, no LEDs | Power supply or motherboard | Check power, PSU |
| POST but no boot device found | Drive or BIOS settings | Check boot order; check drive is detected in BIOS |
| "Inaccessible boot device" (Win logo then crash) | BCD or boot driver | Boot to RE → bootrec /scanos then /rebuildbcd |
| Spinning dots forever | Driver hang or filesystem | Boot to RE → Startup Repair, then chkdsk /scan |
| Login screen reached but crash | User-mode driver/service | Safe Mode → identify recently changed driver |
| Login OK but desktop missing | Shell / profile issue | Safe Mode → check userinit.exe registration |
- msconfig → Boot tab → Safe boot (revert after diagnosing!)

Once in Safe Mode, common moves:
- scripts/safe-disable-startup.ps1 works in Safe Mode too)
- sfc /scannow and DISM /Online /Cleanup-Image /RestoreHealth

Troubleshoot → Advanced Options → System Restore

Picks a restore point and rolls back system files + registry + drivers (NOT personal data). Effective against recent driver/update issues. Useless if no restore points exist (Win10/11 sometimes turn off System Protection by default).
Time to call professional data recovery:
Cost ranges from $300 (logical recovery: bad sectors but PCB intact) to $3000+ (head transplant, platter swap). Always get a quote before committing; no-recovery-no-fee outfits exist.