- merge RESTORE_HANDBOOK.md into ops/restore-tests/README.md (single
operations doc; restore status lives only in RESTORE_MATRIX maturity
table)
- RESTORE_MATRIX.md: extract embedded runbook drafts (261 -> 141 lines);
unraid-flash and tailscale stubs become ops/restore-tests runbooks,
adguard/redis checklists superseded by validated scripts
- delete six historical pre-first-run *-plan.md files (runbook + script
are the source of truth since the validated first runs)
- SERVICES_RECOVERY: drop completed task table; DISASTER_RECOVERY:
point related docs and section 11 to MASTER_TODO/schedule
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Traefik restore succeeded on 2026-06-03: dynamic/ (2 files) +
letsencrypt/acme.json (426K) from Borg, file-provider boot, /ping 200.
First attempt, no shfs problem.
11 of 12 restore tests are now green. Only Nextcloud remains blocked
by Unraid's shfs chmod incompatibility.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Mail-Archiver restore succeeded on 2026-06-03: Data Protection keys
from Borg + 645M pg_restore + HTTP 200. First attempt, no shfs problem.
10 of 12 restore tests are now green. Remaining: Nextcloud
(blocked/shfs-chmod) and Traefik (complex, lower priority).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Mealie restore test succeeded on 2026-06-03: Borg data + pg_restore
+ HTTP 200, 3 recipes in the test DB check. First attempt, no
shfs problem (Mealie starts as root, no chmod on user shares).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Komodo Mongo data restore succeeded on 2026-06-03: mongorestore
of komodo-mongo.archive.gz into a throwaway Mongo, 86904 documents
(incl. 32 stack definitions). This proves that the canonical source for
KOMODO_* stack ENV values is recoverable in the DR case.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Operational hardening across several services after live incident
analysis between 2026-05-18 and 2026-05-20:
- Gitea: disable public registration and OpenID signup/signin to
stop the external POST / 5xx bursts that triggered availability
alerts. New repo-wide policy requires every productive
Micha/homelab-infra Komodo stack to ship with an active
Gitea->Komodo webhook on the current stack ID (documented in
CLAUDE.md, AI_CONTEXT.md, WORKFLOW.md).
- posture-check: extract the Disk1 fstype check into its own
function so the documented Disk1 NTFS exception no longer raises
ntfy warnings, skip POSIX inode checks on NTFS, and dedup ntfy
alerts via a fingerprint state file with ALERT_REPEAT_SECONDS
(default 24h). Repeat alerts for the same cause are now suppressed.
- docker-critical-events: parse the event JSON for container name,
action, exit code and signal; drop `die exit=0` events (clean
stops); ship a structured ntfy message instead of the raw event
line.
- Borg UI: mount /mnt/user/services into the backup container as
/local/services:ro and include homelab-infra, stacks and
posture-check in all-important-sources.txt. RESTORE_MATRIX and
DISASTER_RECOVERY updated accordingly.
- Unraid user scripts: document the new
homelab-operations-report-daily cron job and the SMTP password
file it expects on the host.
- MIGRATION_LOG: capture the live events from this window: Gitea 5xx
burst + signup closure, Komodo webhook reconciliation,
posture-check host-version verification, Borg scope extension,
and Traefik 5xx alert detuning.
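
For illustration only, the posture-check alert dedup described above could look roughly like the following sketch. It is not the actual script: the function names, the JSON layout of the state file, and the use of SHA-256 are assumptions; only ALERT_REPEAT_SECONDS and its 24h default come from this change.

```python
import hashlib
import json
import os
import time

# Default of 24h as described above; the env override is an assumption.
ALERT_REPEAT_SECONDS = int(os.environ.get("ALERT_REPEAT_SECONDS", 24 * 3600))


def fingerprint(findings):
    """Hash the findings so that the same cause yields the same value,
    regardless of the order in which the checks reported it."""
    joined = "\n".join(sorted(findings))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def should_alert(findings, state_path, now=None,
                 repeat_seconds=ALERT_REPEAT_SECONDS):
    """Return True if an alert should be sent, and record it in the
    state file (hypothetical layout: fingerprint + timestamp)."""
    now = time.time() if now is None else now
    fp = fingerprint(findings)
    try:
        with open(state_path, encoding="utf-8") as fh:
            state = json.load(fh)
    except (FileNotFoundError, json.JSONDecodeError):
        state = {}  # no usable state yet: treat as first alert

    same_cause = state.get("fingerprint") == fp
    if same_cause and now - state.get("sent_at", 0) < repeat_seconds:
        return False  # same cause inside the repeat window: suppress

    with open(state_path, "w", encoding="utf-8") as fh:
        json.dump({"fingerprint": fp, "sent_at": now}, fh)
    return True
```

A changed set of findings produces a new fingerprint and alerts immediately; an unchanged set alerts again only once the repeat window has elapsed.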
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
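
As a rough sketch of the docker-critical-events filtering from the hardening change above (not the deployed script): it assumes one JSON object per line in the shape produced by `docker events --format '{{json .}}'`, with the container name, `exitCode` and `signal` under `Actor.Attributes`. The function name and the message layout are hypothetical.

```python
import json


def summarize_event(line):
    """Turn one docker event JSON line into a short structured message.

    Returns None for `die` events with exit code 0 (clean stops), which
    are dropped instead of being forwarded to ntfy.
    """
    event = json.loads(line)
    attrs = event.get("Actor", {}).get("Attributes", {})
    action = event.get("Action", "")
    name = attrs.get("name", "unknown")
    exit_code = attrs.get("exitCode")  # assumed to be a string, e.g. "137"
    signal = attrs.get("signal")       # assumed present on kill events

    if action == "die" and exit_code == "0":
        return None

    parts = [f"container={name}", f"action={action}"]
    if exit_code is not None:
        parts.append(f"exit={exit_code}")
    if signal is not None:
        parts.append(f"signal={signal}")
    return " ".join(parts)
```

Returning None for clean stops keeps the drop decision in one place, so the caller only forwards non-empty messages.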