homelab-infra

Author	SHA1	Message	Date
Micha	d1f9491b24	feat(restore): shared postgresql 18 cluster restore drill Complete restore drill for the shared PostgreSQL 18 cluster: globals (roles) + 5 per-DB custom-format dumps (paperless, mailarchiver, authelia, nextcloud, mealie). The known mailarchiver bootstrap role conflict is tolerated. Authelia/Nextcloud/Mealie dumps are marked optional. Table count per DB serves as a functional sanity check. Feasibility verified in advance: all dumps present on the host, pg_restore available in the postgres:18.4 image, Postgres on shfs proven by existing tests. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-03 13:02:16 +02:00
Micha	14de2f4801	docs(restore): komodo mongo restore successful, update matrix and backlog Komodo Mongo data restore succeeded on 2026-06-03: mongorestore from komodo-mongo.archive.gz into a throwaway Mongo, 86904 documents (incl. 32 stack definitions). This proves that the canonical source for the KOMODO_* stack ENV values is restorable in a DR scenario. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-03 11:25:32 +02:00
Micha	90d1595285	fix(restore): komodo mongo restore own compose to avoid container name collision The second run failed with an auth failure because the container name restoretest-komodo-mongo collided with the old bootstrap test (stale datadir on shfs with different credentials). Fix: a dedicated compose file with its own container name (restoretest-komodo-mongorestore) and its own project name, so that there is no name collision with the existing bootstrap test. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-03 11:23:17 +02:00
Micha	c1985e177b	fix(restore): komodo mongorestore --noIndexRestore for auth compat First run 2026-06-03: 86904 documents (incl. 32 stack documents) restored successfully, but exit 1 because the index rebuild fails with "Command createIndexes requires authentication" (the test user has no dbAdmin role). Fix: --noIndexRestore. That is sufficient for the smoke purpose (stack definitions readable, KOMODO_* ENV values reconstructible). Indexes are rebuilt anyway on a real Komodo restart. Side finding: production Mongo is 8.0.23, the test compose pins 7.0.32. The cross-version warning is harmless for the read test, but the bootstrap compose pin should be bumped to 8.0 separately. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-03 11:20:53 +02:00
Micha	a244f2d677	feat(restore): komodo mongo data restore test New test: mongorestore from komodo-mongo.archive.gz into a fresh throwaway Mongo. Proves that the stack definitions, and with them the KOMODO_* stack ENV values, can be reconstructed from the dump (canonical source per docs/DISASTER_RECOVERY.md 6.2.1). Feasibility verified in advance: dump (6.0M) present on the host, mongorestore available in the mongo:7.0.32 image, shfs write works. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-03 11:18:39 +02:00
Micha	ef032f2dde	docs(restore): document nextcloud shfs-chmod blocker Nextcloud restore test first run 2026-06-03 classified, after 5 iterations, as structurally blocked by Unraid shfs/FUSE. Cause: Nextcloud 33 calls chmod() at runtime on files under /var/www/html (OC_Util.php#486). On Unraid's FUSE/shfs user shares chmod is not possible, neither from the host (chown ignored) nor from inside the container (Operation not permitted), not even without no-new-privileges. In production Nextcloud works because the data there lives on a cache drive (XFS/BTRFS directly) rather than going through shfs. The scaffold (script + compose) stays in the repo as a starting point for the solution. Three options documented: a) restore lab on cache drive b) Docker volumes instead of bind mounts c) tmpfs + rsync Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-03 11:14:39 +02:00
Micha	6fec64d0a1	fix(restore): nextcloud dump from host path instead of borg extract First run 2026-06-03: borg_extract for the Nextcloud dump failed silently (the path local/borg-dumps/latest/nextcloud.dump may exist in the archive under a different prefix). The dump is written fresh daily to the host under /mnt/user/backups/borg/dumps/latest/ and is backed up into Borg from there, so the smoke value is identical. HTML (app code + config) still comes from the Borg archive. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-03 11:03:45 +02:00
Micha	5d1ae68705	fix(restore): nextcloud permissions on unraid shfs (no-new-privileges removal) The second first-run attempt on 2026-06-03 still failed with 503 even though config.php was patched correctly. Cause: Unraid's FUSE/shfs filesystem on user shares silently ignores chown -R 33:33; files remain owned by sshd:sshd. The Nextcloud entrypoint internally attempts chmod/chown on /var/www/html and /var/www/html/data, which is blocked by no-new-privileges:true. Fix: - no-new-privileges removed from the restoretest-nextcloud container so that the entrypoint can set permissions inside the container itself (test Postgres and test Redis keep no-new-privileges) - Host-side chown replaced with chmod a+rwX (works on shfs) - Acceptable in the isolated smoke context (127.0.0.1, throwaway data, no Traefik) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-03 10:55:56 +02:00
Micha	2913e1005f	fix(restore): nextcloud chown 33:33 for www-data after borg extract First run 2026-06-03 failed with a persistent 503. The config.php patching (Redis host + trusted_domains) was correct, but Nextcloud could not read/write the restored files: "chmod(): Operation not permitted at OC_Util.php#486". Cause: the Borg extract via the borg-ui container creates files as the borg-ui user (sshd or similar). Nextcloud in the container runs as www-data (UID 33). With no-new-privileges:true every chmod/chown attempt in the container fails. Fix: chown -R 33:33 on html/ and data/ after the extract, before the Nextcloud container starts. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-03 10:44:12 +02:00
Micha	6f0e6f0d5a	fix(restore): nextcloud config.php patching for redis host and trusted_domains First run 2026-06-03 failed with 503: the Redis host still pointed at the production 'nextcloud-redis' instead of 'restoretest-nextcloud-redis', and trusted_domains did not contain 127.0.0.1 (Nextcloud blocks with "Access through untrusted domain"). Cause: the sed pattern for Redis tried to replace the whole array block on a single line but did not match PHP's multi-line format. In addition, the trusted_domains sed did not reliably find the closing pattern. Fix: - Patch the Redis host separately via sed (only the 'host' value in the block) - Rewrite trusted_domains via PHP CLI (more robust than sed on PHP arrays) - Fall back to sed for hosts without php Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-03 10:34:30 +02:00
Micha	f473fbaa8b	feat(restore): nextcloud restore smoke test scaffold Nextcloud restore test following the pattern of the other restore smokes: - Borg extract of html (app code + config.php) and nextcloud.dump - pg_restore into an isolated test Postgres (with retry loop) - config.php is patched to test DB credentials in the restore lab (production secrets are not mounted) - Nextcloud starts against restored data + test Redis - Smoke checks HTTP /status.php and occ status (maintenance mode) - Production user data under /mnt/user/documents/nextcloud-data is deliberately NOT mounted (too large for a regular smoke) First run is pending and needs operator approval on the host. Dispatcher and ntfy wrapper extended to cover Nextcloud. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-03 10:05:10 +02:00
Micha	c922d1f241	docs(restore): finalize audit - handbook update, reifegrad matrix, backlog Concludes the restore skills audit of 2026-06-02/03: - RESTORE_HANDBOOK.md brought up to date as of 2026-06-03: all 6 verified tests (Vaultwarden, Gitea, Paperless, Immich, Authelia, Komodo-Bootstrap) documented, frequency table updated, operating mode set to V1+ (with ntfy), quick start extended with Immich/Authelia/Komodo, report retention rule documented, expansion stages prioritized. - RESTORE_MATRIX.md: new section "Restore-Test-Reifegrad" (restore test maturity level) with an overview table (per service: tier, last test, type, next run) and a prioritized candidate list for missing tests. - Gitea restore: SSH check in the report correctly named "TCP connect only" instead of "SSH port open" (was audit finding M3). - AUDIT_2026-05-25_TODO.md: restore audit backlog extended with the remaining 8 open items (Nextcloud, Shared PG18, Komodo-Mongo, Mailarchiver, Mealie, Traefik, negative test, E2E DR drill). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-03 09:31:19 +02:00
Micha	ba3ef8fcfc	docs(restore): mark authelia smoke successful and schedule	2026-06-03 08:55:04 +02:00
Micha	52fc007123	fix(restore): authelia smoke without dump-restore, drop bogus env, disable ntp The first run on 2026-06-03 exposed a by-design conflict: pg_restore of the production postgresql17-authelia.dump into a test instance with a throwaway AUTHELIA_STORAGE_ENCRYPTION_KEY fails in the Authelia startup check with "the configured encryption key does not appear to be valid for this database". Production storage values are encrypted with the production key; a throwaway key cannot decrypt them. The smoke is therefore explicitly reduced to config restore + boot, not data decrypt. Two side findings from the same run: - AUTHELIA__SERVER__ADDRESS (double underscore) was rejected by Authelia 4.39 ("configuration environment variable not expected"). ENV removed; server.address comes from the generated configuration.yml anyway. - The ntp startup check failed ("Could not determine the clock offset ... lookup time.cloudflare.com on 127.0.0.1:53: server misbehaving") because the isolated test compose network has no DNS resolver for NTP. A new test config block sets ntp.disable_startup_check: true. Docs updated accordingly (plan + runbook): the encryption key conflict is explicitly documented as "not part of this smoke"; the error matrix has entries for the double-underscore ENV and the NTP lookup. Freshness of the production authelia dump continues to be monitored unchanged via check-restore-freshness.sh; the data decrypt drill is a separate DR task with controlled key usage. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-03 08:27:40 +02:00
Micha	8d71dfb9ad	fix(restore): authelia smoke default_policy two_factor (rules-less) Authelia 4.39 requires: without access_control.rules, default_policy must be 'two_factor' or 'one_factor'. 'bypass' was only permitted historically; with 4.39 config validate fails with "'default_policy' option 'bypass' is invalid: when no rules are specified it must be 'two_factor' or 'one_factor'". /api/health is public and does not pass through access_control, so the smoke semantics remain unchanged. Observed in the first run on 2026-06-03 after the refactor to a minimal test config (commits 541c7be..440000c). With this fix 'authelia config validate' should pass; the HTTP /api/health smoke is the follow-up step. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-03 08:09:35 +02:00
Micha	440000c085	fix(restore): generate minimal authelia smoke config	2026-06-03 08:04:59 +02:00
Micha	cacf77bfb0	fix(restore): avoid authelia smtp env in smoke test	2026-06-03 08:01:10 +02:00
Micha	cd4dd178ed	fix(restore): isolate authelia runtime config mount	2026-06-03 07:57:57 +02:00
Micha	541c7be853	fix(restore): generate sanitized authelia test config	2026-06-03 07:43:57 +02:00
Micha	b1ae9f3c26	fix(restore): harden restore checks and add authelia smoke scaffold	2026-06-03 07:39:05 +02:00
Micha	4e34582008	Trim documentation to active runbooks	2026-05-31 23:26:12 +02:00
Micha	1d98945a67	fix: make restore test scripts executable	2026-05-31 21:44:59 +02:00
Micha	268df30a13	chore: finish postgres redis stateful migrations	2026-05-31 20:32:25 +02:00
Micha	67ec40b762	Docs sweep: reflect Komodo bootstrap first run + clean stale "still open" notes Six files had outdated status notes that the F-09 first run on 2026-05-30 made wrong: - ops/restore-tests/komodo-bootstrap-runbook.md: "Erster echter Lauf steht noch aus" -> first run confirmed - ops/restore-tests/komodo-bootstrap-plan.md: "Noch offen vor dem ersten echten Lauf" section -> "Bestaetigte Laeufe" table with the --what-if and --keep-data runs - ops/restore-tests/immich-runbook.md: status note still said "Erster echter Lauf steht noch aus" although the Immich first run was 2026-05-27; correcting in the same sweep - docs/AUDIT_2026-05-25_TODO.md: Sprint 2 entry on Komodo bootstrap path no longer carries the "Trockenlauf-Skript bleibt als offene Folgeaufgabe" tail - docs/SERVICES_RECOVERY.md: replaced the "Trockenlauf-Idee (Doku-only, nicht ausgefuehrt)" section with the confirmed repo-script flow and marked the two "Naechste Aufgaben" rows about the dry-run as done - docs/RESTORE_DRILL_ROUTINE.md: Q2 2026 DR-Sanity-Check entry now splits Komodo-Bootstrap-Pfad (done) from the two still-open items (Gitea bundles, secrets inventory) No behavior change, only documentation consistency. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-30 11:18:37 +02:00
Micha	e4b0db2af6	Add Komodo bootstrap dry-run scaffold (F-09 rest) Mirror of the Immich restore-test pattern for the Komodo bootstrap anchor. Brings up a throwaway komodo-mongo + komodo-core + komodo-periphery under project restoretest-komodo, isolated from production: - same image digests as production (mongo:7.0.32, komodo-core:2, komodo-periphery:2) to prove compose-level bootstrap compatibility - restore-lab paths under /mnt/user/backups/restore-lab/komodo - 127.0.0.1:19120 only, no LAN bind, no Traefik, no Authelia - test periphery runs WITHOUT docker.sock mount and WITHOUT /mnt/user/services mount; cannot manage productive containers - KOMODO_* secrets are throwaway placeholders hardcoded in the test compose; productive secrets never enter this path Smoke test: compose config valid, mongo healthy, mongo auth-ping with test creds, komodo-core HTTP 200/302/303/401, periphery container running. Report under restore-reports/komodo-bootstrap-*. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-29 15:25:41 +02:00
Micha	c4fd4154db	Document quarterly restore drill routine New docs/RESTORE_DRILL_ROUTINE.md introduces a three-stage model: weekly freshness check, monthly/bimonthly mini-restores, quarterly DR sanity check. Tracks confirmed mini-restores (Vaultwarden, Gitea, Paperless 2026-05-07; Immich 2026-05-27) and rotates services by quarter Q1-Q4. Includes ten-point DR sanity check and abort rules that point at the drift runbook. No host schedule is created; the existing ops/restore-tests/schedule.md now references this routine as the source for quarterly assignment. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-27 20:15:43 +02:00
Micha	52414c47be	Record Immich restore test success	2026-05-27 18:38:14 +02:00
Micha	a8c440d4da	Read Immich v2 restore counts	2026-05-27 18:33:29 +02:00
Micha	12cf8fb728	Prepare Immich restore upload markers	2026-05-27 18:29:53 +02:00
Micha	5b0782a8fa	Harden Immich restore smoke checks	2026-05-27 18:25:30 +02:00
Micha	a805f03481	Retry Immich restore during Postgres startup	2026-05-27 18:18:55 +02:00
Micha	4feecf4a8e	Make Immich restore database creation idempotent	2026-05-27 18:16:25 +02:00
Micha	2e84700326	Make Immich restore test create database	2026-05-27 18:14:40 +02:00
Micha	8a19c45485	Use Borg known_hosts in restore tests	2026-05-27 18:12:48 +02:00
Micha	c5d231a0db	Prepare Immich restore smoke test	2026-05-26 21:33:01 +02:00
Micha	d50b11784d	Add Unraid flash config to Borg preflight	2026-05-25 19:36:16 +02:00
Micha	b6bbca43ad	Replace Uptime Kuma with monitoring checks	2026-05-25 16:37:46 +02:00
Micha	29eaf8001f	Normalize ntfy alert routing	2026-05-17 14:57:45 +02:00
Micha	6ca829ec45	Document Unraid automation schedules	2026-05-16 20:11:19 +02:00
Micha	0adddb6533	Add Unraid automation script templates	2026-05-16 14:34:35 +02:00
Micha	5ada1ad153	Treat Filebrowser state as file-backed dump	2026-05-16 13:16:01 +02:00
Micha	878ad2d5f1	Harden backup and posture checks	2026-05-16 13:04:22 +02:00
Micha	d7e1eb33ba	Improve restore job ntfy timeout and output	2026-05-07 11:34:50 +02:00
Micha	008ab9bc4a	Add ntfy wrapper for restore jobs	2026-05-07 11:26:15 +02:00
Micha	7ff7284f6b	Add host-ready restore automation scripts	2026-05-07 11:20:03 +02:00
Micha	d20b687211	Add restore handbook and Unraid job guide	2026-05-07 11:11:36 +02:00
Micha	16416d964f	Add restore test automation scaffolding	2026-05-07 11:07:46 +02:00
Micha	2cc39c73f6	Add validated Paperless restore test pattern	2026-05-07 11:01:27 +02:00
Micha	d351b1cac8	Add validated Gitea restore test pattern	2026-05-07 10:00:58 +02:00
Micha	df4d335907	Document validated Vaultwarden restore pattern	2026-05-07 09:39:29 +02:00

51 Commits