fix(restore): harden restore checks and add authelia smoke scaffold
@@ -382,7 +382,7 @@ Before startup, the following must exist:
 - `/mnt/user/appdata/secrets/authelia_smtp_password.txt`
 - SMTP access for `michideheld@gmx.de`

-During the smoke test, `authelia validate-config` must succeed; the SMTP startup check must not block startup.
+During the smoke test, `authelia config validate` must succeed; the SMTP startup check must not block startup.

 ### `nextcloud`

@@ -126,7 +126,7 @@ The templates are located in:
 Host repo path:

 ```text
-/mnt/user/services/homelab
+/mnt/user/services/homelab-infra
 ```

 V1 jobs:
@@ -169,31 +169,31 @@ `Container is running` alone is not enough.
 On the Unraid host:

 ```bash
-bash /mnt/user/services/homelab/ops/restore-tests/run-restore-checks.sh freshness
+bash /mnt/user/services/homelab-infra/ops/restore-tests/run-restore-checks.sh freshness
 ```

 ### Vaultwarden restore check

 ```bash
-bash /mnt/user/services/homelab/ops/restore-tests/run-restore-checks.sh vaultwarden
+bash /mnt/user/services/homelab-infra/ops/restore-tests/run-restore-checks.sh vaultwarden
 ```

 ### Gitea restore check

 ```bash
-bash /mnt/user/services/homelab/ops/restore-tests/run-restore-checks.sh gitea
+bash /mnt/user/services/homelab-infra/ops/restore-tests/run-restore-checks.sh gitea
 ```

 ### Paperless restore check

 ```bash
-bash /mnt/user/services/homelab/ops/restore-tests/run-restore-checks.sh paperless
+bash /mnt/user/services/homelab-infra/ops/restore-tests/run-restore-checks.sh paperless
 ```

 ### Optional with `ntfy`

 ```bash
-bash /mnt/user/services/homelab/ops/restore-tests/run-restore-job-with-ntfy.sh freshness homelab-info
+bash /mnt/user/services/homelab-infra/ops/restore-tests/run-restore-job-with-ntfy.sh freshness homelab-info
 ```

 ---

@@ -32,11 +32,11 @@ It is the functional complement to `docs/DISASTER_RECOVERY.md`.
 | Tailscale | Share / Borg | `/mnt/user/appdata/tailscale` | none | Tailscale state in the path | host network | Tailscale connected |
 | PostgreSQL 18 | Share + dumps | `/mnt/user/appdata/postgresql18` (archived rollback snapshot: `/mnt/user/appdata/_archive/pg18-immich-rollback-volumes-20260602/postgresql17`) | `postgresql17-globals.sql`, `postgresql17-mailarchiver.dump`, `postgresql17-paperless.dump`, optional `postgresql17-authelia.dump` | `postgres_password.txt`, app role passwords from the respective stack ENV/secret files | `backend_net` | DB starts, target databases exist; `SHOW data_checksums` returns `on` |
 | Redis 8 | Share / Host | `/mnt/user/appdata/redis`; rollback backup under `/mnt/user/backups/borg/dumps/latest/shared-redis-pre-redis8-<ts>` | RDB/AOF files in the data path | `redis_password.txt` | `backend_net` | Redis starts, `redis_version` is 8.x, apps connect |
-| Authelia | Borg | `/mnt/user/appdata/authelia/config`, `/mnt/user/appdata/secrets/*authelia*` | Shared PostgreSQL 18, optional dump `postgresql17-authelia.dump` | JWT/session/storage/Postgres/SMTP secret files | PostgreSQL 18, Traefik, GMX SMTP | Login page and ForwardAuth work; SMTP notifier starts; active sessions are re-established after a restart |
+| Authelia | Borg | `/mnt/user/appdata/authelia/config`, `/mnt/user/appdata/secrets/*authelia*` | Shared PostgreSQL 18, optional dump `postgresql17-authelia.dump` | JWT/session/storage/Postgres/SMTP secret files | PostgreSQL 18, Traefik, GMX SMTP | Login page and ForwardAuth work; SMTP notifier starts; active sessions are re-established after a restart; restore-test scaffold added on 2026-06-02 (`ops/restore-tests/authelia-*`), first run pending |
 | Gitea | GitHub mirror + Gitea bundles for repo bootstrap, Borg + dump for Gitea app state | `/mnt/user/services/gitea/data`, `/mnt/user/backups/git-bundles/gitea` | `gitea.sqlite.dump`, bundle report `latest-report.md` | `borg_repo_passphrase.txt` for restore tests; the GitHub push-mirror PAT lives only in the Gitea mirror settings | Traefik | Web UI reachable, repo visible, SSH port responds; bundle can be cloned and `git fsck` is clean; GitHub push mirror syncs without `last_error`; mini restore to `/mnt/user/backups/restore-lab/gitea` successfully validated on 2026-05-07 |
 | Komodo | Borg / Share | `/mnt/user/appdata/komodo/core`, `/mnt/user/appdata/komodo/periphery`, `/mnt/user/services/stacks` | `komodo-mongo.archive.gz` if verified | `komodo_mongo_password.txt`, `KOMODO_*` stack ENV | Traefik, Mongo, Gitea | UI reachable, Periphery connected |
 | GitOps Host Automation | Borg / Git | `/mnt/user/services/homelab-infra`, `/mnt/user/services/posture-check` | no dedicated DB | none | Gitea, Komodo, Unraid User Scripts | `posture-check` runs from the host path and reports `warning_count: 0` in the known transitional state |
-| Vaultwarden | Borg + dump | `/mnt/user/appdata/vaultwarden` | `vaultwarden.sqlite.dump` | `vaultwarden_admin_token.txt`, `borg_repo_passphrase.txt` for restore tests | Traefik | Login page reachable, vault data visible; mini restore to `/mnt/user/backups/restore-lab/vaultwarden` successfully validated on 2026-05-07 |
+| Vaultwarden | Borg + dump | `/mnt/user/appdata/vaultwarden` | `vaultwarden.sqlite.dump` | `vaultwarden_admin_token.txt` for production; the restore test uses a throwaway admin token and `borg_repo_passphrase.txt` | Traefik | Login page reachable, vault data visible; mini restore to `/mnt/user/backups/restore-lab/vaultwarden` successfully validated on 2026-05-07 |

 ---

@@ -20,6 +20,7 @@ Goal:
 ## Planned structure

 - `schedule.md`: intervals and responsibilities
+- `common.sh`: shared helpers for Borg lookup, Borg extract and Compose cleanup; before Borg operations it also checks `borg-ui:/data/borg.db` and `borg-ui:/local/secrets/borg_repo_passphrase.txt`
 - `vaultwarden-restore-test.ps1`: first mini-restore procedure
 - `vaultwarden-restore-test.sh`: host-ready Vaultwarden restore job
 - `vaultwarden-plan.md`: concrete Vaultwarden test plan
@@ -37,6 +38,10 @@ Goal:
 - `immich-plan.md`: concrete Immich test plan
 - `immich-runbook.md`: operator runbook for the first Immich run
 - `immich-compose.test.yml`: isolated test instance for Immich incl. VectorChord/pgvector test Postgres and test Redis
+- `authelia-restore-test.sh`: Authelia restore job (scaffold; first run still open)
+- `authelia-compose.test.yml`: isolated test instance for Authelia incl. test Postgres and filesystem notifier (no real SMTP delivery)
+- `authelia-plan.md`: concrete Authelia test plan
+- `authelia-runbook.md`: operator runbook for the first Authelia run
 - `check-restore-freshness.ps1`: weekly freshness check for dumps and reports
 - `run-restore-checks.ps1`: simple dispatcher for restore jobs
 - `check-restore-freshness.sh`: host-ready freshness check
@@ -0,0 +1,56 @@
services:
  restoretest-authelia-postgres:
    # Same major version as the shared PostgreSQL 18 in production.
    image: postgres:18.4@sha256:8ff36f3c66371cba71d20ceedccfc3de9669a68737607888c4ef0af93abe8e39
    container_name: restoretest-authelia-postgres
    restart: "no"
    environment:
      TZ: Europe/Berlin
      POSTGRES_USER: authelia
      POSTGRES_DB: authelia
      POSTGRES_PASSWORD: restoretest-authelia-db
      PGDATA: /var/lib/postgresql/18/docker
    volumes:
      - /mnt/user/backups/restore-lab/authelia/postgres:/var/lib/postgresql
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U authelia -d authelia"]
      interval: 10s
      timeout: 5s
      retries: 10
    security_opt:
      - no-new-privileges:true

  restoretest-authelia:
    # Same image digest as security/authelia/docker-compose.yml in production.
    image: authelia/authelia:4.39.20@sha256:1b363e9279e742397966333f364e0876ae02bf5c876de73e83af6d48c57ff51b
    container_name: restoretest-authelia
    restart: "no"
    depends_on:
      restoretest-authelia-postgres:
        condition: service_healthy
    command:
      - authelia
      - --config=/config/configuration.yml
      - --config=/config/configuration.test-overlay.yml
    environment:
      TZ: Europe/Berlin
      # Throwaway secrets for the isolated smoke test only. Never use
      # production Authelia secrets in this Compose file. The production
      # authelia_*_FILE mounts are deliberately NOT included.
      AUTHELIA_JWT_SECRET: restoretest-authelia-jwt-secret-placeholder-32bytes
      AUTHELIA_SESSION_SECRET: restoretest-authelia-session-secret-placeholder-32
      AUTHELIA_STORAGE_ENCRYPTION_KEY: restoretest-authelia-storage-enc-key-placeholder-32
      AUTHELIA_STORAGE_POSTGRES_PASSWORD: restoretest-authelia-db
      AUTHELIA_NOTIFIER_SMTP_PASSWORD: restoretest-authelia-smtp-placeholder
      # The command: section loads configuration.yml plus the test overlay
      # (the second file wins on conflict). The overlay forces
      # storage/notifier/session onto isolated test backends so that no
      # production Postgres is reached and no real SMTP delivery is triggered.
      AUTHELIA_SERVER_ADDRESS: tcp://0.0.0.0:9091
    volumes:
      - /mnt/user/backups/restore-lab/authelia/config:/config
    ports:
      # 127.0.0.1 only, no public route, no Traefik labels
      - "127.0.0.1:19091:9091"
    security_opt:
      - no-new-privileges:true
@@ -0,0 +1,94 @@
# Authelia Restore Test Plan

## Goal

Demonstrate that the Authelia configuration from the production Borg archive can be brought up again in an isolated test environment and that the HTTP health endpoint responds, without touching production secrets, the production Postgres or production SMTP delivery.

Deliberately **not** part of this test:

- Restore with production Authelia secrets. The test uses only throwaway values for `AUTHELIA_JWT_SECRET`, `AUTHELIA_SESSION_SECRET`, `AUTHELIA_STORAGE_ENCRYPTION_KEY`, `AUTHELIA_STORAGE_POSTGRES_PASSWORD`, `AUTHELIA_NOTIFIER_SMTP_PASSWORD`. A real session on production data would not be meaningful here.
- Real SMTP calls to GMX. The notifier is redirected to the filesystem in the test overlay.
- Forward auth against Traefik. The test runs only on `127.0.0.1:19091`, with no Traefik route.
- WebAuthn/Duo/OIDC identity provider endpoints. The smoke test checks `/api/health`.

## Source

- Backup source: production Borg archive (`hetzner_borg_appdata_critical`)
- relevant paths in the archive:
  - `local/appdata/authelia/config` (mandatory)
  - `local/borg-dumps/latest/postgresql17-authelia.dump` (optional, if present)
- production secrets under `/mnt/user/appdata/secrets/authelia_*.txt` are **not** mounted

## Test target

- Restore lab: `/mnt/user/backups/restore-lab/authelia`
- Test data paths:
  - `/mnt/user/backups/restore-lab/authelia/config` (restored configuration.yml + test overlay)
  - `/mnt/user/backups/restore-lab/authelia/postgres` (test Postgres data directory)
  - `/mnt/user/backups/restore-lab/authelia/dumps/latest/postgresql17-authelia.dump` (if extracted)
  - `/mnt/user/backups/restore-lab/authelia/config/notifier/notifications.txt` (filesystem notifier output)
- Test containers:
  - `restoretest-authelia` (image pin as in production)
  - `restoretest-authelia-postgres` (postgres:18.4, same major version as the shared Postgres)
- Test port: `127.0.0.1:19091:9091`
- Report target: `/mnt/user/backups/restore-reports/authelia-YYYY-MM-DD.md`

## Protection rules

- production paths `/mnt/user/appdata/authelia/*` are **not** written to
- production secret files `/mnt/user/appdata/secrets/authelia_*.txt` are **not** mounted
- the production shared PostgreSQL 18 is **not** contacted (the test overlay forces `storage` onto the test Postgres)
- real SMTP delivery is **not** triggered (the test overlay forces `notifier` onto the filesystem)
- the production domain `auth.kaleschke.info` is **not** carried over
- test containers publish only on `127.0.0.1`, with no LAN/Tailscale binding
- the Borg passphrase is read from `/mnt/user/appdata/secrets/borg_repo_passphrase.txt` and is never logged

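Two of the protection rules (no production secret mounts, loopback-only ports) can be checked mechanically against the test Compose file. The following is only a sketch under assumptions: the function name `check_compose_isolation` is hypothetical and not part of the repository, and the check is a plain text scan, not a YAML parser.

```bash
#!/bin/bash
# Hypothetical guard, not part of the repository: scans a Compose file for
# violations of two protection rules using plain text matching.
check_compose_isolation() {
  local compose_file="$1"
  local rc=0
  local non_loopback

  # Rule: production Authelia secret files must never be referenced.
  if grep -q '/mnt/user/appdata/secrets/authelia_' "$compose_file"; then
    echo "VIOLATION: production Authelia secret referenced" >&2
    rc=1
  fi

  # Rule: published ports must bind to 127.0.0.1 only.
  # Collect list items that look like "[host-ip:]host-port:container-port"
  # and keep those that do not contain the loopback address.
  non_loopback="$(grep -E '^[[:space:]]*-[[:space:]]*"?[0-9.]*:?[0-9]+:[0-9]+"?[[:space:]]*$' "$compose_file" \
    | grep -v '127\.0\.0\.1:' || true)"
  if [ -n "$non_loopback" ]; then
    echo "VIOLATION: port published on a non-loopback address" >&2
    rc=1
  fi

  return "$rc"
}
```

A possible call would be `check_compose_isolation ops/restore-tests/authelia-compose.test.yml` before the first real run; a non-zero exit status means at least one rule was violated.
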
## Planned procedure

1. Create empty restore-lab paths
2. Extract `local/appdata/authelia/config` from the most recent Borg archive
3. Optionally extract `local/borg-dumps/latest/postgresql17-authelia.dump`; if it is not in the archive, continue without a DB restore
4. Write the test overlay file `configuration.test-overlay.yml` next to the restored `configuration.yml` (forces storage/notifier/session to test values)
5. Bring up the test Postgres with `ops/restore-tests/authelia-compose.test.yml`
6. Optionally load the dump via `pg_restore -Fc --clean --if-exists --no-owner --no-privileges` (with transient retry as in the Immich/Paperless test)
7. Run `authelia config validate` with both config files
8. Start `restoretest-authelia` and poll HTTP health at `http://127.0.0.1:19091/api/health`
9. Write the report to `/mnt/user/backups/restore-reports/authelia-YYYY-MM-DD.md`
10. Stop the test containers and clean up the restore lab (`--keep-data` overrides this)

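The polling in step 8 and the transient retry in step 6 share one pattern: repeat a probe until it succeeds or a deadline passes. A minimal sketch of that pattern follows; `wait_until` and `health_is_200` are hypothetical names chosen for illustration and do not exist in `common.sh` as far as this diff shows.

```bash
#!/bin/bash
# Hypothetical helper (not in the repository): retries a command until it
# succeeds or TIMEOUT seconds have passed since the first attempt.
# Usage: wait_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
wait_until() {
  local timeout_s="$1" interval_s="$2"
  shift 2
  local deadline
  deadline=$(( $(date +%s) + timeout_s ))
  while true; do
    if "$@"; then
      return 0
    fi
    if [ "$(date +%s)" -ge "$deadline" ]; then
      return 1
    fi
    sleep "$interval_s"
  done
}

# Probe for step 8: succeeds only on HTTP 200. --max-time bounds each single
# request so that a hanging connection cannot stall the loop indefinitely.
health_is_200() {
  [ "$(curl -s -o /dev/null --max-time 5 -w '%{http_code}' "$1" || true)" = "200" ]
}

# Example call (assumes the test container from step 8 is already running):
# wait_until 120 2 health_is_200 http://127.0.0.1:19091/api/health
```

Because the deadline is based on wall-clock time, slow requests count against the overall budget, which a fixed number of loop iterations does not guarantee.
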
## Smoke test

Minimum for success:

- Borg extract of the Authelia config succeeds
- test Postgres reaches `healthy`
- `authelia config validate` completes without errors
- HTTP `200` on `/api/health` within 120 s

Optional later:

- run through the full auth flow with a test user from `users_database.yml`
- check the WebAuthn endpoint `/api/secondfactor/webauthn`
- test the ForwardAuth path against a mock backend

## Known complications

| Risk | Description | Mitigation |
|---|---|---|
| Overlay conflict with the original configuration | `configuration.yml` may define sections that the overlay does not override | on a `config validate` error, check `configuration.yml.original` for comparison; extend the overlay |
| SMTP startup check blocks startup | Authelia probes SMTP despite `disable_startup_check` | read the container logs, harden the notifier block further if needed |
| Postgres schema drift after a major upgrade | Authelia migrates the schema at startup; a dump from the 17 cluster may need different indexes under 18 | the smoke test is tolerant of the DB schema; during validation, check the logs for `migration` |
| `identity_validation` block missing in the original | an older Authelia schema does not know the block; the overlay adds it | read the `config validate` output, adjust the overlay if needed |
| `users_database.yml` with production hashes | data is copied into the restore lab but never mounted on the production domain | OK; the test path is isolated, no browser access over the LAN |

## Still open before the first real run

- first run with `--what-if` as a plan check
- first run with `--keep-data` to observe the SMTP startup behaviour
- check the `config validate` output against the Authelia schema version
- after success: schedule entry analogous to Vaultwarden (proposal: 2nd Saturday in even months, so that it does not collide with Paperless)

## Status

- Script and Compose scaffold added on 2026-06-02
- **no real mini restore has run yet**; the first run requires operator approval
@@ -0,0 +1,266 @@
#!/bin/bash
set -euo pipefail

# Authelia Restore Smoke Test
#
# Non-destructive restore smoke test for Authelia.
# - extracts the Authelia config from the production Borg archive
# - patches the external dependencies in a restore-lab copy of
#   configuration.yml (storage = local test Postgres,
#   notifier = filesystem notifier, identity_validation set to test values)
# - optionally imports the shared-Postgres dump for Authelia
# - validates the patched configuration with `authelia config validate`
# - starts an isolated Authelia container without Traefik
# - checks the HTTP health endpoint
# - cleans up afterwards
#
# Production Authelia containers, the production Postgres DB, production
# secrets and production SMTP delivery are NOT touched.

SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
. "$SCRIPT_DIR/common.sh"

WHATIF=0
KEEP_DATA=0
for arg in "$@"; do
  case "$arg" in
    --what-if) WHATIF=1 ;;
    --keep-data) KEEP_DATA=1 ;;
    *) echo "Unknown argument: $arg" >&2; exit 1 ;;
  esac
done

RESTORE_ROOT="/mnt/user/backups/restore-lab/authelia"
REPORT_ROOT="/mnt/user/backups/restore-reports"
EXTRACT_DIR="$BORG_RESTORE_HOST_ROOT/authelia-extract"
COMPOSE_FILE="$SCRIPT_DIR/authelia-compose.test.yml"
REPORT_FILE="$REPORT_ROOT/authelia-$(date +%F).md"

if [ "$WHATIF" -eq 1 ]; then
  cat <<EOF
Authelia restore test
Mode: WhatIf
RestoreRoot: $RESTORE_ROOT
ReportRoot: $REPORT_ROOT
Expected Borg source paths:
  - local/appdata/authelia/config
  - local/borg-dumps/latest/postgresql17-authelia.dump (optional - skipped if not present)
Planned isolation:
  - Test Postgres: postgres:18.4 with throwaway credentials
  - Test Authelia: authelia/authelia:4.39.20 (image pin as in production)
  - Throwaway secrets only in the test Compose file
  - configuration.yml is patched in the restore lab:
    * storage -> test Postgres (no production Postgres reached)
    * notifier -> filesystem (NO SMTP delivery)
    * session -> in-memory (no Redis backend needed)
  - Test endpoint: 127.0.0.1:19091/api/health (no Traefik, no public domain)
Smoke test:
  - authelia config validate against the patched configuration.yml
  - HTTP 200 from /api/health
EOF
  exit 0
fi

require_cmd docker
require_cmd curl
require_path "$BORG_PASSPHRASE_FILE_DEFAULT"
require_path "$COMPOSE_FILE"

RESTORE_SUCCESS=0
cleanup() {
  cleanup_compose "$COMPOSE_FILE"
  if [ "$RESTORE_SUCCESS" -ne 1 ]; then
    preserve_on_failure "authelia" "$RESTORE_ROOT"
    rm -rf "$EXTRACT_DIR"
    return
  fi
  if [ "$KEEP_DATA" -ne 1 ]; then
    rm -rf "$RESTORE_ROOT"
  fi
  rm -rf "$EXTRACT_DIR"
}
trap cleanup EXIT

rm -rf "$EXTRACT_DIR" "$RESTORE_ROOT"
mkdir -p "$RESTORE_ROOT/config" "$RESTORE_ROOT/postgres" "$RESTORE_ROOT/dumps/latest" "$RESTORE_ROOT/notifier"

archive="$(latest_archive_name)"
repo="$(borg_repo_url)"

if [ -z "$archive" ] || [ -z "$repo" ]; then
  echo "Could not resolve Borg repo/archive from borg-ui database" >&2
  exit 1
fi

# Stage 1: extract the config from Borg
borg_extract "/restore/authelia-extract" "local/appdata/authelia/config"
if [ ! -d "$EXTRACT_DIR/local/appdata/authelia/config" ]; then
  echo "Authelia config path missing in Borg archive" >&2
  exit 1
fi
cp -a "$EXTRACT_DIR/local/appdata/authelia/config/." "$RESTORE_ROOT/config/"

# Stage 2: extract the optional Postgres dump (loaded later, if present)
dump_available=0
if borg_extract "/restore/authelia-extract" "local/borg-dumps/latest/postgresql17-authelia.dump" 2>/dev/null; then
  if [ -f "$EXTRACT_DIR/local/borg-dumps/latest/postgresql17-authelia.dump" ]; then
    mv "$EXTRACT_DIR/local/borg-dumps/latest/postgresql17-authelia.dump" \
      "$RESTORE_ROOT/dumps/latest/postgresql17-authelia.dump"
    dump_available=1
  fi
fi

# Stage 3: patch configuration.yml in the restore lab in a targeted way.
# We replace the storage/notifier/session blocks with test definitions so
# that the test contacts NO production Postgres and NO real SMTP.
CONFIG_FILE="$RESTORE_ROOT/config/configuration.yml"
if [ ! -f "$CONFIG_FILE" ]; then
  echo "configuration.yml missing in restored config dir" >&2
  exit 1
fi

# Keep a copy of the original for diff/diagnosis
cp "$CONFIG_FILE" "$CONFIG_FILE.original"

# Write a drop-in for the test backends. Authelia 4.39 loads multiple
# config files via repeated --config arguments; for the smoke test the
# simpler option is a dedicated overlay file that supplies test values.
cat > "$RESTORE_ROOT/config/configuration.test-overlay.yml" <<'YAML'
# Test overlay for the restore smoke test only. It is loaded as a second
# --config file in addition to the restored configuration.yml and overrides
# external dependencies.

storage:
  postgres:
    address: tcp://restoretest-authelia-postgres:5432
    database: authelia
    username: authelia
    # Password is supplied via the AUTHELIA_STORAGE_POSTGRES_PASSWORD env var

notifier:
  disable_startup_check: true
  filesystem:
    filename: /config/notifier/notifications.txt

session:
  cookies:
    - name: authelia_session_restoretest
      domain: kaleschke.info
      authelia_url: http://127.0.0.1:19091
      default_redirection_url: http://127.0.0.1:19091
      expiration: 1h
      inactivity: 5m

identity_validation:
  reset_password:
    jwt_secret: restoretest-authelia-reset-password-jwt-secret-placeholder-64bytes
    jwt_lifespan: 5m
    jwt_algorithm: HS256
YAML

mkdir -p "$RESTORE_ROOT/config/notifier"
chmod -R a+rwX "$RESTORE_ROOT/config/notifier"

# Stage 4: bring up the test Postgres
docker compose -f "$COMPOSE_FILE" up -d restoretest-authelia-postgres >/dev/null
until docker exec restoretest-authelia-postgres pg_isready -U authelia -d authelia >/dev/null 2>&1; do
  sleep 2
done

# Stage 5: optionally load the dump
dump_status="skipped (no dump in archive)"
if [ "$dump_available" -eq 1 ]; then
  restore_ok=0
  for attempt in $(seq 1 12); do
    if docker exec -i restoretest-authelia-postgres \
      pg_restore -U authelia -d authelia --clean --if-exists --no-owner --no-privileges \
      < "$RESTORE_ROOT/dumps/latest/postgresql17-authelia.dump" 2>/tmp/authelia-pg-restore.err; then
      restore_ok=1
      break
    fi
    # Retry only on errors that indicate Postgres is not ready yet
    if grep -qiE "starting up|shutting down|connection refused|database .* does not exist" /tmp/authelia-pg-restore.err; then
      sleep 5
      continue
    fi
    cat /tmp/authelia-pg-restore.err >&2
    exit 1
  done
  if [ "$restore_ok" -ne 1 ]; then
    cat /tmp/authelia-pg-restore.err >&2
    exit 1
  fi
  dump_status="restored"
fi

# Stage 6: config validate in the container context, against the restored config + overlay
validate_status="ok"
if ! docker run --rm \
  -e AUTHELIA_JWT_SECRET=restoretest-authelia-jwt-secret-placeholder-32bytes \
  -e AUTHELIA_SESSION_SECRET=restoretest-authelia-session-secret-placeholder-32 \
  -e AUTHELIA_STORAGE_ENCRYPTION_KEY=restoretest-authelia-storage-enc-key-placeholder-32 \
  -e AUTHELIA_STORAGE_POSTGRES_PASSWORD=restoretest-authelia-db \
  -e AUTHELIA_NOTIFIER_SMTP_PASSWORD=restoretest-authelia-smtp-placeholder \
  -v "$RESTORE_ROOT/config:/config" \
  authelia/authelia:4.39.20@sha256:1b363e9279e742397966333f364e0876ae02bf5c876de73e83af6d48c57ff51b \
  authelia config validate --config /config/configuration.yml --config /config/configuration.test-overlay.yml \
  >/tmp/authelia-validate.log 2>&1; then
  validate_status="failed"
  cat /tmp/authelia-validate.log >&2
  exit 1
fi

# Stage 7: start the Authelia container. The Compose file passes repeated
# --config arguments so that the test overlay is loaded as well; the
# second file wins on conflicts and replaces storage/notifier/session.
docker compose -f "$COMPOSE_FILE" up -d restoretest-authelia >/dev/null

http_status=""
for _ in $(seq 1 60); do
  http_status="$(curl -s -o /tmp/authelia-body.html -w '%{http_code}' \
    http://127.0.0.1:19091/api/health || true)"
  if [ "$http_status" = "200" ]; then
    break
  fi
  sleep 2
done

if [ "$http_status" != "200" ]; then
  echo "Authelia HTTP health failed: status=$http_status" >&2
  docker logs --tail 120 restoretest-authelia >&2 || true
  exit 1
fi

write_report "$REPORT_FILE" <<EOF
# Authelia Restore Test Report - $(date +%F)

- Service: \`authelia\`
- Source repo: \`$repo\`
- Archive: \`$archive\`
- Restore root: \`$RESTORE_ROOT\`
- Test containers:
  - \`restoretest-authelia\`
  - \`restoretest-authelia-postgres\`
- Test endpoint: \`http://127.0.0.1:19091/api/health\`
- Result: \`SUCCESS\`

## Checks

- Borg extract of config: \`ok\`
- Borg extract of dump: \`$dump_status\`
- configuration.yml present: \`ok\`
- Test-overlay (storage/notifier/session) written: \`ok\`
- \`authelia config validate\`: \`$validate_status\`
- HTTP /api/health status: \`$http_status\`

## Notes

- Test ran without Traefik and without the productive domain \`auth.kaleschke.info\`.
- Productive Authelia secrets under \`/mnt/user/appdata/secrets/authelia_*.txt\` were NOT mounted.
- Notifier was forced to filesystem (\`/config/notifier/notifications.txt\`); no SMTP call to GMX.
- Storage forced to isolated test postgres; productive shared PostgreSQL 18 was NOT touched.
- Test data was cleaned after success: \`$([ "$KEEP_DATA" -eq 1 ] && echo no || echo yes)\`
- Restore source dump: \`local/borg-dumps/latest/postgresql17-authelia.dump\` (optional, if present in the archive).
EOF

RESTORE_SUCCESS=1
echo "Authelia restore test ok -> $REPORT_FILE"
@@ -0,0 +1,83 @@
# Authelia Restore Runbook

## Status

The script and the test Compose file are in place as a **scaffold**. The first run is still pending and requires operator approval. Authelia is Tier-1 critical, so this test deliberately starts conservatively: the smoke test checks only config validation and HTTP health, not a full auth flow.

## Preconditions

- Borg source is available
- `borg-ui` container is running
- Borg passphrase file exists: `/mnt/user/appdata/secrets/borg_repo_passphrase.txt`
- `borg-ui` mounts the passphrase in the container as `/local/secrets/borg_repo_passphrase.txt`
- the current Borg archive contains `local/appdata/authelia/config`
- optional: `local/borg-dumps/latest/postgresql17-authelia.dump`
- test paths under `/mnt/user/backups/restore-lab/` and `/mnt/user/backups/restore-reports/` are cleared for use
- port `127.0.0.1:19091` is free
- free space under `/mnt/user/backups/restore-lab/authelia` (~200 MB is sufficient)

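Some of these preconditions can be checked by a script before the first run. The sketch below is illustrative only: all function names are hypothetical, the 200 MB threshold is taken from the list above, and the `borg-ui` container and archive-content checks are not covered because they depend on helpers in `common.sh` that this diff does not show.

```bash
#!/bin/bash
# Hypothetical preflight helpers; names and structure are illustrative.

# Fails unless the given file exists and is non-empty.
require_nonempty_file() {
  [ -s "$1" ] || { echo "missing or empty: $1" >&2; return 1; }
}

# Prints the free space (in MB) of the filesystem that holds the given path.
# -P forces the POSIX single-line format, -k reports 1024-byte blocks.
free_mb() {
  df -Pk "$1" | awk 'NR==2 { print int($4 / 1024) }'
}

# Succeeds if nothing accepts TCP connections on HOST PORT.
# Relies on bash's /dev/tcp pseudo-device, so it needs bash, not plain sh.
port_is_free() {
  ! (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

preflight() {
  local lab_parent="/mnt/user/backups/restore-lab"
  require_nonempty_file /mnt/user/appdata/secrets/borg_repo_passphrase.txt || return 1
  [ "$(free_mb "$lab_parent")" -ge 200 ] || { echo "less than 200 MB free" >&2; return 1; }
  port_is_free 127.0.0.1 19091 || { echo "port 19091 is in use" >&2; return 1; }
}
```

Calling `preflight` on the Unraid host before `authelia-restore-test.sh` would surface a missing passphrase file or an occupied port before any container is started.
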
## Confirmed host state (target)

- production Authelia container: `authelia` with image `authelia/authelia:4.39.20@sha256:1b363e9279e742397966333f364e0876ae02bf5c876de73e83af6d48c57ff51b`
- production config path: `/mnt/user/appdata/authelia/config`
- production secrets: `/mnt/user/appdata/secrets/authelia_*.txt` (**not** needed by the test)
- production storage: shared PostgreSQL 18 (**not** contacted by the test)

## First run - dry run

```bash
bash /mnt/user/services/homelab-infra/ops/restore-tests/authelia-restore-test.sh --what-if
```

Expected output: plan output only, no Docker start, no Borg extract.

## First run - real test (operator-approved)

```bash
bash /mnt/user/services/homelab-infra/ops/restore-tests/authelia-restore-test.sh --keep-data
```

On success:

- report under `/mnt/user/backups/restore-reports/authelia-YYYY-MM-DD.md`
- restore-lab data is retained with `--keep-data`
- without `--keep-data` the restore lab is deleted; on failure it is moved to `/mnt/user/backups/restore-lab/_failed/authelia-...`

## Smoke-test checks

Minimum expected in the report:

- Borg extract of config: `ok`
- Test Postgres healthy
- `authelia config validate`: `ok`
- HTTP /api/health status: `200`

## Failure cases

| Symptom | Cause | Action |
|---|---|---|
| `config validate` fails on the `notifier` block | the original `configuration.yml` overrides the overlay; Authelia does not merge maps | check the original config under `restore-lab/authelia/config/configuration.yml.original`; adjust the overlay block or the order of the `--config` arguments |
| `config validate` fails on `session.domain` | older/newer schema | adapt the overlay `session:` block to the actual Authelia schema |
| HTTP timeout after 120 s | Authelia hangs in the Postgres schema migration | read `docker logs --tail 200 restoretest-authelia`, increase the wait time if needed |
| SMTP connect in the log | the notifier override does not take effect | check `disable_startup_check: true` and the filesystem path in the overlay |
| `pg_restore` fails with schema drift | dump from the 17 cluster, the 18 image needs a different initialization | the step is documented as optional; accept the smoke test without the dump and track the issue |

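For the first two failure cases it can help to see which top-level sections of the original configuration the overlay does not touch at all. The following is a rough diagnostic sketch, not a YAML parser: it only looks at unindented `key:` lines, and both function names are hypothetical.

```bash
#!/bin/bash
# Hypothetical diagnostic: prints the top-level keys of a YAML file,
# i.e. unindented "key:" lines. Comments and nested keys are ignored.
top_level_keys() {
  grep -E '^[A-Za-z_][A-Za-z0-9_]*:' "$1" | cut -d: -f1 | sort -u
}

# Prints the top-level sections that exist in ORIGINAL but not in OVERLAY.
# Usage: sections_not_in_overlay ORIGINAL OVERLAY
sections_not_in_overlay() {
  local key
  for key in $(top_level_keys "$1"); do
    grep -q "^${key}:" "$2" || echo "$key"
  done
}
```

On the host this could be run as `sections_not_in_overlay configuration.yml.original configuration.test-overlay.yml` inside `restore-lab/authelia/config`. Sections listed there keep their production values in the test, so they are the first candidates to inspect when validation fails.
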
## Cleanup

- on success without `--keep-data`: `rm -rf /mnt/user/backups/restore-lab/authelia` and the extract cache
- on failure: the data path is renamed via `preserve_on_failure` to `/mnt/user/backups/restore-lab/_failed/authelia-...`

Production Authelia containers, production secrets, the production Postgres DB and the production SMTP account are never touched.

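The failure branch of the cleanup can be pictured roughly as follows. This is a stand-in for illustration only: the real `preserve_on_failure` lives in `common.sh`, is not shown in this diff, and may behave differently. In particular, the timestamp suffix and the optional third argument are assumptions.

```bash
#!/bin/bash
# Hypothetical stand-in, NOT the helper from common.sh: moves a restore-lab
# directory aside so that a failed run can be inspected later.
# Usage: preserve_on_failure_sketch NAME SOURCE_DIR [FAILED_ROOT]
preserve_on_failure_sketch() {
  local name="$1" src="$2"
  local failed_root="${3:-/mnt/user/backups/restore-lab/_failed}"

  # Nothing to preserve if the run failed before the directory existed.
  [ -d "$src" ] || return 0

  mkdir -p "$failed_root"
  # Assumed naming scheme: service name plus a timestamp.
  mv "$src" "$failed_root/$name-$(date +%Y%m%d-%H%M%S)"
}
```

Renaming instead of deleting keeps the restored config, the overlay and the notifier output available for diagnosis, at the cost of manual cleanup of the `_failed` directory.
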
## Schedule

Currently not in the automatic schedule. Proposal after the first successful run: every two months (2nd Saturday in even months), so that it does not collide with Paperless.

## Fixed decisions

- The test Compose file uses the same image digest as production.
- Throwaway secrets exist only in the test Compose file; never use production Authelia secrets.
- The test Postgres is isolated; the production shared PostgreSQL 18 is not contacted.
- The notifier is redirected to the filesystem; NO real SMTP delivery.
- Test port only on `127.0.0.1:19091`, no LAN/Traefik exposure.
- The Borg passphrase is read from the host secret file and is never logged.
@@ -25,6 +25,65 @@ check_file_age_days() {
  echo $(( (now_epoch - mtime) / 86400 ))
}

# pg_restore --list as a cheap header check for custom-format dumps;
# it catches corruption that slips past a plain "exists+nonempty" test. No
# running Postgres is needed; the check only reads the TOC section.
PG_DUMPS="postgresql17-paperless.dump postgresql17-mailarchiver.dump postgresql17-authelia.dump mealie.dump immich.dump nextcloud.dump"
is_pg_custom_dump() {
  case " $PG_DUMPS " in *" $1 "*) return 0;; *) return 1;; esac
}

pg_header_ok() {
  local path="$1"
  if ! command -v pg_restore >/dev/null 2>&1; then
    # no pg_restore on the host: try inside the running Postgres container
    if command -v docker >/dev/null 2>&1 && docker inspect postgresql17 >/dev/null 2>&1; then
      docker exec -i postgresql17 pg_restore --list <"$path" >/dev/null 2>&1 && return 0
    fi
    return 2 # not checkable
  fi
  pg_restore --list "$path" >/dev/null 2>&1
}

check_pg_header() {
  local dump="$1"
  local path="$2"
  local age="$3"
  local missing_mode="${4:-critical}"

  if [ ! -f "$path" ]; then
    if [ "$missing_mode" = "optional" ]; then
      info+=("DUMP_OPTIONAL_MISSING $dump")
    else
      critical+=("DUMP_MISSING $dump")
    fi
    return
  fi
  if [ ! -s "$path" ]; then
    critical+=("DUMP_EMPTY $dump")
    return
  fi
  if [ "$age" -gt "$MAX_DUMP_AGE_HOURS" ]; then
    if [ "$missing_mode" = "optional" ]; then
      warnings+=("DUMP_OPTIONAL_STALE $dump age=${age}h")
    else
      critical+=("DUMP_STALE $dump age=${age}h")
    fi
    return
  fi

  if pg_header_ok "$path"; then
    rc=0
  else
    rc=$?
  fi
  case "$rc" in
    0) info+=("DUMP_OK $dump age=${age}h header=ok") ;;
    1) critical+=("DUMP_HEADER_INVALID $dump (pg_restore --list failed)") ;;
    2) info+=("DUMP_OK $dump age=${age}h header=unchecked") ;;
  esac
}

for dump in \
  postgresql17-paperless.dump \
  postgresql17-mailarchiver.dump \
@@ -48,11 +107,24 @@ for dump in \
  age="$(check_file_age_hours "$path")"
  if [ "$age" -gt "$MAX_DUMP_AGE_HOURS" ]; then
    critical+=("DUMP_STALE $dump age=${age}h")
    continue
  fi

  if is_pg_custom_dump "$dump"; then
    check_pg_header "$dump" "$path" "$age"
  else
    info+=("DUMP_OK $dump age=${age}h")
  fi
done

optional_dump="postgresql17-authelia.dump"
optional_path="$DUMP_ROOT/$optional_dump"
optional_age=0
if [ -f "$optional_path" ]; then
  optional_age="$(check_file_age_hours "$optional_path")"
fi
check_pg_header "$optional_dump" "$optional_path" "$optional_age" optional

for service in vaultwarden gitea paperless; do
  if [ ! -d "$REPORT_ROOT" ]; then
    warnings+=("REPORT_ROOT_MISSING $REPORT_ROOT")

@@ -20,7 +20,28 @@ require_path() {
  }
}

require_borg_container() {
  docker inspect "$BORG_CONTAINER" >/dev/null 2>&1 || {
    echo "Missing Borg container: $BORG_CONTAINER" >&2
    exit 1
  }
  [ "$(docker inspect -f '{{.State.Running}}' "$BORG_CONTAINER" 2>/dev/null)" = "true" ] || {
    echo "Borg container is not running: $BORG_CONTAINER" >&2
    exit 1
  }
  docker exec "$BORG_CONTAINER" test -r /data/borg.db >/dev/null 2>&1 || {
    echo "Missing borg-ui database in container: $BORG_CONTAINER:/data/borg.db" >&2
    exit 1
  }
  docker exec "$BORG_CONTAINER" test -r /local/secrets/borg_repo_passphrase.txt >/dev/null 2>&1 || {
    echo "Missing Borg passphrase in container: $BORG_CONTAINER:/local/secrets/borg_repo_passphrase.txt" >&2
    echo "Host path exists, but borg-ui must mount it as /local/secrets/borg_repo_passphrase.txt." >&2
    exit 1
  }
}

latest_archive_name() {
  require_borg_container
  docker exec -i "$BORG_CONTAINER" python3 - <<'PY'
import sqlite3
conn = sqlite3.connect('/data/borg.db')
@@ -34,6 +55,7 @@ PY
}

borg_repo_url() {
  require_borg_container
  docker exec -i "$BORG_CONTAINER" python3 - <<'PY'
import sqlite3
conn = sqlite3.connect('/data/borg.db')
@@ -50,6 +72,7 @@ borg_extract() {
  local extract_dir="$1"
  shift
  local paths=("$@")
  require_borg_container
  docker exec -i "$BORG_CONTAINER" python3 - "$extract_dir" "${paths[@]}" <<'PY'
import os, sys, subprocess
extract_dir = sys.argv[1]
@@ -88,3 +111,22 @@ cleanup_compose() {
    docker compose -f "$compose_file" down >/dev/null 2>&1 || true
  fi
}

# Helper: on a failing exit, do not delete the restore-lab path; rename it to
# a `_failed/<service>-<date>-<pid>` path instead, so that a post-mortem
# stays possible. The caller sets `RESTORE_SUCCESS=1` before a successful exit.
RESTORE_FAILED_ROOT="${RESTORE_FAILED_ROOT:-/mnt/user/backups/restore-lab/_failed}"
preserve_on_failure() {
  local service="$1"
  local path="$2"
  if [ ! -e "$path" ]; then
    return 0
  fi
  mkdir -p "$RESTORE_FAILED_ROOT"
  local target="$RESTORE_FAILED_ROOT/${service}-$(date +%F)-$$"
  if mv "$path" "$target" 2>/dev/null; then
    echo "preserved failed restore data: $target" >&2
  else
    echo "failed to preserve restore data: $path -> $target" >&2
  fi
}

@@ -37,8 +37,14 @@ require_cmd curl
require_path "$BORG_PASSPHRASE_FILE_DEFAULT"
require_path "$COMPOSE_FILE"

RESTORE_SUCCESS=0
cleanup() {
  cleanup_compose "$COMPOSE_FILE"
  if [ "$RESTORE_SUCCESS" -ne 1 ]; then
    preserve_on_failure "gitea" "$RESTORE_ROOT"
    rm -rf "$EXTRACT_DIR"
    return
  fi
  if [ "$KEEP_DATA" -ne 1 ]; then
    rm -rf "$DATA_DIR"
  fi
@@ -94,4 +100,5 @@ write_report "$REPORT_FILE" <<EOF
- Test data was cleaned after success: \`$([ "$KEEP_DATA" -eq 1 ] && echo no || echo yes)\`
EOF

RESTORE_SUCCESS=1
echo "Gitea restore test ok -> $REPORT_FILE"

@@ -62,7 +62,7 @@ Wenn das Archiv den Pfad anders ablegt, zuerst mit `borg list "$BORG_REPO" "::AR
3. Start the test container

```bash
docker compose -f /mnt/user/services/homelab/ops/restore-tests/gitea-compose.test.yml up -d
docker compose -f /mnt/user/services/homelab-infra/ops/restore-tests/gitea-compose.test.yml up -d
```

4. Smoke test
@@ -83,7 +83,7 @@ Minimal erfolgreich:
5. Stop the test container again

```bash
docker compose -f /mnt/user/services/homelab/ops/restore-tests/gitea-compose.test.yml down
docker compose -f /mnt/user/services/homelab-infra/ops/restore-tests/gitea-compose.test.yml down
```

6. Write the report

@@ -64,8 +64,14 @@ require_cmd curl
require_path "$BORG_PASSPHRASE_FILE_DEFAULT"
require_path "$COMPOSE_FILE"

RESTORE_SUCCESS=0
cleanup() {
  cleanup_compose "$COMPOSE_FILE"
  if [ "$RESTORE_SUCCESS" -ne 1 ]; then
    preserve_on_failure "immich" "$RESTORE_ROOT"
    rm -rf "$EXTRACT_DIR"
    return
  fi
  if [ "$KEEP_DATA" -ne 1 ]; then
    rm -rf "$RESTORE_ROOT"
  fi
@@ -244,4 +250,5 @@ write_report "$REPORT_FILE" <<EOF
- Restore-Quelle Dump: \`local/borg-dumps/latest/immich.dump\` aus aktuellem Borg-Archiv.
EOF

RESTORE_SUCCESS=1
echo "Immich restore test ok -> $REPORT_FILE"

@@ -53,8 +53,13 @@ fi
require_cmd docker
require_path "$COMPOSE_FILE"

RESTORE_SUCCESS=0
cleanup() {
  docker compose -f "$COMPOSE_FILE" -p "$PROJECT_NAME" down -v >/dev/null 2>&1 || true
  if [ "$RESTORE_SUCCESS" -ne 1 ]; then
    preserve_on_failure "komodo-bootstrap" "$RESTORE_ROOT"
    return
  fi
  if [ "$KEEP_DATA" -ne 1 ]; then
    rm -rf "$RESTORE_ROOT"
  fi
@@ -132,4 +137,5 @@ write_report "$REPORT_FILE" <<EOF
- Test-Daten wurden \`$([ "$KEEP_DATA" -eq 1 ] && echo behalten || echo bereinigt)\`.
EOF

RESTORE_SUCCESS=1
echo "Komodo bootstrap trockenlauf ok -> $REPORT_FILE"

@@ -41,8 +41,14 @@ require_cmd curl
require_path "$BORG_PASSPHRASE_FILE_DEFAULT"
require_path "$COMPOSE_FILE"

RESTORE_SUCCESS=0
cleanup() {
  cleanup_compose "$COMPOSE_FILE"
  if [ "$RESTORE_SUCCESS" -ne 1 ]; then
    preserve_on_failure "paperless" "$RESTORE_ROOT"
    rm -rf "$EXTRACT_DIR"
    return
  fi
  if [ "$KEEP_DATA" -ne 1 ]; then
    rm -rf "$RESTORE_ROOT"
  fi
@@ -70,7 +76,30 @@ mv "$EXTRACT_DIR/local/borg-dumps/latest/postgresql17-paperless.dump" "$RESTORE_

docker compose -f "$COMPOSE_FILE" up -d restoretest-paperless-postgres restoretest-paperless-redis >/dev/null
until docker exec restoretest-paperless-postgres pg_isready -U paperless -d paperless >/dev/null 2>&1; do sleep 2; done
cat "$RESTORE_ROOT/dumps/latest/postgresql17-paperless.dump" | docker exec -i restoretest-paperless-postgres pg_restore -U paperless -d paperless --clean --if-exists --no-owner --no-privileges

# The Postgres entrypoint may still switch from the init server to the final
# server shortly after "ready". The restore step tolerates transient
# startup/shutdown errors and retries; hard errors (e.g. dump corruption) abort as before.
restore_ok=0
for attempt in $(seq 1 12); do
  if docker exec -i restoretest-paperless-postgres \
    pg_restore -U paperless -d paperless --clean --if-exists --no-owner --no-privileges \
    < "$RESTORE_ROOT/dumps/latest/postgresql17-paperless.dump" 2>/tmp/paperless-pg-restore.err; then
    restore_ok=1
    break
  fi
  if grep -qiE "starting up|shutting down|connection refused|database .* does not exist" /tmp/paperless-pg-restore.err; then
    sleep 5
    continue
  fi
  cat /tmp/paperless-pg-restore.err >&2
  exit 1
done

if [ "$restore_ok" -ne 1 ]; then
  cat /tmp/paperless-pg-restore.err >&2
  exit 1
fi

docker compose -f "$COMPOSE_FILE" up -d restoretest-paperless >/dev/null
sleep 12
@@ -110,4 +139,5 @@ write_report "$REPORT_FILE" <<EOF
- Test data was cleaned after success: \`$([ "$KEEP_DATA" -eq 1 ] && echo no || echo yes)\`
EOF

RESTORE_SUCCESS=1
echo "Paperless restore test ok -> $REPORT_FILE"

@@ -66,7 +66,7 @@ mv /mnt/user/backups/restore-lab/paperless/local/paperless/consume /mnt/user/bac
3. Start the test Postgres and test Redis

```bash
docker compose -f /mnt/user/services/homelab/ops/restore-tests/paperless-compose.test.yml up -d restoretest-paperless-postgres restoretest-paperless-redis
docker compose -f /mnt/user/services/homelab-infra/ops/restore-tests/paperless-compose.test.yml up -d restoretest-paperless-postgres restoretest-paperless-redis
```

4. Import the dump into the test Postgres
@@ -78,7 +78,7 @@ docker exec -i restoretest-paperless-postgres pg_restore -U paperless -d paperle
5. Start the test instance

```bash
docker compose -f /mnt/user/services/homelab/ops/restore-tests/paperless-compose.test.yml up -d restoretest-paperless
docker compose -f /mnt/user/services/homelab-infra/ops/restore-tests/paperless-compose.test.yml up -d restoretest-paperless
```

6. Smoke test
@@ -98,7 +98,7 @@ Minimal erfolgreich:
7. Stop the test containers again

```bash
docker compose -f /mnt/user/services/homelab/ops/restore-tests/paperless-compose.test.yml down
docker compose -f /mnt/user/services/homelab-infra/ops/restore-tests/paperless-compose.test.yml down
```

8. Clean up the test data after a successful run

@@ -34,8 +34,20 @@ case "$MODE" in
    fi
    exec "$SCRIPT_DIR/immich-restore-test.sh"
    ;;
  authelia)
    if [ "$WHATIF" = "--what-if" ]; then
      exec "$SCRIPT_DIR/authelia-restore-test.sh" --what-if
    fi
    exec "$SCRIPT_DIR/authelia-restore-test.sh"
    ;;
  komodo-bootstrap)
    if [ "$WHATIF" = "--what-if" ]; then
      exec "$SCRIPT_DIR/komodo-bootstrap-test.sh" --what-if
    fi
    exec "$SCRIPT_DIR/komodo-bootstrap-test.sh"
    ;;
  *)
    echo "Usage: $0 {freshness|vaultwarden|gitea|paperless|immich} [--what-if]" >&2
    echo "Usage: $0 {freshness|vaultwarden|gitea|paperless|immich|authelia|komodo-bootstrap} [--what-if]" >&2
    exit 1
    ;;
esac

@@ -7,24 +7,29 @@ SUCCESS_TOPIC="${2:-${RESTORE_SUCCESS_TOPIC:-homelab-info}}"
FAILURE_TOPIC="${RESTORE_FAILURE_TOPIC:-homelab-alerts}"

if [ -z "$MODE" ]; then
  echo "Usage: $0 <freshness|vaultwarden|gitea|paperless|immich> [success_topic]" >&2
  echo "Usage: $0 <freshness|vaultwarden|gitea|paperless|immich|authelia|komodo-bootstrap> [success_topic]" >&2
  exit 1
fi

REPORT_ROOT="/mnt/user/backups/restore-reports"
REPORT_FILE="$REPORT_ROOT/${MODE}-$(date +%F).md"
WRAPPER_LOG="$REPORT_ROOT/_wrapper-${MODE}-$(date +%F).log"

mkdir -p "$REPORT_ROOT"

echo "Running restore job: $MODE"
echo "Report target: $REPORT_FILE"
echo "Inner report (written by restore script): $REPORT_FILE"
echo "Wrapper log (stdout/stderr of dispatcher): $WRAPPER_LOG"

if "$SCRIPT_DIR/run-restore-checks.sh" "$MODE" > "$REPORT_FILE"; then
# The restore job writes its Markdown report to $REPORT_FILE itself.
# stdout/stderr go to a separate wrapper log file so that no second
# writer overwrites the same path here.
if "$SCRIPT_DIR/run-restore-checks.sh" "$MODE" >"$WRAPPER_LOG" 2>&1; then
  echo "Restore job succeeded, sending ntfy..."
  "$SCRIPT_DIR/send-ntfy.sh" "$SUCCESS_TOPIC" "Restore job ok: $MODE" "Restore job succeeded. Report: $REPORT_FILE" default || true
  echo "Done"
else
  echo "Restore job failed, sending ntfy..."
  "$SCRIPT_DIR/send-ntfy.sh" "$FAILURE_TOPIC" "Restore job failed: $MODE" "Restore job failed. Report: $REPORT_FILE" high || true
  "$SCRIPT_DIR/send-ntfy.sh" "$FAILURE_TOPIC" "Restore job failed: $MODE" "Restore job failed. Wrapper log: $WRAPPER_LOG (Report if written: $REPORT_FILE)" high || true
  exit 1
fi

@@ -46,6 +46,8 @@ Quartals-Belegung:
Confirmed mini restores: Vaultwarden, Gitea, and Paperless on 2026-05-07;
Immich on 2026-05-27; Paperless again on 2026-05-31.

Authelia: scaffold committed on 2026-06-02, **first real run still pending**. Proposed schedule slot after the first run: second Saturday of even-numbered months at 07:30 (no collision with Paperless).

## Concrete calendar

- Every Monday, 06:30:
@@ -65,24 +67,28 @@ Immich am 2026-05-27; Paperless erneut am 2026-05-31.

## Unraid User Scripts Cron

| Script | Cron | Meaning |
|---|---|---|
| `restore-freshness-weekly` | `30 6 * * 1` | every Monday 06:30 |
| `restore-vaultwarden-monthly` | `0 7 1-7 * 6` | first Saturday of the month 07:00 |
| `restore-gitea-monthly` | `15 7 15-21 * 6` | third Saturday of the month 07:15 |
| `restore-paperless-bimonthly` | `0 8 8-14 1,3,5,7,9,11 *` | second Saturday of odd-numbered months 08:00 |
| `restore-immich-quarterly` | `30 8 8-14 2,5,8,11 0` | second Sunday in Feb/May/Aug/Nov 08:30 |
| `monthly-random-restore` | `0 9 1 * *` | first calendar day of the month 09:00 |

Vixie cron (Unraid) combines `day-of-month` and `day-of-week` with **OR** as soon as both fields are restricted (neither is `*`). "n-th Saturday of the month" therefore cannot be expressed directly in a cron expression. Instead we trigger on **every** Saturday/Sunday and filter the day of the month in the User Script with a shell guard.

| Script | Cron | Shell guard (additional) | Meaning |
|---|---|---|---|
| `restore-freshness-weekly` | `30 6 * * 1` | - | every Monday 06:30 |
| `restore-vaultwarden-monthly` | `0 7 * * 6` | `[ "$(date +%-d)" -le 7 ]` | first Saturday of the month 07:00 |
| `restore-gitea-monthly` | `15 7 * * 6` | `d=$(date +%-d); [ "$d" -ge 15 ] && [ "$d" -le 21 ]` | third Saturday of the month 07:15 |
| `restore-paperless-bimonthly` | `0 8 * * 6` | `m=$(date +%-m); d=$(date +%-d); case "$m" in 1\|3\|5\|7\|9\|11) [ "$d" -ge 8 ] && [ "$d" -le 14 ];; *) false;; esac` | second Saturday of odd-numbered months 08:00 |
| `restore-immich-quarterly` | `30 8 * * 0` | `m=$(date +%-m); d=$(date +%-d); case "$m" in 2\|5\|8\|11) [ "$d" -ge 8 ] && [ "$d" -le 14 ];; *) false;; esac` | second Sunday in Feb/May/Aug/Nov 08:30 |
| `monthly-random-restore` | `0 9 1 * *` | - | first calendar day of the month 09:00 |

**Why this way**: an earlier schedule such as `0 7 1-7 * 6` would have triggered the OR semantics in Vixie cron and fired on every day 1-7 in addition to every Saturday (~11 runs instead of 1 per month). Splitting the cron trigger from the shell guard as above is the only robust solution in standard cron.

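The run counts quoted above can be checked with a small simulation. The sketch below is illustrative only: `count_fires` is a hypothetical helper, and it assumes GNU `date` for the `-d` option. For one month it counts the days matched by the OR semantics of `0 7 1-7 * 6` and the days matched by a Saturday trigger combined with the day 1-7 guard.

```bash
#!/bin/bash
# Hypothetical helper: compare OR semantics with "trigger + guard" for a month.
count_fires() {
  local year="$1" month="$2" or_hits=0 guard_hits=0 day dow stamp
  for day in $(seq 1 31); do
    stamp="$(printf '%04d-%02d-%02d' "$year" "$month" "$day")"
    # Skip days that do not exist in this month (GNU date rejects them).
    dow="$(date -d "$stamp" +%u 2>/dev/null)" || continue
    # OR semantics: day of month 1-7 OR Saturday (ISO weekday 6).
    if [ "$day" -le 7 ] || [ "$dow" -eq 6 ]; then
      or_hits=$(( or_hits + 1 ))
    fi
    # Cron trigger on Saturday AND shell guard on day 1-7.
    if [ "$dow" -eq 6 ] && [ "$day" -le 7 ]; then
      guard_hits=$(( guard_hits + 1 ))
    fi
  done
  echo "or=$or_hits guard=$guard_hits"
}

count_fires 2026 5
```

For May 2026, which starts on a Friday, this prints `or=11 guard=1`: seven days from the day-of-month range plus five Saturdays, minus the one day counted twice.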
## Operating mode

- V1:
  - Bash jobs run on the host, manually or via User Script
  - `ntfy` is optional and follows once the base is stable
  - Hermes will later only evaluate reports
  - The `ntfy` wrapper exists; success goes to `homelab-info`, failures go to `homelab-alerts`
  - Hermes will later optionally evaluate reports
- V2:
  - Fixed host schedule
  - `ntfy` on success/failure
  - `ntfy` on success/failure via `run-restore-job-with-ntfy.sh`
  - Hermes generates summaries and overviews

## Automation

@@ -10,18 +10,22 @@ Host-Repo-Pfad:
/mnt/user/services/homelab-infra
```

**Important - cron semantics**: Vixie cron combines `day-of-month` and `day-of-week` with **OR** as soon as both fields are restricted. We therefore trigger on every Saturday/Sunday and filter the day of the month with a shell guard in the User Script. See `ops/restore-tests/schedule.md`.

**Important - no duplicate writers**: the restore scripts write their Markdown report **themselves** to `/mnt/user/backups/restore-reports/<service>-YYYY-MM-DD.md`. User Scripts must **not** redirect the job output into the same file, otherwise the last writer wins. Wrapper output goes to `/mnt/user/backups/restore-reports/_wrapper-<mode>-YYYY-MM-DD.log` instead.

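The single-writer rule can be illustrated with a minimal sketch. `run_with_wrapper_log` is a hypothetical name; the actual wrapper in this repository is `run-restore-job-with-ntfy.sh`, which additionally sends notifications.

```bash
#!/bin/bash
# Sketch: run a job that writes its own report and capture only its
# stdout/stderr in a separate wrapper log, so the report keeps one writer.
run_with_wrapper_log() {
  local wrapper_log="$1"
  shift
  "$@" >"$wrapper_log" 2>&1
}
```

The function returns the job's exit status unchanged, so a caller can still branch on success or failure after the log has been written.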
## Script 1 - `restore-freshness-weekly`

Time:
Cron:

- Monday, 06:30
- `30 6 * * 1` (Monday 06:30)

Content:

```bash
#!/bin/bash
bash /mnt/user/services/homelab-infra/ops/restore-tests/run-restore-checks.sh freshness \
  > /mnt/user/backups/restore-reports/freshness-$(date +%F).md
exec /mnt/user/services/homelab-infra/ops/restore-tests/run-restore-job-with-ntfy.sh \
  freshness homelab-info
```

Expected result:
@@ -32,77 +36,110 @@ Erwartung:

## Script 2 - `restore-vaultwarden-monthly`

Time:
Cron:

- First Saturday of the month, 07:00
- `0 7 * * 6` (every Saturday 07:00)

V1 content:
Guard: run only on the first Saturday of the month.

```bash
#!/bin/bash
bash /mnt/user/services/homelab-infra/ops/restore-tests/run-restore-checks.sh vaultwarden \
  > /mnt/user/backups/restore-reports/vaultwarden-$(date +%F).md
# Guard: only days 1-7 of the month, so that the "first Saturday" is hit unambiguously.
day=$(date +%-d)
if [ "$day" -lt 1 ] || [ "$day" -gt 7 ]; then
  exit 0
fi
exec /mnt/user/services/homelab-infra/ops/restore-tests/run-restore-job-with-ntfy.sh \
  vaultwarden homelab-info
```

## Script 3 - `restore-gitea-monthly`

Time:
Cron:

- Third Saturday of the month, 07:00
- `15 7 * * 6` (every Saturday 07:15)

V1 content:
Guard: run only on the third Saturday of the month.

```bash
#!/bin/bash
bash /mnt/user/services/homelab-infra/ops/restore-tests/run-restore-checks.sh gitea \
  > /mnt/user/backups/restore-reports/gitea-$(date +%F).md
day=$(date +%-d)
if [ "$day" -lt 15 ] || [ "$day" -gt 21 ]; then
  exit 0
fi
exec /mnt/user/services/homelab-infra/ops/restore-tests/run-restore-job-with-ntfy.sh \
  gitea homelab-info
```

## Script 4 - `restore-paperless-bimonthly`

Time:
Cron:

- Every second month, second Saturday, 08:00
- `0 8 * * 6` (every Saturday 08:00)

V1 content:
Guard: run only on the second Saturday of odd-numbered months.

```bash
#!/bin/bash
bash /mnt/user/services/homelab-infra/ops/restore-tests/run-restore-checks.sh paperless \
  > /mnt/user/backups/restore-reports/paperless-$(date +%F).md
month=$(date +%-m)
day=$(date +%-d)
case "$month" in
  1|3|5|7|9|11) ;;
  *) exit 0 ;;
esac
if [ "$day" -lt 8 ] || [ "$day" -gt 14 ]; then
  exit 0
fi
exec /mnt/user/services/homelab-infra/ops/restore-tests/run-restore-job-with-ntfy.sh \
  paperless homelab-info
```

## Script 5 - `restore-immich-quarterly`

Cron:

- `30 8 * * 0` (every Sunday 08:30)

Guard: run only on the second Sunday in Feb/May/Aug/Nov.

```bash
#!/bin/bash
month=$(date +%-m)
day=$(date +%-d)
case "$month" in
  2|5|8|11) ;;
  *) exit 0 ;;
esac
if [ "$day" -lt 8 ] || [ "$day" -gt 14 ]; then
  exit 0
fi
exec /mnt/user/services/homelab-infra/ops/restore-tests/run-restore-job-with-ntfy.sh \
  immich homelab-info
```

## Script 6 - `monthly-random-restore`

Cron:

- `0 9 1 * *` (first calendar day of the month 09:00) - no guard needed.

```bash
#!/bin/bash
exec /mnt/user/services/homelab-infra/ops/restore-tests/monthly-random-restore.sh
```

## Status

- The Bash jobs were verified successfully on the host on 2026-05-07
- `freshness`, `vaultwarden`, `gitea`, and `paperless` therefore run automatically in principle
- `ntfy` can now optionally be added via the wrapper script
- The ntfy wrapper sends success/failure messages to the defined topics

## V2 target state
## Failure topic

The next expansion stage adds:
Failures go to `homelab-alerts` regardless of the success topic (see `RESTORE_FAILURE_TOPIC` in the wrapper), so that restore problems land on the same phone topic as Prometheus, Docker, Borg, and posture alerts.

1. Restore from Borg
2. Start the test container
3. Smoke test
4. Write the report
5. Optional `ntfy`
6. Cleanup

## Optional `ntfy` wrapper pattern

If `ntfy` is used, the host job should only reference success or failure and not dump the whole report into the message.

Example:

```bash
#!/bin/bash
bash /mnt/user/services/homelab-infra/ops/restore-tests/run-restore-job-with-ntfy.sh freshness homelab-info
```

Failures go to `homelab-alerts` regardless of the success topic, so that restore problems land on the same phone topic as Prometheus, Docker, Borg, and posture alerts.

Helper scripts used:
## Helper scripts used

- `ops/restore-tests/send-ntfy.sh`
- `ops/restore-tests/run-restore-job-with-ntfy.sh`
- `ops/restore-tests/run-restore-checks.sh`

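The topic routing described in this file can be summarized in a few lines. This is a sketch under assumptions: `pick_topic` is a hypothetical helper that does not exist in the repository; only the default topic names and the two environment variables are taken from the wrapper script.

```bash
#!/bin/bash
# Hypothetical helper: choose the ntfy topic from a job's exit status.
# Success uses the caller's topic; failure always uses the alert topic.
pick_topic() {
  local exit_status="$1"
  local success_topic="${2:-${RESTORE_SUCCESS_TOPIC:-homelab-info}}"
  local failure_topic="${RESTORE_FAILURE_TOPIC:-homelab-alerts}"
  if [ "$exit_status" -eq 0 ]; then
    echo "$success_topic"
  else
    echo "$failure_topic"
  fi
}
```

Keeping the failure topic independent of the second argument means a per-job success topic cannot accidentally hide an alert.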
@@ -10,7 +10,12 @@ services:
      WEBSOCKET_ENABLED: "true"
      SIGNUPS_ALLOWED: "false"
      INVITATIONS_ALLOWED: "false"
      ADMIN_TOKEN_FILE: /run/secrets/admin_token
      # Throwaway admin token for the isolated smoke test only.
      # Deliberately NO mount of the production vaultwarden_admin_token.txt,
      # so the real admin token never appears in a test container lifecycle.
      # The smoke test only checks the login page; the token is not
      # needed for authentication.
      ADMIN_TOKEN: restoretest-vaultwarden-admin-token-placeholder
      ROCKET_PORT: 80
      ROCKET_ADDRESS: 0.0.0.0

@@ -19,7 +24,6 @@ services:

    volumes:
      - /mnt/user/backups/restore-lab/vaultwarden/data:/data
      - /mnt/user/appdata/secrets/vaultwarden_admin_token.txt:/run/secrets/admin_token:ro

    security_opt:
      - no-new-privileges:true

@@ -8,7 +8,8 @@ Nachweisen, dass ein Vaultwarden-Backup in einer isolierten Testumgebung wieder

- Backup source: Borg / share backup
- Functionally relevant data path: `/mnt/user/appdata/vaultwarden`
- Secret: `/mnt/user/appdata/secrets/vaultwarden_admin_token.txt`
- The production admin token is deliberately not mounted for the restore smoke test;
  the test instance uses a throwaway value from `vaultwarden-compose.test.yml`.

## Test target

@@ -44,7 +45,7 @@ Minimal erfolgreich:

Optional later:

- Check the admin endpoint
- Check the admin endpoint only with a separate throwaway token
- Check the websocket endpoint
- Verify the count or presence of central data at the artifact level

@@ -37,8 +37,14 @@ require_cmd curl
require_path "$BORG_PASSPHRASE_FILE_DEFAULT"
require_path "$COMPOSE_FILE"

RESTORE_SUCCESS=0
cleanup() {
  cleanup_compose "$COMPOSE_FILE"
  if [ "$RESTORE_SUCCESS" -ne 1 ]; then
    preserve_on_failure "vaultwarden" "$RESTORE_ROOT"
    rm -rf "$EXTRACT_DIR"
    return
  fi
  if [ "$KEEP_DATA" -ne 1 ]; then
    rm -rf "$DATA_DIR"
  fi
@@ -82,4 +88,5 @@ write_report "$REPORT_FILE" <<EOF
- Test data was cleaned after success: \`$([ "$KEEP_DATA" -eq 1 ] && echo no || echo yes)\`
EOF

RESTORE_SUCCESS=1
echo "Vaultwarden restore test ok -> $REPORT_FILE"

@@ -3,9 +3,9 @@
## Preconditions

- Borg source is available
- Secret file present: `/mnt/user/appdata/secrets/vaultwarden_admin_token.txt`
- Borg passphrase file present: `/mnt/user/appdata/secrets/borg_repo_passphrase.txt`
- Test paths under `/mnt/user/backups/restore-lab/` and `/mnt/user/backups/restore-reports/` are approved
- **Note**: the production `vaultwarden_admin_token.txt` is **no longer** mounted in the test container. The test instance uses a throwaway token; the smoke test only checks the login page, not an admin endpoint.

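Whether a test compose file still references production secrets can be checked mechanically. The following is a hedged sketch: `compose_mounts_prod_secret` is a hypothetical helper, and the plain text match ignores YAML structure, so it only serves as a coarse pre-flight check.

```bash
#!/bin/bash
# Hypothetical check: succeed (exit 0) if a compose file references the
# production secrets directory outside of comment lines.
compose_mounts_prod_secret() {
  local compose_file="$1" secrets_dir="${2:-/mnt/user/appdata/secrets}"
  grep -v '^[[:space:]]*#' "$compose_file" | grep -qF "$secrets_dir/"
}
```

A restore script could abort early with `compose_mounts_prod_secret "$COMPOSE_FILE" && exit 1` before starting any test container.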
## Confirmed host state

@@ -76,7 +76,7 @@ Zielpfad nach dem Restore:
3. Start the test container

```bash
docker compose -f /mnt/user/services/homelab/ops/restore-tests/vaultwarden-compose.test.yml up -d
docker compose -f /mnt/user/services/homelab-infra/ops/restore-tests/vaultwarden-compose.test.yml up -d
```

4. Smoke test
@@ -95,7 +95,7 @@ Minimal erfolgreich:
5. Stop the test container again

```bash
docker compose -f /mnt/user/services/homelab/ops/restore-tests/vaultwarden-compose.test.yml down
docker compose -f /mnt/user/services/homelab-infra/ops/restore-tests/vaultwarden-compose.test.yml down
```

6. Write the report