Harden backup and posture checks
@@ -280,7 +280,7 @@ Legende Status:
| `komodo` | ✅ | `frontend_net` | Traefik, native auth | primary GitOps stack manager | deliberate exception: no blanket ForwardAuth middleware in front of UI/API/webhooks/Periphery |
| `code-server` | ✅ | `frontend_net` | Traefik + middleware | `PASSWORD_FILE` active | — |
| `PortainerCE` | ❌ removed | - | - | shut down on 2026-03-29 | historical; do not deploy anymore |
-| `filebrowser` | ✅ | `frontend_net` | Traefik + middleware | active via `files.kaleschke.info` | restrict mounts (Block F) |
+| `filebrowser` | ✅ | `frontend_net` | Traefik + middleware | active via `files.kaleschke.info` | broad appdata mount removed; only Documents/Photos/Projekte plus its own app state |
| `borg-ui` | ✅ | `frontend_net` | Traefik + middleware | production Borg/restore service; `/local/secrets` is deliberately part of the restore scope | keep the BorgBase repo and key maintained |
| `paperless-gpt` | ✅ | `frontend_net` | Traefik + middleware | active via `paperless-gpt.kaleschke.info` | — |
| `bentopdf` | ✅ prepared | `frontend_net` | Traefik + middleware | PDF tooling via `pdf.kaleschke.info`; browser-side processing, COOP/COEP for Office conversion | deployment and functional acceptance pending |
@@ -534,9 +534,9 @@ Host-Pfade in `env_file` (z.B. `/mnt/...`) sind in Git-Stacks nicht verfügbar.
### Reproducible deployments (2026-04-17)
Mutable tags such as `latest`, `stable`, `release` or bare major tags were frozen to the **currently running digests**. This is deliberately **not an upgrade mechanism**; its purpose is to record today's working runtime state exactly in the repo. Actual version upgrades remain a separate, planned step.

-### Stateful digest pinning (2026-05-05)
+### Stateful digest pinning (2026-05-05, amended 2026-05-16)
- Tier-1/stateful base services are preferably pinned with a descriptive minor/patch tag plus digest, e.g. `postgres:17.9@sha256:...` or `mongo:7.0.32@sha256:...`.
-- Redis caches deliberately stay without a digest pin. Losing the cache is acceptable, and security patches should be able to flow there without a dedicated database upgrade sprint.
+- Since the 2026-05-16 hardening sprint, Redis caches are standardized on `redis:7.4-alpine@sha256:...`. Updates are deliberately rolled out stack by stack with a smoke test.
- Versioned apps can optionally receive digests later as well; this step is separate from pinning the data holders.

### Nextcloud and Stirling-PDF (2026-04-19)

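The pinning rules above can be checked mechanically. The sketch below assumes Compose files use the short `image: name:tag@sha256:<digest>` form. It sorts references into `pinned` (descriptive tag plus digest), `digest-on-mutable-tag` (the `latest@sha256` case), and `unpinned`. The function names and the "mutable tag" heuristic are hypothetical and not part of the repo. To look up the digest a tag currently resolves to in the registry, `docker buildx imagetools inspect <image>` can be used.

```shell
#!/usr/bin/env bash
# Sketch: classify Compose image references by how they are pinned.
# Line-based, not a YAML parser; quoted values and ${VAR} interpolation are not handled.

# Tags treated as "mutable": latest/stable/release (with an optional suffix such as
# latest-full) and bare major versions such as 1, 7 or v2.
is_mutable_tag() {
  local tag="$1"
  local named_re='^(latest|stable|release)(-.+)?$'
  local major_re='^v?[0-9]+$'
  [[ "$tag" =~ $named_re || "$tag" =~ $major_re ]]
}

# Prints one of: pinned, digest-on-mutable-tag, unpinned, invalid-digest.
classify_image_ref() {
  local ref="$1" name_tag digest="" last_segment tag
  local digest_re='^sha256:[0-9a-f]{64}$'
  name_tag="${ref%%@*}"
  if [[ "$ref" == *@* ]]; then
    digest="${ref#*@}"
    if [[ ! "$digest" =~ $digest_re ]]; then
      echo "invalid-digest"
      return 0
    fi
  fi
  # The tag follows the last ':' of the last path segment; a registry port
  # (registry:5000/app) sits in an earlier segment and is not a tag.
  last_segment="${name_tag##*/}"
  if [[ "$last_segment" == *:* ]]; then
    tag="${last_segment##*:}"
  else
    tag="latest"   # Docker's implicit default tag
  fi
  if [[ -z "$digest" ]]; then
    echo "unpinned"
  elif is_mutable_tag "$tag"; then
    echo "digest-on-mutable-tag"
  else
    echo "pinned"
  fi
}

# Prints "<class><TAB><image>" for every `image:` line in a Compose file.
scan_compose_file() {
  local key ref
  while read -r key ref; do
    printf '%s\t%s\n' "$(classify_image_ref "$ref")" "$ref"
  done < <(grep -E '^[[:space:]]*image:' "$1")
}
```

For example, `scan_compose_file apps/homepage/docker-compose.yml` would report `digest-on-mutable-tag` before this commit and `pinned` after it. Bare majors such as `uptime-kuma:1` are deliberately counted as mutable, because the inventory treats them like `latest`.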
@@ -1,6 +1,6 @@
services:
  homepage:
-    image: ghcr.io/gethomepage/homepage:latest@sha256:cc84f2f5eb3c7734353701ccbaa24ed02dacb0d119114e50e4251e2005f3990a
+    image: ghcr.io/gethomepage/homepage:v1.12.3@sha256:cc84f2f5eb3c7734353701ccbaa24ed02dacb0d119114e50e4251e2005f3990a
    container_name: homepage
    restart: unless-stopped
    environment:

@@ -43,7 +43,7 @@ services:

  redis:
    container_name: immich_redis
-    image: redis:7
+    image: redis:7.4-alpine@sha256:6ab0b6e7381779332f97b8ca76193e45b0756f38d4c0dcda72dbb3c32061ab99
    restart: unless-stopped
    networks:
      - immich_default

@@ -1,6 +1,6 @@
services:
  nextcloud:
-    image: nextcloud:33.0.2-apache
+    image: nextcloud:33.0.2-apache@sha256:39b2ba219271a22851f8409a7b1295d5892aba1696d9193500311c02e60591a4
    container_name: nextcloud
    restart: unless-stopped
    depends_on:
@@ -64,7 +64,7 @@ services:
      - no-new-privileges:true

  nextcloud-redis:
-    image: redis:7.4-alpine
+    image: redis:7.4-alpine@sha256:6ab0b6e7381779332f97b8ca76193e45b0756f38d4c0dcda72dbb3c32061ab99
    container_name: nextcloud-redis
    restart: unless-stopped
    command: redis-server --save 60 1 --loglevel warning

+10
-1
@@ -72,6 +72,15 @@ Dieses Dokument ist nur noch ein historischer Verlauf. Der aktuelle operative Ab
- Filled the empty `env/domains.env.example` and `env/global.env.example` with non-secret example values.
- Removed stale `.keep` placeholders from directories that contain real Compose/repo content, as well as two pure ghost directories (`host-services/plex`, `infra/dns`).

+### 2026-05-16 - Backup consistency and first hardening pass
+
+- SQLite dumps for Gitea, Vaultwarden, Uptime Kuma, Speedtest Tracker and Filebrowser are created on the container side as `*.sqlite.dump` and verified by a freshness check.
+- `nextcloud.dump` and the Nextcloud user data are documented as option A in the Borg scope.
+- Filebrowser no longer mounts the broad `/mnt/user/appdata` tree, only Documents, Photos, Projekte and its own app state.
+- Set the Authelia Argon2id parameters in the repo baseline to `iterations: 3`, `memory: 65536`, `parallelism: 4`; the production host config must be merged in a controlled way and validated with a test user.
+- Standardized Redis caches on `redis:7.4-alpine@sha256:...`; pinned Nextcloud with a registry-validated digest.
+- Switched `latest@sha256` images that could be resolved unambiguously to concrete tags: Homepage `v1.12.3`, code-server `4.116.0`, Filebrowser `v2.63.2`, Speedtest Tracker `1.13.12`.
+
### 2026-05-05 - M3b versioned app images digest-pinned

- Pinned the versioned non-Komodo images for BentoPDF, Mealie, Paperless, Paperless-GPT, AdGuard Home, Grafana, InfluxDB 3 Core and Traefik to the manifest-validated digests running on the host.
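The freshness check for the SQLite dumps can be as simple as comparing each file's modification time against a maximum age. A minimal sketch, assuming the dumps land in `/mnt/user/backups/borg/dumps/latest` (the path named in the script inventory). The function name and the 26-hour limit are hypothetical, not values from the repo.

```shell
#!/usr/bin/env bash
# Sketch: freshness check for dump artifacts. The directory layout and the
# maximum age are assumptions, not values taken from pre-backup-dumps.sh.

# Usage: check_dump_freshness <dump_dir> <max_age_minutes> <file>...
# Prints OK/STALE/MISSING per file and returns 1 if any file is stale or missing.
check_dump_freshness() {
  local dump_dir="$1" max_age_min="$2"
  shift 2
  local name file rc=0
  for name in "$@"; do
    file="$dump_dir/$name"
    if [[ ! -s "$file" ]]; then
      # -s is false both for absent files and for empty (zero-byte) dumps.
      echo "MISSING $name"
      rc=1
    elif [[ -n "$(find "$file" -mmin "+$max_age_min" -print)" ]]; then
      # find prints the path only if it was modified more than max_age_min minutes ago.
      echo "STALE $name"
      rc=1
    else
      echo "OK $name"
    fi
  done
  return "$rc"
}
```

A hypothetical call: `check_dump_freshness /mnt/user/backups/borg/dumps/latest 1560 gitea.sqlite.dump vaultwarden.sqlite.dump uptime-kuma.sqlite.dump speedtest-tracker.sqlite.dump filebrowser.sqlite.dump`. Empty files are treated like missing ones, because a zero-byte dump usually means the dump command failed.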
@@ -90,7 +99,7 @@ Dieses Dokument ist nur noch ein historischer Verlauf. Der aktuelle operative Ab
- Pinned the PostgreSQL 17 data holders to `postgres:17.9@sha256:5b96f1a16bd9768b060dd2ffe55cb6225c4d9ef4d214a8b21eb08134869a97e4` (`postgresql17`, `mealie-postgres`, `nextcloud-postgres`).
- Pinned the Immich pgvector Postgres to `tensorchord/pgvecto-rs:pg14-v0.2.0@sha256:739cdd626151ff1f796dc95a6591b55a714f341c737e27f045019ceabf8e8c52`.
- Pinned Komodo Mongo to `mongo:7.0.32@sha256:32979a1189dfdc44da3f5ed40d910495f5ad8f6f7f77556646f890a30b2d3f56`, and Komodo Core/Periphery and Gitea to the digests running on the host.
-- Redis caches deliberately stay without a digest pin; redeploys happen stack by stack with a smoke test, not in parallel.
+- Redis caches were standardized on `redis:7.4-alpine@sha256:...` on 2026-05-16; redeploys happen stack by stack with a smoke test, not in parallel.

### 2026-05-04 - Komodo self-stack drift brought back to the persistent path

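Redeploying "stack by stack with a smoke test, not in parallel" amounts to a sequential loop that stops at the first failure. A minimal sketch, assuming each stack is addressed by its `docker-compose.yml` path. `deploy_stack`, `smoke_test` and `redeploy_sequentially` are hypothetical helpers. The default smoke test only checks that at least one container of the stack is running, which is weaker than an application-level check such as `redis-cli ping`.

```shell
#!/usr/bin/env bash
# Sketch: redeploy stacks one at a time and stop at the first failure.
# Override deploy_stack/smoke_test per environment; these defaults are assumptions.

deploy_stack() {
  local compose_file="$1"
  docker compose -f "$compose_file" pull && docker compose -f "$compose_file" up -d
}

smoke_test() {
  local compose_file="$1"
  # Minimal check: at least one container of the stack is in the "running" state.
  [[ -n "$(docker compose -f "$compose_file" ps --status running -q)" ]]
}

redeploy_sequentially() {
  local compose_file
  for compose_file in "$@"; do
    echo "deploying $compose_file"
    if ! deploy_stack "$compose_file"; then
      echo "deploy failed: $compose_file" >&2
      return 1
    fi
    if ! smoke_test "$compose_file"; then
      echo "smoke test failed: $compose_file" >&2
      return 1   # later stacks stay on their previous image
    fi
  done
}
```

For example, `redeploy_sequentially apps/immich/docker-compose.yml infra/redis/docker-compose.yml`. Because the loop stops at the first failed stack, a bad Redis digest only affects one consumer at a time.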
+12
-8
@@ -17,6 +17,7 @@ Secret-Werte werden hier nicht dokumentiert. Aufgefuehrt werden nur Variablennam
| `host-services/` | host-level services with direct ports or host networking |
| `infra/` | technical infrastructure such as PostgreSQL, Redis, DDNS |
| `ops/` | operations, backup, monitoring and admin tools |
+| `services/` | host-side operations scripts and recovery-critical service helpers |
| `security/` | identity/security services such as Authelia and Vaultwarden |
| `traefik/` | reverse proxy Compose and dynamic file-provider configuration |

@@ -45,7 +46,9 @@ Secret-Werte werden hier nicht dokumentiert. Aufgefuehrt werden nur Variablennam
| `traefik/dynamic/tls.yml` | empty; file-provider placeholder |
| `security/authelia/configuration.yml` | versioned Authelia baseline for non-secret ACL/session/storage settings; must be merged into the host config manually; user data, OIDC client configuration and secret values stay outside Git |
| `ops/grafana-influxdb/provisioning/datasources/influxdb.yml` | Grafana datasource provisioning for InfluxDB 3 Core |
-| `ops/borg-ui/scripts/pre-backup-dumps.sh` | host-side dump script for PostgreSQL, Mealie, Immich and Komodo Mongo |
+| `ops/borg-ui/scripts/pre-backup-dumps.sh` | host-side dump script for PostgreSQL, SQLite container dumps and Komodo Mongo |
+| `services/posture-check/posture-check.sh` | host-side posture check for filesystem, mover drift, NVMe SMART, fill level and ntfy alerting |
+| `services/posture-check/posture_check.sh` | compatibility wrapper for the spelling used in `STORAGE_LAYOUT.draft.md` |
| `ops/hermes-agent/config/hermes/config.yaml` | Hermes Agent configuration with env placeholders |
| `ops/hermes-agent/hermes.env.example` | example for the Hermes `.env`; the real file lives in host appdata |
| `ops/hermes-agent/stack.env.example` | example for the Hermes stack ENV; the real `stack.env` stays on the host/Komodo side and is excluded via `.gitignore` |
@@ -59,7 +62,7 @@ Secret-Werte werden hier nicht dokumentiert. Aufgefuehrt werden nur Variablennam
| Stack | Compose | Services / Images | Traefik Hosts | Networks | Ports | Dependencies |
|---|---|---|---|---|---|---|
| BentoPDF | `apps/bentopdf/docker-compose.yml` | `bentopdf` -> `bentopdfteam/bentopdf:2.8.4` | `pdf.kaleschke.info` | `frontend_net` | none | Traefik + Authelia; COOP/COEP middleware |
-| Homepage | `apps/homepage/docker-compose.yml` | `homepage` -> `ghcr.io/gethomepage/homepage:latest@sha256:...` | `home.kaleschke.info` | `frontend_net` | none | Docker socket read-only for widgets; many `HOMEPAGE_VAR_*` env keys |
+| Homepage | `apps/homepage/docker-compose.yml` | `homepage` -> `ghcr.io/gethomepage/homepage:v1.12.3@sha256:...` | `home.kaleschke.info` | `frontend_net` | none | Docker socket read-only for widgets; many `HOMEPAGE_VAR_*` env keys |
| Immich | `apps/immich/docker-compose.yml` | `immich-server`, `immich-machine-learning`, `database`, `redis` | `immich.kaleschke.info` | `frontend_net`, `immich_default` | none | `immich-server` depends on `database`, `redis` |
| Mail Archiver | `apps/mail-archiver/docker-compose.yml` | `mail-archiver` -> `s1t5/mailarchiver@sha256:...` | `mail.kaleschke.info` | `frontend_net`, `backend_net` | none | shared PostgreSQL via env connection string; Internet access for IMAP |
| Mealie | `apps/mealie/docker-compose.yml` | `mealie`, `mealie-postgres` | `mealie.kaleschke.info` | `frontend_net`, `mealie_internal` | none | own PostgreSQL on the internal network |
@@ -78,7 +81,7 @@ Secret-Werte werden hier nicht dokumentiert. Aufgefuehrt werden nur Variablennam
| Vaultwarden | `security/vaultwarden/docker-compose.yml` | `vaultwarden` -> `vaultwarden/server:latest@sha256:...` | `vault.kaleschke.info` | `frontend_net` | none | file persistence, `ADMIN_TOKEN_FILE` |
| ddns-updater | `infra/ddns-updater/docker-compose.yml` | `ddns-updater` -> `ghcr.io/qdm12/ddns-updater:latest@sha256:...` | none | `frontend_net` | none | needs Internet access for the Cloudflare API |
| PostgreSQL 17 | `infra/postgresql17/docker-compose.yml` | `postgresql17` -> `postgres:17.9@sha256:...` | none | `backend_net` | none | shared DB cluster |
-| Redis | `infra/redis/docker-compose.yml` | `Redis` -> `redis:7-alpine` | none | `backend_net` | none | shared cache, password file |
+| Redis | `infra/redis/docker-compose.yml` | `Redis` -> `redis:7.4-alpine@sha256:...` | none | `backend_net` | none | shared cache, password file |

### Host Services

@@ -92,14 +95,14 @@ Secret-Werte werden hier nicht dokumentiert. Aufgefuehrt werden nur Variablennam
| Stack | Compose | Services / Images | Traefik Hosts | Networks | Ports | Dependencies |
|---|---|---|---|---|---|---|
| Borg UI | `ops/borg-ui/docker-compose.yml` | `borg-ui` -> `ainullcode/borg-ui:latest@sha256:...` | `borg.kaleschke.info` | `frontend_net` | none | Borg repo, dump scope, restore target |
-| code-server | `ops/code-server/docker-compose.yml` | `code-server` -> `lscr.io/linuxserver/code-server:latest@sha256:...` | `code.kaleschke.info` | `frontend_net` | none | password file, workspace mounts |
-| Filebrowser | `ops/filebrowser/docker-compose.yml` | `filebrowser` -> `filebrowser/filebrowser:latest@sha256:...` | `files.kaleschke.info` | `frontend_net` | none | appdata mount, admin UI behind Authelia |
+| code-server | `ops/code-server/docker-compose.yml` | `code-server` -> `lscr.io/linuxserver/code-server:4.116.0@sha256:...` | `code.kaleschke.info` | `frontend_net` | none | password file, workspace mounts |
+| Filebrowser | `ops/filebrowser/docker-compose.yml` | `filebrowser` -> `filebrowser/filebrowser:v2.63.2@sha256:...` | `files.kaleschke.info` | `frontend_net` | none | Documents/Photos/Projekte mounts, admin UI behind Authelia |
| Glances | `ops/glances/docker-compose.yml` | `glances` -> `nicolargo/glances:latest-full@sha256:...` | `glances.kaleschke.info` | `frontend_net` | none | rootfs/Docker socket for monitoring |
| Grafana/InfluxDB | `ops/grafana-influxdb/docker-compose.yml` | `grafana`, `influxdb3-core` | `grafana.kaleschke.info` | `frontend_net`, `grafana_influx_internal`, `grafana_influx_lan` | `influxdb3-core`: `${INFLUXDB_BIND_IP:-127.0.0.1}:8181:8181` | InfluxDB LAN-only for Home Assistant; Grafana datasource token; both containers currently run as `user: "0"` |
| Hermes Agent | `ops/hermes-agent/docker-compose.yml` | `hermes-gateway`, `hermes-dashboard` -> local build from Dockerfile | `hermes.kaleschke.info` via `${HERMES_DASHBOARD_HOST}` | `hermes_net`, dashboard additionally on `frontend_net` | `8642` exposed internally only | SSH runner, Home Assistant optional, LLM provider env; dashboard behind Authelia |
| Komodo | `ops/komodo/docker-compose.yml` | `komodo-core`, `komodo-mongo`, `komodo-periphery` | `komodo.kaleschke.info` | `frontend_net`, `komodo_net` | none | Mongo, Docker socket, `/mnt/user/services` workspace mount, Gitea DNS override |
| Scrutiny | `ops/scrutiny/docker-compose.yml` | `scrutiny` -> `ghcr.io/starosdev/scrutiny:latest-omnibus@sha256:...` | `scrutiny.kaleschke.info` | `frontend_net` | none | `privileged: true`, device mounts for SMART |
-| Speedtest Tracker | `ops/speedtest/docker-compose.yml` | `speedtest-tracker` -> `lscr.io/linuxserver/speedtest-tracker:latest@sha256:...` | `speedtest.kaleschke.info` | `frontend_net` | none | app key/admin env, SQLite/config path |
+| Speedtest Tracker | `ops/speedtest/docker-compose.yml` | `speedtest-tracker` -> `lscr.io/linuxserver/speedtest-tracker:1.13.12@sha256:...` | `speedtest.kaleschke.info` | `frontend_net` | none | app key/admin env, SQLite/config path |
| Uptime Kuma | `ops/uptime-kuma/docker-compose.yml` | `UptimeKuma` -> `louislam/uptime-kuma:1@sha256:...` | `uptime.kaleschke.info` | `frontend_net` | none | monitor state in appdata |

### Traefik
@@ -174,7 +177,7 @@ Secret-Werte werden hier nicht dokumentiert. Aufgefuehrt werden nur Variablennam
| Tailscale | `/mnt/user/appdata/tailscale` |
| Borg UI | `/mnt/user/appdata/borg-ui/data`, `/mnt/user/appdata/borg-ui/cache`, `/mnt/user/backups/borg/dumps`, selected restore/source mounts |
| code-server | `/mnt/user/appdata/code-server`, `/mnt/user/services/dev`, Homepage production mount |
-| Filebrowser | `/mnt/user/appdata`, Filebrowser database/config paths |
+| Filebrowser | `/mnt/user/documents`, `/mnt/user/photos`, `/mnt/user/projekte`, Filebrowser database/config paths |
| Glances | `/`, Docker socket, `/etc/os-release` |
| Scrutiny | `/mnt/user/appdata/scrutiny/*`, `/run/udev`, selected `/dev/...` disks |
| Speedtest | `/mnt/user/appdata/speedtest-tracker/config` |
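Because the broad `/mnt/user/appdata` mount was removed on purpose, a small regression guard can flag it if it ever reappears in a Compose file. A line-based sketch, assuming short-syntax volume entries (`- host:container[:mode]`). It is not a YAML parser, and both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: report Compose volume lines that mount the whole /mnt/user/appdata tree.
# Matches "- /mnt/user/appdata:..." and "- /mnt/user/appdata/:..." (optionally quoted),
# but not subpaths such as /mnt/user/appdata/filebrowser/config.
# Prints matching lines with line numbers; exit status 0 = found, 1 = none.
find_broad_appdata_mounts() {
  grep -nE '^[[:space:]]*-[[:space:]]*"?/mnt/user/appdata/?:' "$1"
}

# Linter-style wrapper: returns 1 if any of the given files contains such a mount.
lint_appdata_mounts() {
  local f rc=0
  for f in "$@"; do
    if find_broad_appdata_mounts "$f" >/dev/null; then
      echo "broad /mnt/user/appdata mount in $f"
      rc=1
    fi
  done
  return "$rc"
}
```

For example, `lint_appdata_mounts ops/filebrowser/docker-compose.yml` should exit 0 after this commit. Long-syntax `volumes:` entries (`source:`/`target:`) are not covered by this check.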
@@ -208,6 +211,7 @@ Secret-Werte werden hier nicht dokumentiert. Aufgefuehrt werden nur Variablennam
| Script | Where it runs | Purpose |
|---|---|---|
| `ops/borg-ui/scripts/pre-backup-dumps.sh` | Unraid host, not a Borg UI inline hook | creates current dumps under `/mnt/user/backups/borg/dumps/latest` |
+| `services/posture-check/posture-check.sh` | Unraid host | writes `/mnt/user/services/posture-check/last.json` and alerts via ntfy on Warning/Critical |

The dump script reads secret files on the host and writes dump artifacts. Never print secret contents when analyzing it.

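Two habits follow from that rule: read secrets into variables instead of echoing them, and only replace a dump once the new one has been written completely. The sketch below shows both as hypothetical helpers. These are not functions from `pre-backup-dumps.sh`, and the temp-file naming is an assumption.

```shell
#!/usr/bin/env bash
# Sketch: helpers a host-side dump script could use; hypothetical, not taken
# from ops/borg-ui/scripts/pre-backup-dumps.sh. Keep `set -x` off while secrets
# are in scope, because xtrace prints expanded values.

# Read a secret file into the named variable without printing it.
# $(<file) drops trailing newlines, so "secret\n" becomes "secret".
read_secret_file() {
  local __var="$1" __file="$2"
  if [[ ! -r "$__file" ]]; then
    echo "secret file not readable: $__file" >&2   # path only, never the content
    return 1
  fi
  printf -v "$__var" '%s' "$(<"$__file")"
}

# Run a dump command, capture stdout in a temp file next to the target, and
# only move it into place if the command succeeded and wrote something.
# A failed run therefore leaves the previous dump untouched.
atomic_dump() {
  local target="$1"
  shift
  local tmp="${target}.tmp.$$"
  if "$@" > "$tmp" && [[ -s "$tmp" ]]; then
    mv -f "$tmp" "$target"
  else
    rm -f "$tmp"
    echo "dump failed: $target" >&2
    return 1
  fi
}
```

A hypothetical call, assuming the database inside `postgresql17` is named `paperless` and `pg_dump` may connect as `postgres`: `atomic_dump /mnt/user/backups/borg/dumps/latest/postgresql17-paperless.dump docker exec postgresql17 pg_dump -U postgres -Fc paperless`. Omitting `-t` keeps the binary custom-format output from passing through a pseudo-TTY.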
@@ -218,7 +222,7 @@ Das Skript liest Secret-Dateien auf dem Host und schreibt Dump-Artefakte. Bei An
- In Compose, `apps/mealie` uses `mealie_internal`; the architecture docs sometimes say `mealie_mealie_internal`. Runtime names can differ because of Compose project prefixes.
- The architecture describes `backend_net` as `internal: true`; individual Compose files reference it as external. Check the live network attributes when investigating drift.
- Some images remain semantically on mutable tags despite the digest pin (`latest@sha256`, `release@sha256`). This is documented deliberately, but needs a separate check during updates.
-- Since 2026-05-05, stateful data holders are preferably pinned with a minor/patch tag plus digest; Redis caches deliberately stay without a digest pin.
+- Since 2026-05-05, stateful data holders are preferably pinned with a minor/patch tag plus digest; Redis caches were standardized on `redis:7.4-alpine@sha256:...` in the 2026-05-16 hardening sprint.
- `scrutiny` stays `privileged: true`; a documented exception, but still worth reviewing.
- `tailscale` uses the host network, `NET_ADMIN`, `NET_RAW` and `/dev/net/tun` as a documented VPN exception.
- `grafana` and `influxdb3-core` currently run as `user: "0"`; UID/GID hardening only as a separate sprint.

+10
-9
@@ -32,9 +32,9 @@ Sie ist die fachliche Ergaenzung zu `docs/DISASTER_RECOVERY.md`.
| PostgreSQL 17 | Share + dumps | `/mnt/user/appdata/postgresql17` | `postgresql17-globals.sql`, `postgresql17-mailarchiver.dump`, `postgresql17-paperless.dump`, optional `postgresql17-authelia.dump` | `postgres_password.txt` | `backend_net` | DB starts, target databases present |
| Redis | Share / host | `/mnt/user/appdata/redis` | none | `redis_password.txt` | `backend_net` | Redis starts, apps connect |
| Authelia | Borg | `/mnt/user/appdata/authelia/config`, `/mnt/user/appdata/secrets/*authelia*` | shared PostgreSQL, optional dump `postgresql17-authelia.dump` | JWT/session/storage/Postgres/SMTP secret files | PostgreSQL 17, Traefik, GMX SMTP | login page and ForwardAuth work; SMTP notifier starts; active sessions are rebuilt after a restart |
-| Gitea | Borg + dump | `/mnt/user/services/gitea/data` | `gitea.sqlite` | `borg_repo_passphrase.txt` for restore tests | Traefik | web UI reachable, repo visible, SSH port responds; mini restore to `/mnt/user/backups/restore-lab/gitea` validated successfully on 2026-05-07 |
+| Gitea | Borg + dump | `/mnt/user/services/gitea/data` | `gitea.sqlite.dump` | `borg_repo_passphrase.txt` for restore tests | Traefik | web UI reachable, repo visible, SSH port responds; mini restore to `/mnt/user/backups/restore-lab/gitea` validated successfully on 2026-05-07 |
| Komodo | Borg / share | `/mnt/user/appdata/komodo/core`, `/mnt/user/appdata/komodo/periphery` | `komodo-mongo.archive.gz` if verified | `komodo_mongo_password.txt`, `KOMODO_*` stack ENV | Traefik, Mongo, Gitea | UI reachable, Periphery connected |
-| Vaultwarden | Borg + dump | `/mnt/user/appdata/vaultwarden` | `vaultwarden.sqlite` | `vaultwarden_admin_token.txt`, `borg_repo_passphrase.txt` for restore tests | Traefik | login page reachable, vault data visible; mini restore to `/mnt/user/backups/restore-lab/vaultwarden` validated successfully on 2026-05-07 |
+| Vaultwarden | Borg + dump | `/mnt/user/appdata/vaultwarden` | `vaultwarden.sqlite.dump` | `vaultwarden_admin_token.txt`, `borg_repo_passphrase.txt` for restore tests | Traefik | login page reachable, vault data visible; mini restore to `/mnt/user/backups/restore-lab/vaultwarden` validated successfully on 2026-05-07 |

---

@@ -58,11 +58,11 @@ Sie ist die fachliche Ergaenzung zu `docs/DISASTER_RECOVERY.md`.
| Service | Primary source | File restore | Dump / DB | Secrets / ENV | Dependencies | Smoke test |
|---|---|---|---|---|---|---|
| Borg UI | Borg + dump | `/mnt/user/appdata/borg-ui/data` | `borg-ui.sqlite` | Borg repo credentials in `/data` | Traefik | UI starts, repo connection known |
-| Uptime Kuma | Share / fresh + dump | `/mnt/user/appdata/uptime-kuma` | `uptime-kuma.sqlite` | `uptime_kuma_admin_password.txt` for a fresh rebuild | Traefik, Authelia | UI starts, admin user present, recreate monitors if needed |
-| Filebrowser | Share / fresh | `/mnt/user/appdata/filebrowser` | BoltDB app state in the app path | `filebrowser_admin_password.txt` for a fresh rebuild | Traefik, Authelia | UI starts, admin user present |
+| Uptime Kuma | Share / fresh + dump | `/mnt/user/appdata/uptime-kuma` | `uptime-kuma.sqlite.dump` | `uptime_kuma_admin_password.txt` for a fresh rebuild | Traefik, Authelia | UI starts, admin user present, recreate monitors if needed |
+| Filebrowser | Share / fresh + dump | `/mnt/user/appdata/filebrowser` | `filebrowser.sqlite.dump` | `filebrowser_admin_password.txt` for a fresh rebuild | Traefik, Authelia | UI starts, admin user present |
| Glances | Rebuildable | no critical state | none | none | Traefik, Authelia | UI starts |
| Scrutiny | Partially rebuildable | `/mnt/user/appdata/scrutiny` if desired | InfluxDB deliberately not part of the critical scope | none | Traefik, Authelia | UI starts, drives visible |
-| Speedtest Tracker | Share + dump | `/mnt/user/appdata/speedtest-tracker/config` | `speedtest-tracker.sqlite` | `APP_KEY`, `ADMIN_PASSWORD` | Traefik, Authelia | UI starts |
+| Speedtest Tracker | Share + dump | `/mnt/user/appdata/speedtest-tracker/config` | `speedtest-tracker.sqlite.dump` | `APP_KEY`, `ADMIN_PASSWORD` | Traefik, Authelia | UI starts |
| BentoPDF | Rebuildable | no critical persistence; keep the old Stirling-PDF data under `/mnt/user/appdata/stirling-pdf` until acceptance | none | no separate secret files documented | Traefik, Authelia | UI starts, PDF tools available, Office conversion over HTTPS works |
| Grafana | Share + dump | `/mnt/user/appdata/grafana`, including `provisioning/datasources/influxdb.yml` | `grafana.sqlite` | `grafana_admin_password.txt`, `grafana_influxdb_token.txt` | Traefik, Authelia, InfluxDB 3 Core | UI starts, InfluxDB data source test succeeds |
| InfluxDB 3 Core | Share | `/mnt/user/appdata/influxdb3/data`, `/mnt/user/appdata/influxdb3/plugins` | file-based object store | `influxdb3_admin_token.json` | internal `grafana_influx_internal` network | `homelab` database present, Grafana can run a SQL query |
@@ -82,10 +82,11 @@ Aktuell relevante Dump-Artefakte unter `/mnt/user/backups/borg/dumps/latest`:
- `mealie.dump`
- `immich.dump`
- `nextcloud.dump`
-- `gitea.sqlite`
-- `vaultwarden.sqlite`
-- `uptime-kuma.sqlite`
-- `speedtest-tracker.sqlite`
+- `gitea.sqlite.dump`
+- `vaultwarden.sqlite.dump`
+- `uptime-kuma.sqlite.dump`
+- `speedtest-tracker.sqlite.dump`
+- `filebrowser.sqlite.dump`
- `borg-ui.sqlite`
- `grafana.sqlite`
- `komodo-mongo.archive.gz` (still to be verified separately)

+11
-5
@@ -14,14 +14,14 @@ Secret-Werte sind nicht enthalten. Es werden nur Secret-Namen, Env-Key-Namen und
| `adguard` | DNS server / LAN DNS | `host-services/Adguard/docker-compose.yml` | LAN port `53`, admin `8082` | `dns_net`, `frontend_net`, Unbound | `/mnt/user/appdata/adguard/conf`, `/mnt/user/appdata/adguard/work` | Tier 1, config relevant | no | direct ports 53 and 8082 are a documented exception; secure the admin port later if needed |
| `unbound` | upstream DNS resolver for AdGuard | `apps/unbound/docker-compose.yml` | internal | `dns_net` | `/mnt/user/appdata/unbound/config` | rebuildable / config relevant | no | isolated internally |
| `tailscale` | VPN/remote access | `host-services/tailscale/docker-compose.yml` | Tailscale | host network | `/mnt/user/appdata/tailscale` | Tier 1, state relevant | no | `network_mode: host`, `NET_ADMIN`, `NET_RAW` and `/dev/net/tun` are documented VPN exceptions |
-| `gitea` | Git server / origin for GitOps | `core/gitea/docker-compose.yml` | `https://git.kaleschke.info`, SSH `222` | Traefik, `frontend_net` | `/mnt/user/services/gitea/data` | Tier 1, `gitea.sqlite` + share | yes | SSH port 222 is a direct host-port exception; critical in DR without an external mirror |
+| `gitea` | Git server / origin for GitOps | `core/gitea/docker-compose.yml` | `https://git.kaleschke.info`, SSH `222` | Traefik, `frontend_net` | `/mnt/user/services/gitea/data` | Tier 1, `gitea.sqlite.dump` + share | yes | SSH port 222 is a direct host-port exception; critical in DR without an external mirror |

## Security / Identity

| Service | Purpose | Authoritative path | URL / access | Dependencies | Data paths | Backup / restore | Traefik | Notes / TODOs |
|---|---|---|---|---|---|---|---|---|
| `authelia` | ForwardAuth / central auth for admin UIs | `security/authelia/docker-compose.yml`, `security/authelia/configuration.yml` | `https://auth.kaleschke.info` | PostgreSQL 17, Traefik, GMX SMTP | `/mnt/user/appdata/authelia/config`, Authelia secret files | Tier 1, config + DB + secrets | yes | deliberately without a Redis session backend; SMTP notifier via GMX and `authelia_smtp_password.txt`; explicit DNS servers for SMTP/NTP; the repo baseline must be merged into the host config manually, OIDC/secrets stay on the host; keep access control and Compose middleware in sync when making changes |
-| `vaultwarden` | password vault | `security/vaultwarden/docker-compose.yml` | `https://vault.kaleschke.info` | Traefik, `frontend_net` | `/mnt/user/appdata/vaultwarden` | Tier 1, `vaultwarden.sqlite` + share | yes | `ADMIN_TOKEN_FILE`; no direct ports |
+| `vaultwarden` | password vault | `security/vaultwarden/docker-compose.yml` | `https://vault.kaleschke.info` | Traefik, `frontend_net` | `/mnt/user/appdata/vaultwarden` | Tier 1, `vaultwarden.sqlite.dump` + share | yes | `ADMIN_TOKEN_FILE`; no direct ports |

## Shared Infrastructure

@@ -59,17 +59,23 @@ Secret-Werte sind nicht enthalten. Es werden nur Secret-Namen, Env-Key-Namen und
|
||||
| `komodo-mongo` | Komodo Datenbank | `ops/komodo/docker-compose.yml` | intern | `komodo_net` | `/mnt/user/appdata/komodo/mongo`, `komodo_mongo_password.txt` | Tier 1, `komodo-mongo.archive.gz` | nein | Dump am 2026-05-04 bestaetigt; nach Major-Upgrades pruefen |
|
||||
| `komodo-periphery` | Komodo Host-Agent | `ops/komodo/docker-compose.yml` | intern Core -> Periphery | Docker socket, `/mnt/user/services`, `frontend_net`, `komodo_net` | `/mnt/user/appdata/komodo/periphery`, `komodo_keys` | Tier 1 | nein | Docker-Socket-Ausnahme; `/mnt/user/services` Mount fuer Stack-Workspaces |
|
||||
| `borg-ui` | Borg Backup-/Restore UI | `ops/borg-ui/docker-compose.yml` | `https://borg.kaleschke.info` | Traefik + Authelia, Borg repo credentials | `/mnt/user/appdata/borg-ui/data`, `/mnt/user/backups/borg/dumps`, Restore-Ziel | Tier 3 / Backup kritisch, `borg-ui.sqlite` | ja + Authelia | breite Mounts bewusst; `/local/secrets` im DR-Scope; Nextcloud-Daten werden read-only nach `/local/nextcloud/data` eingebunden |
|
||||
| `uptime-kuma` | Monitoring / Uptime Checks | `ops/uptime-kuma/docker-compose.yml` | `https://uptime.kaleschke.info` | Traefik + Authelia | `/mnt/user/appdata/uptime-kuma` | Tier 3, `uptime-kuma.sqlite` | ja + Authelia | Monitore nach Restore pruefen |
|
||||
| `uptime-kuma` | Monitoring / Uptime Checks | `ops/uptime-kuma/docker-compose.yml` | `https://uptime.kaleschke.info` | Traefik + Authelia | `/mnt/user/appdata/uptime-kuma` | Tier 3, `uptime-kuma.sqlite.dump` | ja + Authelia | Monitore nach Restore pruefen |
|
||||
| `glances` | System-/Container-Monitoring | `ops/glances/docker-compose.yml` | `https://glances.kaleschke.info` | Docker socket, rootfs, Traefik + Authelia | kein kritischer Zustand | Tier 3, rebuildbar | ja + Authelia | Dokumentierte Host-Observability-Ausnahme: `pid: host`, `/:/rootfs:ro`, `/var/run/docker.sock:/var/run/docker.sock:ro`, `/etc/os-release:/etc/os-release:ro`; keine Appdaten ausserhalb `/mnt/user/...` |
|
||||
| `scrutiny` | Laufwerks-/SMART-Monitoring | `ops/scrutiny/docker-compose.yml` | `https://scrutiny.kaleschke.info` | Device mounts, Traefik + Authelia | `/mnt/user/appdata/scrutiny/config`, `/mnt/user/appdata/scrutiny/influxdb` | Tier 3, Metrics nicht kritisch | ja + Authelia | Dokumentierte Host-Observability-Ausnahme: `privileged: true`, `/run/udev:/run/udev:ro`, `/dev/sdb:/dev/sdb`, `/dev/sdc:/dev/sdc`, `/dev/nvme0n1:/dev/nvme0n1`; keine Appdaten ausserhalb `/mnt/user/...` |
|
||||
| `speedtest-tracker` | Speedtest-Monitoring | `ops/speedtest/docker-compose.yml` | `https://speedtest.kaleschke.info` | Traefik + Authelia | `/mnt/user/appdata/speedtest-tracker/config` | Tier 3, `speedtest-tracker.sqlite` | ja + Authelia | `APP_KEY`, `ADMIN_PASSWORD` Stack ENV |
|
||||
| `filebrowser` | Datei-Browser fuer Appdata | `ops/filebrowser/docker-compose.yml` | `https://files.kaleschke.info` | Traefik + Authelia | `/mnt/user/appdata/filebrowser/*`, breiter `/mnt/user/appdata` Mount | Tier 3, BoltDB-App-State im App-Pfad | ja + Authelia | Mounts langfristig einschraenken |
|
||||
| `speedtest-tracker` | Speedtest monitoring | `ops/speedtest/docker-compose.yml` | `https://speedtest.kaleschke.info` | Traefik + Authelia | `/mnt/user/appdata/speedtest-tracker/config` | Tier 3, `speedtest-tracker.sqlite.dump` | yes + Authelia | `APP_KEY`, `ADMIN_PASSWORD` via stack env |
| `filebrowser` | File browser for Documents/Photos/Projekte | `ops/filebrowser/docker-compose.yml` | `https://files.kaleschke.info` | Traefik + Authelia | `/mnt/user/appdata/filebrowser/*`, `/mnt/user/documents`, `/mnt/user/photos`, `/mnt/user/projekte` | Tier 3, `filebrowser.sqlite.dump` + share | yes + Authelia | Broad appdata mount removed; secrets and the Traefik dynamic config are no longer mounted into Filebrowser |
| `code-server` | Web editor / operations workspace | `ops/code-server/docker-compose.yml` | `https://code.kaleschke.info` | Traefik + Authelia | `/mnt/user/appdata/code-server`, `/mnt/user/services/dev` | Tier 3 | yes + Authelia | Password via LSIO `FILE__PASSWORD`; mind the workspaces |
| `grafana` | Metrics dashboard | `ops/grafana-influxdb/docker-compose.yml` | `https://grafana.kaleschke.info` | Traefik + Authelia, InfluxDB 3 Core | `/mnt/user/appdata/grafana`, Grafana provisioning | Tier 3, `grafana.sqlite` | yes + Authelia | Datasource is provisioned, token via secret; currently runs as `user: "0"` because of host appdata permissions |
| `influxdb3-core` | Time-series/metrics data for Grafana and Home Assistant | `ops/grafana-influxdb/docker-compose.yml` | LAN `8181` per `INFLUXDB_BIND_IP`, no public URL | Grafana, Home Assistant writer | `/mnt/user/appdata/influxdb3/data`, `/mnt/user/appdata/influxdb3/plugins` | Tier 3 | no | LAN-only host-port exception; a `401 Unauthorized` from curl without a token is the expected reachability result; currently runs as `user: "0"` because of host appdata permissions |
| `hermes-gateway` | Hermes Agent gateway/API (internal) | `ops/hermes-agent/docker-compose.yml` | internal `8642` on `hermes_net` | SSH runner (VM 192.168.178.143), LLM provider, optional Home Assistant | `/mnt/user/appdata/hermes-agent/data`, SSH key path | Tier 3, Borg/share | no | NAS stack stays disabled until the separate Hermes VM/runner side is restored; no Docker socket |
| `hermes-dashboard` | Hermes dashboard | `ops/hermes-agent/docker-compose.yml` | `https://hermes.kaleschke.info` via `${HERMES_DASHBOARD_HOST}` | `hermes-gateway`, Traefik + Authelia | shared read-only data mount | Tier 3, Borg/share | yes + Authelia | Compose profile `dashboard`; VM side still open, not part of the final NAS start |

## Host Operations

| Service | Purpose | Authoritative path | URL / access | Dependencies | Data paths | Backup / restore | Traefik | Notes / TODOs |
|---|---|---|---|---|---|---|---|---|
| `posture-check` | Host posture audit for filesystem type, mover drift, NVMe SMART and disk usage | `services/posture-check/posture-check.sh` | Unraid user script / cron / Borg pre-hook | `findmnt`, `df`, `nvme`, optional `curl` for ntfy | `/mnt/user/services/posture-check/last.json` | Repo script + latest JSON status | no | Must run on the Unraid host at boot, hourly and before Borg; warning/critical results alert via ntfy |

## Backup and Restore Notes

- Tier-1 services are documented in `docs/RESTORE_MATRIX.md` and `docs/DISASTER_RECOVERY.md`.

@@ -272,7 +272,7 @@ Automatisierter Check, der mindestens täglich läuft (z. B. via User-Script ode
| CIFS/NFS mount liveness (for every configured remote mount) | `stat` succeeds within a 10 s timeout | Immediate alert |
| Number of running containers | Matches the target count defined in the repo | Warning |
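
The mount-liveness row can be sketched in bash as below. This is an illustration under assumptions, not the repo's implementation: `check_mount_liveness`, `main` and the `MOUNTS` list are hypothetical names, and the real mount list has to come from the host's configured remote shares. `timeout -k` adds a SIGKILL in case `stat` ignores SIGTERM while a hard-mounted share hangs. Note that `stat` only proves the path answers; it does not prove a share is actually mounted there.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Succeeds if stat on the mount point returns within the timeout.
# -k 2: send SIGKILL 2 s after SIGTERM in case stat is stuck on a hung share.
check_mount_liveness() {
  local mount="$1"
  local timeout_s="${2:-10}"
  if timeout -k 2 "$timeout_s" stat -- "$mount" >/dev/null 2>&1; then
    echo "ok $mount"
  else
    echo "critical $mount"
    return 1
  fi
}

# Checks every path in the hypothetical MOUNTS list; returns 1 if any failed.
main() {
  local rc=0
  local m
  for m in ${MOUNTS:-}; do
    check_mount_liveness "$m" || rc=1
  done
  return "$rc"
}
```

Calling `MOUNTS="/mnt/remotes/example" main` (a hypothetical path) prints one line per mount and returns 1 as soon as any of them fails to answer in time.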

**Implementation:** Script at `services/posture-check/posture_check.sh`, output as JSON, consumed by a notifier (Slack, ntfy or email; the operator decides).

**Implementation:** Script at `services/posture-check/posture-check.sh` plus the compatibility wrapper `services/posture-check/posture_check.sh`, output as JSON to `/mnt/user/services/posture-check/last.json`, consumed by ntfy.
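
Because `last.json` carries a single top-level `status` field (written by `write_json` in `posture-check.sh`), other jobs can gate on it without re-running the checks. The following is a minimal sketch that assumes that layout and no `jq` on the host. `posture_status` and `posture_exit_code` are hypothetical helpers, and the exit code 3 for a missing file is an arbitrary choice.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Prints the top-level status (ok|warning|critical) from a posture-check JSON
# file, or "missing" if the file cannot be read. Assumes the layout written by
# write_json: one line of the form  "status": "<value>",
posture_status() {
  local json="${1:-/mnt/user/services/posture-check/last.json}"
  if [ ! -r "$json" ]; then
    echo "missing"
    return 0
  fi
  # Per-check entries start with "{", so only the top-level key matches here.
  sed -n 's/^[[:space:]]*"status":[[:space:]]*"\([a-z]*\)".*/\1/p' "$json"
}

# Maps the status to posture-check.sh's return codes:
# ok -> 0, warning -> 1, critical -> 2; missing or unknown -> 3 (arbitrary).
posture_exit_code() {
  case "$(posture_status "$@")" in
    ok) return 0 ;;
    warning) return 1 ;;
    critical) return 2 ;;
    *) return 3 ;;
  esac
}
```

A maintenance job could, for example, refuse to start while `posture_exit_code` returns 2.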
|
||||
|
||||
## 12. Hard Rules — Constitution
|
||||
|
||||
|
||||
@@ -1,6 +1,6 @@
|
||||
services:
|
||||
redis:
|
||||
image: redis:7-alpine
|
||||
image: redis:7.4-alpine@sha256:6ab0b6e7381779332f97b8ca76193e45b0756f38d4c0dcda72dbb3c32061ab99
|
||||
container_name: Redis
|
||||
restart: unless-stopped
|
||||
command:
|
||||
|
||||
@@ -43,6 +43,7 @@ The inclusion of `/local/secrets` is intentional: Borg is expected to cover disa
| Komodo | config + Mongo dump | `/local/borg-dumps`, `/local/appdata/komodo/periphery`, `/local/appdata/komodo/core` |
| Nextcloud | DB dump + file data | `/local/borg-dumps`, `/local/appdata/nextcloud/html`, `/local/nextcloud/data` |
| Grafana | SQLite dump + file data | `/local/borg-dumps`, `/local/appdata/grafana` |
| Filebrowser | SQLite dump + file data | `/local/borg-dumps`, `/local/appdata/filebrowser` |
| InfluxDB 3 Core | file data | `/local/appdata/influxdb3/data`, `/local/appdata/influxdb3/plugins` |
| Hermes Agent | file data + SSH key | `/local/appdata/hermes-agent/data`, `/local/secrets/hermes_runner_id_ed25519` |
| BentoPDF | rebuildable | no critical persistence in compose |
|
||||
@@ -53,7 +54,7 @@ These are deviations from the standard "DB dump first, file path second" strateg

### Nextcloud

`pre-backup-dumps.sh` writes `nextcloud.dump` from `nextcloud-postgres`. Borg UI also mounts `/mnt/user/documents/nextcloud-data` read-only as `/local/nextcloud/data`, so database and user files are both inside scope after the Borg UI stack is recreated.

Option A implemented: `pre-backup-dumps.sh` writes `nextcloud.dump` from `nextcloud-postgres`. Borg UI also mounts `/mnt/user/documents/nextcloud-data` read-only as `/local/nextcloud/data`, so database and user files are both inside scope after the Borg UI stack is recreated.
|
||||
|
||||
### Komodo Mongo dump
|
||||
|
||||
@@ -76,7 +77,7 @@ These are deviations from the standard "DB dump first, file path second" strateg
|
||||
### Other Databases
|
||||
|
||||
- Komodo MongoDB
|
||||
- SQLite: `gitea`, `vaultwarden`, `uptime-kuma`, `speedtest-tracker`, `borg-ui`, `grafana`
|
||||
- SQLite: `gitea`, `vaultwarden`, `uptime-kuma`, `speedtest-tracker`, `filebrowser`, `borg-ui`, `grafana`
|
||||
|
||||
## Explicitly Not Backed Up as Raw Live DB Files
|
||||
|
||||
@@ -98,7 +99,6 @@ These are not part of the first-class Borg scope:
|
||||
- uptime-kuma
|
||||
- scrutiny metrics history
|
||||
- dozzle, glances, speedtest
|
||||
- filebrowser app state
|
||||
|
||||
## Suggested Retention
|
||||
|
||||
|
||||
Regular → Executable
+41
-6
@@ -94,6 +94,39 @@ dump_sqlite_file() {
|
||||
atomic_write "$output" "$tmp"
|
||||
}
|
||||
|
||||
dump_sqlite_container() {
|
||||
container="$1"
|
||||
db_path="$2"
|
||||
output="$3"
|
||||
|
||||
if ! need_container "$container"; then
|
||||
warn "Skipping missing container: $container"
|
||||
return 0
|
||||
fi
|
||||
|
||||
container_tmp="/tmp/$(basename "$output").bak"
|
||||
tmp="$TMP_DIR/$(basename "$output").tmp"
|
||||
|
||||
log "Dumping SQLite database '$db_path' from $container"
|
||||
rm -f "$tmp"
|
||||
docker exec "$container" rm -f "$container_tmp" >/dev/null 2>&1 || true
|
||||
if ! docker exec "$container" sqlite3 "$db_path" ".backup $container_tmp"; then
|
||||
warn "SQLite backup failed for $container:$db_path"
|
||||
docker exec "$container" rm -f "$container_tmp" >/dev/null 2>&1 || true
|
||||
rm -f "$tmp"
|
||||
return 1
|
||||
fi
|
||||
docker cp "$container:$container_tmp" "$tmp"
|
||||
docker exec "$container" rm -f "$container_tmp" >/dev/null 2>&1 || true
|
||||
|
||||
if [ "$(sqlite3 "$tmp" 'PRAGMA quick_check;')" != "ok" ]; then
|
||||
warn "SQLite quick_check failed for $container:$db_path"
|
||||
rm -f "$tmp"
|
||||
return 1
|
||||
fi
|
||||
atomic_write "$output" "$tmp"
|
||||
}
|
||||
|
||||
dump_optional_pg_db() {
|
||||
container="$1"
|
||||
password="$2"
|
||||
@@ -196,12 +229,14 @@ main() {
|
||||
warn "Skipping missing container: nextcloud-postgres"
|
||||
fi
|
||||
|
||||
# SQLite databases. Use host-side sqlite3 so the dump does not depend on
|
||||
# utility packages inside application images.
|
||||
dump_sqlite_file "/mnt/user/services/gitea/data/gitea/gitea.db" "$LATEST_DIR/gitea.sqlite" "gitea"
|
||||
dump_sqlite_file "/mnt/user/appdata/vaultwarden/db.sqlite3" "$LATEST_DIR/vaultwarden.sqlite" "vaultwarden"
|
||||
dump_sqlite_file "/mnt/user/appdata/uptime-kuma/kuma.db" "$LATEST_DIR/uptime-kuma.sqlite" "uptime-kuma"
|
||||
dump_sqlite_file "/mnt/user/appdata/speedtest-tracker/config/database.sqlite" "$LATEST_DIR/speedtest-tracker.sqlite" "speedtest-tracker"
|
||||
# SQLite databases
|
||||
dump_sqlite_container "gitea" "/data/gitea/gitea.db" "$LATEST_DIR/gitea.sqlite.dump"
|
||||
dump_sqlite_container "vaultwarden" "/data/db.sqlite3" "$LATEST_DIR/vaultwarden.sqlite.dump"
|
||||
dump_sqlite_container "uptime-kuma" "/app/data/kuma.db" "$LATEST_DIR/uptime-kuma.sqlite.dump"
|
||||
dump_sqlite_container "speedtest-tracker" "/config/database.sqlite" "$LATEST_DIR/speedtest-tracker.sqlite.dump"
|
||||
dump_sqlite_container "filebrowser" "/database/filebrowser.db" "$LATEST_DIR/filebrowser.sqlite.dump"
|
||||
|
||||
# Additional host-side SQLite dumps for admin tooling with appdata files.
|
||||
dump_sqlite_file "/mnt/user/appdata/borg-ui/data/borg.db" "$LATEST_DIR/borg-ui.sqlite" "borg-ui"
|
||||
dump_sqlite_file "/mnt/user/appdata/grafana/grafana.db" "$LATEST_DIR/grafana.sqlite" "grafana"
|
||||
|
||||
|
||||
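
Both `dump_sqlite_file` and `dump_sqlite_container` end with `atomic_write "$output" "$tmp"`, whose definition lies outside this hunk. The sketch below is a hypothetical stand-in, not the repo's code. It assumes the helper copies the verified temp file next to the target and renames it into place, so Borg never reads a half-written dump. Separately, `dump_sqlite_container` runs `sqlite3` through `docker exec`, so it only works for images that ship the `sqlite3` CLI.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for the atomic_write helper used by pre-backup-dumps.sh.
# Usage: atomic_write <final-path> <verified-temp-file>
atomic_write() {
  local output="$1"
  local tmp="$2"
  local staged
  # Stage inside the target directory: mv is only an atomic rename(2) when
  # source and destination share a filesystem, and $TMP_DIR may not.
  staged="$(dirname -- "$output")/.$(basename -- "$output").partial"
  cp -- "$tmp" "$staged"
  mv -f -- "$staged" "$output"
  rm -f -- "$tmp"
}
```

With this approach, readers of `$output` see either the previous dump or the complete new one, never a partial copy.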
Executable
+38
@@ -0,0 +1,38 @@
|
||||
#!/usr/bin/env bash
|
||||
set -euo pipefail
|
||||
|
||||
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||
REPO_ROOT="${REPO_ROOT:-$(cd "$SCRIPT_DIR/../../.." && pwd)}"
|
||||
POSTURE_CHECK="${POSTURE_CHECK:-$REPO_ROOT/services/posture-check/posture-check.sh}"
|
||||
FRESHNESS_CHECK="${FRESHNESS_CHECK:-$REPO_ROOT/ops/restore-tests/check-restore-freshness.sh}"
|
||||
PRE_BACKUP_DUMPS="${PRE_BACKUP_DUMPS:-$SCRIPT_DIR/pre-backup-dumps.sh}"
|
||||
NTFY_SCRIPT="${NTFY_SCRIPT:-$REPO_ROOT/ops/restore-tests/send-ntfy.sh}"
|
||||
NTFY_TOPIC="${NTFY_TOPIC:-kallilab-critical}"
|
||||
|
||||
notify_failure() {
|
||||
local step="$1"
|
||||
local message="$2"
|
||||
if [ -x "$NTFY_SCRIPT" ]; then
|
||||
"$NTFY_SCRIPT" "$NTFY_TOPIC" "Borg pre-hook failed: $step" "$message" high || true
|
||||
fi
|
||||
}
|
||||
|
||||
run_step() {
|
||||
local step="$1"
|
||||
shift
|
||||
|
||||
echo "[pre-borg] Running $step"
|
||||
if "$@"; then
|
||||
echo "[pre-borg] OK: $step"
|
||||
else
|
||||
rc=$?
|
||||
notify_failure "$step" "Command failed with exit code $rc: $*"
|
||||
exit "$rc"
|
||||
fi
|
||||
}
|
||||
|
||||
run_step "posture-check" "$POSTURE_CHECK"
|
||||
run_step "pre-backup-dumps" "$PRE_BACKUP_DUMPS"
|
||||
run_step "restore-freshness" "$FRESHNESS_CHECK"
|
||||
|
||||
echo "[pre-borg] All pre-flight checks passed"
|
||||
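
`write_json` in `posture-check.sh` returns 1 for warnings and 2 for critical findings. `run_step` treats any non-zero code as fatal, so if posture-check's `main` propagates that return code, a posture *warning* also blocks the Borg run. If that is stricter than intended, one option is a variant that accepts an explicit list of tolerated exit codes. `run_step_tolerant` is a hypothetical name, and the ntfy call from `notify_failure` is left out to keep the sketch self-contained.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical variant of run_step. Usage:
#   run_step_tolerant <step> "<tolerated exit codes>" <command> [args...]
run_step_tolerant() {
  local step="$1"
  local tolerated="$2"
  shift 2
  local rc=0

  echo "[pre-borg] Running $step"
  "$@" || rc=$?
  if [ "$rc" -eq 0 ]; then
    echo "[pre-borg] OK: $step"
    return 0
  fi
  # Space-padded match so a tolerated "1" does not also match "12".
  case " $tolerated " in
    *" $rc "*)
      echo "[pre-borg] TOLERATED: $step exited $rc"
      return 0
      ;;
  esac
  echo "[pre-borg] FAILED: $step exited $rc" >&2
  return "$rc"
}
```

For example, `run_step_tolerant "posture-check" "1" "$POSTURE_CHECK"` would let warnings through while a critical result (exit 2) still stops the backup.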
@@ -1,6 +1,6 @@
|
||||
services:
|
||||
code-server:
|
||||
image: lscr.io/linuxserver/code-server:latest@sha256:4620adace18935dd6ca79d77e3bc1c379e21875392192f970cf5d6b0fb4aefcd
|
||||
image: lscr.io/linuxserver/code-server:4.116.0@sha256:4620adace18935dd6ca79d77e3bc1c379e21875392192f970cf5d6b0fb4aefcd
|
||||
container_name: code-server
|
||||
restart: unless-stopped
|
||||
security_opt:
|
||||
|
||||
@@ -1,6 +1,6 @@
|
||||
services:
|
||||
filebrowser:
|
||||
image: filebrowser/filebrowser:latest@sha256:4dce87308b9f9cfbcf8d0a284fc9565d2b515530a6bae2d920b388161e093f26
|
||||
image: filebrowser/filebrowser:v2.63.2@sha256:4dce87308b9f9cfbcf8d0a284fc9565d2b515530a6bae2d920b388161e093f26
|
||||
container_name: filebrowser
|
||||
restart: unless-stopped
|
||||
security_opt:
|
||||
@@ -9,7 +9,9 @@ services:
|
||||
- PUID=99
|
||||
- PGID=100
|
||||
volumes:
|
||||
- /mnt/user/appdata:/srv/appdata
|
||||
- /mnt/user/documents:/srv/documents
|
||||
- /mnt/user/photos:/srv/photos
|
||||
- /mnt/user/projekte:/srv/projekte
|
||||
- /mnt/user/appdata/filebrowser/database:/database
|
||||
- /mnt/user/appdata/filebrowser/config:/config
|
||||
command: ["--database", "/database/filebrowser.db"]
|
||||
|
||||
@@ -252,6 +252,10 @@ function Test-ServicePolicies {
|
||||
}
|
||||
}
|
||||
|
||||
if ($service.Image -match ':[Ll]atest(?:[-@]|$)') {
|
||||
Add-Finding -Findings $Findings -Severity 'warning' -Code 'IMAGE001' -Target $targetBase -Message 'Image uses a latest tag. Prefer a concrete version tag, even when a digest is present.'
|
||||
}
|
||||
|
||||
$isDataService = $false
|
||||
$identityText = ($service.ServiceName + ' ' + $service.ContainerName + ' ' + $service.Image).ToLowerInvariant()
|
||||
foreach ($needle in @('postgres', 'redis', 'mongo', 'mysql', 'mariadb', 'influxdb')) {
|
||||
|
||||
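
The new `IMAGE001` rule flags image references whose tag is `latest`, even when a digest follows (`:latest@sha256:...`). That is the pattern replaced in the code-server, filebrowser and speedtest-tracker compose files in this commit. PowerShell's `-match` is case-insensitive by default, so `[Ll]` is redundant there but harmless. For a quick check outside PowerShell, an equivalent test with `grep -E` might look like this. `image_uses_latest` is a hypothetical name.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Same pattern as the IMAGE001 check: a ":latest" tag, optionally followed by
# a "-suffix" (e.g. latest-alpine) or an "@sha256:" digest. -i mirrors the
# case-insensitive PowerShell -match.
image_uses_latest() {
  grep -Eiq ':latest([-@]|$)' <<< "$1"
}
```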
@@ -1,7 +1,7 @@
|
||||
param(
|
||||
[string]$DumpRoot = "/mnt/user/backups/borg/dumps/latest",
|
||||
[string]$ReportRoot = "/mnt/user/backups/restore-reports",
|
||||
[int]$MaxDumpAgeHours = 36,
|
||||
[int]$MaxDumpAgeHours = 26,
|
||||
[int]$MaxReportAgeDays = 45
|
||||
)
|
||||
|
||||
@@ -9,7 +9,13 @@ $checks = @(
|
||||
@{ Name = "postgresql17-paperless.dump"; Path = Join-Path $DumpRoot "postgresql17-paperless.dump" },
|
||||
@{ Name = "postgresql17-mailarchiver.dump"; Path = Join-Path $DumpRoot "postgresql17-mailarchiver.dump" },
|
||||
@{ Name = "mealie.dump"; Path = Join-Path $DumpRoot "mealie.dump" },
|
||||
@{ Name = "immich.dump"; Path = Join-Path $DumpRoot "immich.dump" }
|
||||
@{ Name = "immich.dump"; Path = Join-Path $DumpRoot "immich.dump" },
|
||||
@{ Name = "nextcloud.dump"; Path = Join-Path $DumpRoot "nextcloud.dump" },
|
||||
@{ Name = "gitea.sqlite.dump"; Path = Join-Path $DumpRoot "gitea.sqlite.dump" },
|
||||
@{ Name = "vaultwarden.sqlite.dump"; Path = Join-Path $DumpRoot "vaultwarden.sqlite.dump" },
|
||||
@{ Name = "uptime-kuma.sqlite.dump"; Path = Join-Path $DumpRoot "uptime-kuma.sqlite.dump" },
|
||||
@{ Name = "speedtest-tracker.sqlite.dump"; Path = Join-Path $DumpRoot "speedtest-tracker.sqlite.dump" },
|
||||
@{ Name = "filebrowser.sqlite.dump"; Path = Join-Path $DumpRoot "filebrowser.sqlite.dump" }
|
||||
)
|
||||
|
||||
$reportChecks = @(
|
||||
@@ -30,15 +36,25 @@ foreach ($check in $checks) {
|
||||
}
|
||||
|
||||
$item = Get-Item $check.Path
|
||||
if ($item.Length -le 0) {
|
||||
$critical.Add("DUMP_EMPTY $($check.Name)")
|
||||
continue
|
||||
}
|
||||
|
||||
$ageHours = ($now - $item.LastWriteTime).TotalHours
|
||||
if ($ageHours -gt $MaxDumpAgeHours) {
|
||||
$warnings.Add(("DUMP_STALE {0} age={1:N1}h" -f $check.Name, $ageHours))
|
||||
$critical.Add(("DUMP_STALE {0} age={1:N1}h" -f $check.Name, $ageHours))
|
||||
} else {
|
||||
$info.Add(("DUMP_OK {0} age={1:N1}h" -f $check.Name, $ageHours))
|
||||
}
|
||||
}
|
||||
|
||||
foreach ($check in $reportChecks) {
|
||||
if (-not (Test-Path $ReportRoot)) {
|
||||
$warnings.Add("REPORT_ROOT_MISSING $ReportRoot")
|
||||
break
|
||||
}
|
||||
|
||||
$latest = Get-ChildItem -Path $ReportRoot -Filter ([System.IO.Path]::GetFileName($check.Path)) -ErrorAction SilentlyContinue |
|
||||
Sort-Object LastWriteTime -Descending |
|
||||
Select-Object -First 1
|
||||
|
||||
Regular → Executable
+22
-3
@@ -3,7 +3,7 @@ set -euo pipefail
|
||||
|
||||
DUMP_ROOT="${DUMP_ROOT:-/mnt/user/backups/borg/dumps/latest}"
|
||||
REPORT_ROOT="${REPORT_ROOT:-/mnt/user/backups/restore-reports}"
|
||||
MAX_DUMP_AGE_HOURS="${MAX_DUMP_AGE_HOURS:-36}"
|
||||
MAX_DUMP_AGE_HOURS="${MAX_DUMP_AGE_HOURS:-26}"
|
||||
MAX_REPORT_AGE_DAYS="${MAX_REPORT_AGE_DAYS:-45}"
|
||||
|
||||
now_epoch="$(date +%s)"
|
||||
@@ -25,21 +25,40 @@ check_file_age_days() {
|
||||
echo $(( (now_epoch - mtime) / 86400 ))
|
||||
}
|
||||
|
||||
for dump in postgresql17-paperless.dump postgresql17-mailarchiver.dump mealie.dump immich.dump; do
|
||||
for dump in \
|
||||
postgresql17-paperless.dump \
|
||||
postgresql17-mailarchiver.dump \
|
||||
mealie.dump \
|
||||
immich.dump \
|
||||
nextcloud.dump \
|
||||
gitea.sqlite.dump \
|
||||
vaultwarden.sqlite.dump \
|
||||
uptime-kuma.sqlite.dump \
|
||||
speedtest-tracker.sqlite.dump \
|
||||
filebrowser.sqlite.dump; do
|
||||
path="$DUMP_ROOT/$dump"
|
||||
if [ ! -f "$path" ]; then
|
||||
critical+=("DUMP_MISSING $dump")
|
||||
continue
|
||||
fi
|
||||
if [ ! -s "$path" ]; then
|
||||
critical+=("DUMP_EMPTY $dump")
|
||||
continue
|
||||
fi
|
||||
age="$(check_file_age_hours "$path")"
|
||||
if [ "$age" -gt "$MAX_DUMP_AGE_HOURS" ]; then
|
||||
warnings+=("DUMP_STALE $dump age=${age}h")
|
||||
critical+=("DUMP_STALE $dump age=${age}h")
|
||||
else
|
||||
info+=("DUMP_OK $dump age=${age}h")
|
||||
fi
|
||||
done
|
||||
|
||||
for service in vaultwarden gitea paperless; do
|
||||
if [ ! -d "$REPORT_ROOT" ]; then
|
||||
warnings+=("REPORT_ROOT_MISSING $REPORT_ROOT")
|
||||
break
|
||||
fi
|
||||
|
||||
latest="$(find "$REPORT_ROOT" -maxdepth 1 -type f -name "$service-*.md" | sort | tail -n 1 || true)"
|
||||
if [ -z "$latest" ]; then
|
||||
warnings+=("REPORT_MISSING $service")
|
||||
|
||||
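
The loop calls `check_file_age_hours`, which is defined outside this hunk; only `check_file_age_days` is visible here. A plausible hour-based counterpart appears below. It is a hypothetical stand-in that assumes GNU `stat` (`-c %Y`). Because the age is rounded down and compared with `-gt`, a dump aged 26 h still counts as fresh. With the new default it is flagged once it is 27 h old, instead of 37 h with the old value of 36.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for check_file_age_hours: whole hours since the file's
# last modification, rounded down like check_file_age_days.
# Optional second argument overrides "now" (epoch seconds) for testing.
check_file_age_hours() {
  local path="$1"
  local now="${2:-$(date +%s)}"
  local mtime
  mtime="$(stat -c %Y "$path")"   # GNU stat: mtime as epoch seconds
  echo $(( (now - mtime) / 3600 ))
}
```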
Executable
+15
@@ -0,0 +1,15 @@
|
||||
#!/usr/bin/env bash
|
||||
set -euo pipefail
|
||||
|
||||
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||
TOPIC="${TOPIC:-kallilab-info}"
|
||||
TESTS="${TESTS:-vaultwarden gitea paperless}"
|
||||
|
||||
pick_random() {
|
||||
printf '%s\n' $TESTS | awk 'BEGIN { srand() } { items[++count] = $0 } END { print items[int(rand() * count) + 1] }'
|
||||
}
|
||||
|
||||
selected="$(pick_random)"
|
||||
echo "Selected monthly restore test: $selected"
|
||||
|
||||
exec "$SCRIPT_DIR/run-restore-job-with-ntfy.sh" "$selected" "$TOPIC"
|
||||
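
`pick_random` seeds awk's `srand()` from the current time and draws independently each month, so nothing guarantees that every service is restored within a given number of months. If guaranteed coverage matters more than unpredictability, a round-robin keyed on the month number is a simple alternative. This is a swapped-in technique, not what the script does, and `pick_round_robin` is a hypothetical name.

```bash
#!/usr/bin/env bash
set -euo pipefail

TESTS="${TESTS:-vaultwarden gitea paperless}"

# Deterministic alternative to pick_random: month 1 picks the first entry,
# month 2 the second, and so on, wrapping around. With three tests every
# service is restored once per quarter; for list lengths that do not divide
# 12, the rotation restarts each January.
pick_round_robin() {
  local month="${1:-$(date +%-m)}"   # %-m: month without leading zero (GNU date)
  local -a items
  read -r -a items <<< "$TESTS"
  echo "${items[$(( (month - 1) % ${#items[@]} ))]}"
}
```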
@@ -1,6 +1,6 @@
|
||||
services:
|
||||
speedtest-tracker:
|
||||
image: lscr.io/linuxserver/speedtest-tracker:latest@sha256:eb3d249f16177964daa4fff7f6a90bbf6645f4e23158d92f5cddb133728d0804
|
||||
image: lscr.io/linuxserver/speedtest-tracker:1.13.12@sha256:eb3d249f16177964daa4fff7f6a90bbf6645f4e23158d92f5cddb133728d0804
|
||||
container_name: speedtest-tracker
|
||||
restart: unless-stopped
|
||||
security_opt:
|
||||
|
||||
@@ -18,11 +18,11 @@ authentication_backend:
|
||||
path: /config/users_database.yml
|
||||
password:
|
||||
algorithm: argon2id
|
||||
iterations: 1
|
||||
iterations: 3
|
||||
key_length: 32
|
||||
salt_length: 16
|
||||
memory: 1024
|
||||
parallelism: 8
|
||||
memory: 65536
|
||||
parallelism: 4
|
||||
|
||||
access_control:
|
||||
default_policy: deny
|
||||
|
||||
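
This hunk raises the Argon2id cost for new password hashes: `iterations` goes from 1 to 3 and `memory` from 1024 to 65536. Authelia specifies `memory` in KiB, so each hash now needs 64 MiB instead of 1 MiB, and `parallelism` drops from 8 to 4 lanes. Hashes already stored in `users_database.yml` keep the parameters encoded in their PHC string (`$argon2id$v=19$m=<KiB>,t=<iterations>,p=<lanes>$<salt>$<digest>`). They stay valid but keep the old cost until the password is reset. The hedged sketch below spots such entries. `argon2_meets_policy` is a hypothetical helper, and it only compares memory and iterations.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Returns 0 if an argon2id PHC hash uses at least the given memory (KiB) and
# iterations, 1 if it is weaker, 2 if the string is not a parsable argon2id hash.
# Defaults mirror the new Authelia config: m=65536, t=3.
argon2_meets_policy() {
  local hash="$1"
  local min_m="${2:-65536}"
  local min_t="${3:-3}"
  local params m t
  case "$hash" in
    '$argon2id$'*) ;;
    *) return 2 ;;
  esac
  # Fields split on "$": 1 empty, 2 argon2id, 3 v=19, 4 m=..,t=..,p=..
  params="$(cut -d'$' -f4 <<< "$hash")"
  m="$(sed -n 's/.*m=\([0-9]*\).*/\1/p' <<< "$params")"
  t="$(sed -n 's/.*t=\([0-9]*\).*/\1/p' <<< "$params")"
  if [ -z "$m" ] || [ -z "$t" ]; then
    return 2
  fi
  [ "$m" -ge "$min_m" ] && [ "$t" -ge "$min_t" ]
}
```

Running it on each `password:` value from `users_database.yml` shows which users still need a password reset to pick up the stronger parameters.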
Executable
+144
@@ -0,0 +1,144 @@
|
||||
#!/usr/bin/env bash
|
||||
set -euo pipefail
|
||||
|
||||
OUTPUT_PATH="${OUTPUT_PATH:-/mnt/user/services/posture-check/cert-token-last.json}"
|
||||
NTFY_BASE_URL="${NTFY_BASE_URL:-https://ntfy.sh}"
|
||||
WARNING_TOPIC="${WARNING_TOPIC:-kallilab-warning}"
|
||||
CRITICAL_TOPIC="${CRITICAL_TOPIC:-kallilab-critical}"
|
||||
SEND_NTFY="${SEND_NTFY:-1}"
|
||||
CLOUDFLARE_TOKEN_FILE="${CLOUDFLARE_TOKEN_FILE:-/mnt/user/appdata/traefik/secrets/cloudflare_dns_api_token}"
|
||||
WARN_DAYS="${WARN_DAYS:-14}"
|
||||
CRITICAL_DAYS="${CRITICAL_DAYS:-7}"
|
||||
DOMAINS="${DOMAINS:-traefik.kaleschke.info auth.kaleschke.info vault.kaleschke.info git.kaleschke.info cloud.kaleschke.info home.kaleschke.info borg.kaleschke.info grafana.kaleschke.info}"
|
||||
TMP_DIR="${TMP_DIR:-/tmp/kallilab-cert-token-check}"
|
||||
|
||||
mkdir -p "$TMP_DIR"
|
||||
RESULTS_FILE="$TMP_DIR/results.$$"
|
||||
: > "$RESULTS_FILE"
|
||||
trap 'rm -f "$RESULTS_FILE"' EXIT
|
||||
|
||||
json_escape() {
|
||||
sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' -e 's/\t/\\t/g'
|
||||
}
|
||||
|
||||
add_result() {
|
||||
printf '%s\t%s\t%s\n' "$1" "$2" "$3" >> "$RESULTS_FILE"
|
||||
}
|
||||
|
||||
check_cert() {
|
||||
local domain="$1"
|
||||
local enddate
|
||||
local end_epoch
|
||||
local now_epoch
|
||||
local days_left
|
||||
|
||||
if ! enddate="$(printf '' | openssl s_client -servername "$domain" -connect "$domain:443" 2>/dev/null | openssl x509 -noout -enddate 2>/dev/null | cut -d= -f2-)"; then
|
||||
add_result "critical" "cert_$domain" "Cannot read certificate for $domain"
|
||||
return
|
||||
fi
|
||||
|
||||
end_epoch="$(date -d "$enddate" +%s)"
|
||||
now_epoch="$(date +%s)"
|
||||
days_left="$(( (end_epoch - now_epoch) / 86400 ))"
|
||||
|
||||
if [ "$days_left" -lt "$CRITICAL_DAYS" ]; then
|
||||
add_result "critical" "cert_$domain" "$domain certificate expires in ${days_left}d"
|
||||
elif [ "$days_left" -lt "$WARN_DAYS" ]; then
|
||||
add_result "warning" "cert_$domain" "$domain certificate expires in ${days_left}d"
|
||||
else
|
||||
add_result "ok" "cert_$domain" "$domain certificate expires in ${days_left}d"
|
||||
fi
|
||||
}
|
||||
|
||||
check_cloudflare_token() {
|
||||
local token
|
||||
local response
|
||||
|
||||
if [ ! -s "$CLOUDFLARE_TOKEN_FILE" ]; then
|
||||
add_result "critical" "cloudflare_token" "Token file missing or empty: $CLOUDFLARE_TOKEN_FILE"
|
||||
return
|
||||
fi
|
||||
|
||||
token="$(cat "$CLOUDFLARE_TOKEN_FILE")"
|
||||
if ! response="$(curl -fsS -H "Authorization: Bearer $token" https://api.cloudflare.com/client/v4/user/tokens/verify 2>/dev/null)"; then
|
||||
add_result "critical" "cloudflare_token" "Cloudflare token verify request failed"
|
||||
return
|
||||
fi
|
||||
|
||||
if printf '%s' "$response" | grep -q '"success"[[:space:]]*:[[:space:]]*true'; then
|
||||
add_result "ok" "cloudflare_token" "Cloudflare token verify succeeded"
|
||||
else
|
||||
add_result "critical" "cloudflare_token" "Cloudflare token verify returned non-success"
|
||||
fi
|
||||
}
|
||||
|
||||
send_ntfy() {
|
||||
local severity="$1"
|
||||
local topic="$2"
|
||||
local body="$3"
|
||||
|
||||
if [ "$SEND_NTFY" != "1" ] || ! command -v curl >/dev/null 2>&1; then
|
||||
return
|
||||
fi
|
||||
|
||||
printf '%s\n' "$body" | curl -fsS \
|
||||
-H "Title: KalliLab cert-token-check $severity" \
|
||||
-H "Priority: high" \
|
||||
--data-binary @- \
|
||||
"$NTFY_BASE_URL/$topic" >/dev/null || true
|
||||
}
|
||||
|
||||
write_json() {
|
||||
local timestamp
|
||||
local critical_count
|
||||
local warning_count
|
||||
local status
|
||||
local first=1
|
||||
|
||||
timestamp="$(date -Iseconds)"
|
||||
critical_count="$(awk -F '\t' '$1 == "critical" { count++ } END { print count + 0 }' "$RESULTS_FILE")"
|
||||
warning_count="$(awk -F '\t' '$1 == "warning" { count++ } END { print count + 0 }' "$RESULTS_FILE")"
|
||||
|
||||
if [ "$critical_count" -gt 0 ]; then
|
||||
status="critical"
|
||||
elif [ "$warning_count" -gt 0 ]; then
|
||||
status="warning"
|
||||
else
|
||||
status="ok"
|
||||
fi
|
||||
|
||||
mkdir -p "$(dirname "$OUTPUT_PATH")"
|
||||
{
|
||||
printf '{\n'
|
||||
printf ' "timestamp": "%s",\n' "$(printf '%s' "$timestamp" | json_escape)"
|
||||
printf ' "status": "%s",\n' "$status"
|
||||
printf ' "critical_count": %s,\n' "$critical_count"
|
||||
printf ' "warning_count": %s,\n' "$warning_count"
|
||||
printf ' "checks": [\n'
|
||||
while IFS="$(printf '\t')" read -r severity name message; do
|
||||
if [ "$first" -eq 0 ]; then printf ',\n'; fi
|
||||
first=0
|
||||
printf ' {"severity":"%s","name":"%s","message":"%s"}' \
|
||||
"$(printf '%s' "$severity" | json_escape)" \
|
||||
"$(printf '%s' "$name" | json_escape)" \
|
||||
"$(printf '%s' "$message" | json_escape)"
|
||||
done < "$RESULTS_FILE"
|
||||
printf '\n ]\n}\n'
|
||||
} > "$OUTPUT_PATH.tmp"
|
||||
mv "$OUTPUT_PATH.tmp" "$OUTPUT_PATH"
|
||||
cat "$OUTPUT_PATH"
|
||||
|
||||
if [ "$status" = "critical" ]; then
|
||||
send_ntfy critical "$CRITICAL_TOPIC" "Certificate/token check critical: $critical_count critical, $warning_count warning. See $OUTPUT_PATH"
|
||||
return 2
|
||||
elif [ "$status" = "warning" ]; then
|
||||
send_ntfy warning "$WARNING_TOPIC" "Certificate/token check warning: $warning_count warning. See $OUTPUT_PATH"
|
||||
return 1
|
||||
fi
|
||||
}
|
||||
|
||||
for domain in $DOMAINS; do
|
||||
check_cert "$domain"
|
||||
done
|
||||
check_cloudflare_token
|
||||
write_json
|
||||
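
`check_cert` converts the certificate's `notAfter` date into whole days with GNU `date -d` and compares the result with `CRITICAL_DAYS` (7) and `WARN_DAYS` (14). The arithmetic can be pulled into small functions and tested offline, as sketched below. `days_until` and `classify_days` are hypothetical names. The empty-string guard is defensive: GNU `date -d ""` does not fail but returns midnight of the current day.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Whole days from "now" until an openssl notAfter string such as
# "Jun  1 12:00:00 2030 GMT". Optional second argument overrides "now".
days_until() {
  local enddate="$1"
  local now="${2:-$(date +%s)}"
  local end_epoch
  [ -n "$enddate" ] || return 1
  end_epoch="$(date -d "$enddate" +%s)" || return 1
  echo $(( (end_epoch - now) / 86400 ))
}

# Same thresholds and comparisons as check_cert.
classify_days() {
  local days="$1"
  if [ "$days" -lt "${CRITICAL_DAYS:-7}" ]; then
    echo critical
  elif [ "$days" -lt "${WARN_DAYS:-14}" ]; then
    echo warning
  else
    echo ok
  fi
}
```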
+96
@@ -0,0 +1,96 @@
|
||||
#!/usr/bin/env bash
|
||||
set -euo pipefail
|
||||
|
||||
REPO_ROOT="${REPO_ROOT:-$(cd "$(dirname "${BASH_SOURCE[0]}")/../.." && pwd)}"
|
||||
OUTPUT_PATH="${OUTPUT_PATH:-/mnt/user/services/posture-check/compose-runtime-drift-last.json}"
|
||||
NTFY_SCRIPT="${NTFY_SCRIPT:-$REPO_ROOT/ops/restore-tests/send-ntfy.sh}"
|
||||
NTFY_TOPIC="${NTFY_TOPIC:-kallilab-warning}"
|
||||
SEND_NTFY="${SEND_NTFY:-1}"
|
||||
TMP_DIR="${TMP_DIR:-/tmp/kallilab-compose-runtime-drift}"
|
||||
|
||||
mkdir -p "$TMP_DIR"
|
||||
RESULTS_FILE="$TMP_DIR/results.$$"
|
||||
: > "$RESULTS_FILE"
|
||||
trap 'rm -f "$RESULTS_FILE"' EXIT
|
||||
|
||||
json_escape() {
|
||||
sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' -e 's/\t/\\t/g'
|
||||
}
|
||||
|
||||
add_result() {
|
||||
printf '%s\t%s\t%s\n' "$1" "$2" "$3" >> "$RESULTS_FILE"
|
||||
}
|
||||
|
||||
parse_compose() {
local compose="$1"
# Emit one container<TAB>image pair per service only after the whole service
# block has been read, so a container_name that appears after image: (as in
# the Redis stack) is not ignored.
awk '
function flush() {
if (service && image) print container "\t" image
service=""
image=""
container=""
}
/^ [A-Za-z0-9_.-]+:/ {
flush()
service=$1
sub(/:$/, "", service)
container=service
}
service && /^ image:/ {
image=$2
gsub(/["'\'']/, "", image)
}
service && /^ container_name:/ {
container=$2
gsub(/["'\'']/, "", container)
}
END { flush() }
' "$compose"
}
|
||||
|
||||
while IFS= read -r -d '' compose; do
|
||||
while IFS="$(printf '\t')" read -r container expected_image; do
|
||||
[ -n "$container" ] || continue
|
||||
if ! runtime_image="$(docker inspect --format '{{.Config.Image}}' "$container" 2>/dev/null)"; then
|
||||
add_result "warning" "$container" "Container missing for compose image $expected_image ($compose)"
|
||||
continue
|
||||
fi
|
||||
if [ "$runtime_image" = "$expected_image" ]; then
|
||||
add_result "ok" "$container" "Runtime image matches $expected_image"
|
||||
else
|
||||
add_result "warning" "$container" "Runtime image '$runtime_image' differs from compose '$expected_image' ($compose)"
|
||||
fi
|
||||
done < <(parse_compose "$compose")
|
||||
done < <(find "$REPO_ROOT" -path "$REPO_ROOT/.git" -prune -o -type f \( -name docker-compose.yml -o -name docker-compose.yaml -o -name compose.yml -o -name compose.yaml \) -print0)
|
||||
|
||||
timestamp="$(date -Iseconds)"
|
||||
warning_count="$(awk -F '\t' '$1 == "warning" { count++ } END { print count + 0 }' "$RESULTS_FILE")"
|
||||
status="ok"
|
||||
[ "$warning_count" -gt 0 ] && status="warning"
|
||||
|
||||
mkdir -p "$(dirname "$OUTPUT_PATH")"
|
||||
{
|
||||
printf '{\n'
|
||||
printf ' "timestamp": "%s",\n' "$(printf '%s' "$timestamp" | json_escape)"
|
||||
printf ' "status": "%s",\n' "$status"
|
||||
printf ' "warning_count": %s,\n' "$warning_count"
|
||||
printf ' "checks": [\n'
|
||||
first=1
|
||||
while IFS="$(printf '\t')" read -r severity name message; do
|
||||
if [ "$first" -eq 0 ]; then printf ',\n'; fi
|
||||
first=0
|
||||
printf ' {"severity":"%s","name":"%s","message":"%s"}' \
|
||||
"$(printf '%s' "$severity" | json_escape)" \
|
||||
"$(printf '%s' "$name" | json_escape)" \
|
||||
"$(printf '%s' "$message" | json_escape)"
|
||||
done < "$RESULTS_FILE"
|
||||
printf '\n ]\n}\n'
|
||||
} > "$OUTPUT_PATH.tmp"
|
||||
mv "$OUTPUT_PATH.tmp" "$OUTPUT_PATH"
|
||||
cat "$OUTPUT_PATH"
|
||||
|
||||
if [ "$warning_count" -gt 0 ]; then
|
||||
if [ "$SEND_NTFY" = "1" ] && [ -x "$NTFY_SCRIPT" ]; then
|
||||
"$NTFY_SCRIPT" "$NTFY_TOPIC" "Compose/runtime drift detected" "$warning_count drift warning(s). See $OUTPUT_PATH" high || true
|
||||
fi
|
||||
exit 1
|
||||
fi
|
||||
Executable
+268
@@ -0,0 +1,268 @@
|
||||
#!/usr/bin/env bash
|
||||
set -euo pipefail
|
||||
|
||||
OUTPUT_PATH="${OUTPUT_PATH:-/mnt/user/services/posture-check/last.json}"
|
||||
NTFY_BASE_URL="${NTFY_BASE_URL:-https://ntfy.sh}"
|
||||
WARNING_TOPIC="${WARNING_TOPIC:-kallilab-warning}"
|
||||
CRITICAL_TOPIC="${CRITICAL_TOPIC:-kallilab-critical}"
|
||||
SEND_NTFY="${SEND_NTFY:-1}"
|
||||
TMP_DIR="${TMP_DIR:-/tmp/kallilab-posture-check}"
|
||||
|
||||
mkdir -p "$TMP_DIR"
|
||||
RESULTS_FILE="$TMP_DIR/results.$$"
|
||||
: > "$RESULTS_FILE"
|
||||
|
||||
cleanup() {
|
||||
rm -f "$RESULTS_FILE"
|
||||
}
|
||||
trap cleanup EXIT
|
||||
|
||||
json_escape() {
|
||||
sed \
|
||||
-e 's/\\/\\\\/g' \
|
||||
-e 's/"/\\"/g' \
|
||||
-e 's/\t/\\t/g'
|
||||
}
|
||||
|
||||
add_result() {
|
||||
local severity="$1"
|
||||
local name="$2"
|
||||
local message="$3"
|
||||
printf '%s\t%s\t%s\n' "$severity" "$name" "$message" >> "$RESULTS_FILE"
|
||||
}
|
||||
|
||||
need_cmd() {
|
||||
if ! command -v "$1" >/dev/null 2>&1; then
|
||||
add_result "warning" "command_$1" "Command missing: $1"
|
||||
return 1
|
||||
fi
|
||||
}
|
||||
|
||||
check_fstype() {
|
||||
local path="$1"
|
||||
local expected="$2"
|
||||
local severity="$3"
|
||||
local name="$4"
|
||||
local actual
|
||||
|
||||
if ! command -v findmnt >/dev/null 2>&1; then
|
||||
add_result "warning" "$name" "Cannot check $path filesystem because findmnt is missing"
|
||||
return
|
||||
fi
|
||||
|
||||
if ! actual="$(findmnt -no FSTYPE "$path" 2>/dev/null)"; then
|
||||
add_result "$severity" "$name" "Mount not found: $path"
|
||||
return
|
||||
fi
|
||||
|
||||
if [ "$actual" = "$expected" ]; then
|
||||
add_result "ok" "$name" "$path filesystem is $actual"
|
||||
else
|
||||
add_result "$severity" "$name" "$path filesystem is $actual, expected $expected"
|
||||
fi
|
||||
}
|
||||
|
||||
check_no_ntfs_on_core_mounts() {
|
||||
local hits
|
||||
|
||||
if ! command -v findmnt >/dev/null 2>&1; then
|
||||
add_result "warning" "no_ntfs_core_mounts" "Cannot check NTFS mounts because findmnt is missing"
|
||||
return
|
||||
fi
|
||||
|
||||
hits="$(findmnt -rn -o TARGET,FSTYPE 2>/dev/null | awk '$1 ~ "^/mnt/(cache|disk1)(/|$)" && ($2 == "ntfs3" || $2 == "fuseblk") { print $1 ":" $2 }' | paste -sd ',' -)"
|
||||
if [ -n "$hits" ]; then
|
||||
add_result "critical" "no_ntfs_core_mounts" "NTFS-like filesystem on core mount: $hits"
|
||||
else
|
||||
add_result "ok" "no_ntfs_core_mounts" "No ntfs3/fuseblk mounts below /mnt/cache or /mnt/disk1"
|
||||
fi
|
||||
}
|
||||
|
||||
check_mover_drift() {
|
||||
local path="/mnt/disk1/appdata"
|
||||
if [ ! -d "$path" ]; then
|
||||
add_result "ok" "mover_drift_appdata" "$path does not exist"
|
||||
return
|
||||
fi
|
||||
|
||||
if find "$path" -mindepth 1 -print -quit | grep -q .; then
|
||||
add_result "critical" "mover_drift_appdata" "$path contains entries; appdata should stay cache-only"
|
||||
else
|
||||
add_result "ok" "mover_drift_appdata" "$path is empty"
|
||||
fi
|
||||
}
|
||||
|
||||
check_inode_usage() {
|
||||
local path="$1"
|
||||
local max_percent="$2"
|
||||
local name="$3"
|
||||
local use_percent
|
||||
|
||||
if ! use_percent="$(df -Pi "$path" 2>/dev/null | awk 'NR==2 { gsub("%", "", $5); print $5 }')"; then
|
||||
add_result "warning" "$name" "Cannot read inode usage for $path"
|
||||
return
|
||||
fi
|
||||
|
||||
if [ "$use_percent" -lt "$max_percent" ]; then
|
||||
add_result "ok" "$name" "$path inode usage ${use_percent}%"
|
||||
else
|
||||
add_result "warning" "$name" "$path inode usage ${use_percent}% >= ${max_percent}%"
|
||||
fi
|
||||
}
|
||||
|
||||
check_filesystem_usage() {
|
||||
local path="$1"
|
||||
local max_percent="$2"
|
||||
local name="$3"
|
||||
local severity="$4"
|
||||
local use_percent
|
||||
|
||||
if ! use_percent="$(df -P "$path" 2>/dev/null | awk 'NR==2 { gsub("%", "", $5); print $5 }')"; then
|
||||
add_result "warning" "$name" "Cannot read filesystem usage for $path"
|
||||
return
|
||||
fi
|
||||
|
||||
if [ "$use_percent" -lt "$max_percent" ]; then
|
||||
add_result "ok" "$name" "$path usage ${use_percent}%"
|
||||
else
|
||||
add_result "$severity" "$name" "$path usage ${use_percent}% >= ${max_percent}%"
|
||||
fi
|
||||
}
|
||||
|
||||
check_nvme_smart() {
|
||||
local device="${NVME_DEVICE:-/dev/nvme0n1}"
|
||||
local smart
|
||||
local warning
|
||||
local percentage_used
|
||||
local media_errors
|
||||
|
||||
if ! need_cmd nvme; then
|
||||
return
|
||||
fi
|
||||
|
||||
if ! smart="$(nvme smart-log "$device" 2>/dev/null)"; then
|
||||
add_result "critical" "nvme_smart" "Cannot read nvme smart-log for $device"
|
||||
return
|
||||
fi
|
||||
|
||||
warning="$(printf '%s\n' "$smart" | awk -F: '/critical_warning/ { gsub(/[[:space:]]/, "", $2); print $2; exit }')"
|
||||
percentage_used="$(printf '%s\n' "$smart" | awk -F: '/percentage_used/ { gsub(/[^0-9]/, "", $2); print $2; exit }')"
|
||||
media_errors="$(printf '%s\n' "$smart" | awk -F: '/media_errors/ { gsub(/[^0-9]/, "", $2); print $2; exit }')"
|
||||
|
||||
if [ "${warning:-0}" = "0" ] || [ "${warning:-0}" = "0x00" ]; then
|
||||
add_result "ok" "nvme_critical_warning" "$device critical_warning ${warning:-0}"
|
||||
else
|
||||
add_result "critical" "nvme_critical_warning" "$device critical_warning ${warning}"
|
||||
fi
|
||||
|
||||
if [ -n "${percentage_used:-}" ] && [ "$percentage_used" -lt 80 ]; then
|
||||
add_result "ok" "nvme_percentage_used" "$device percentage_used ${percentage_used}%"
|
||||
else
|
||||
add_result "critical" "nvme_percentage_used" "$device percentage_used ${percentage_used:-unknown}, expected <80"
|
||||
fi
|
||||
|
||||
if [ "${media_errors:-0}" = "0" ]; then
|
||||
add_result "ok" "nvme_media_errors" "$device media_errors 0"
|
||||
else
|
||||
add_result "warning" "nvme_media_errors" "$device media_errors ${media_errors}"
|
||||
fi
|
||||
}
|
||||
|
||||
send_ntfy() {
|
||||
local severity="$1"
|
||||
local topic="$2"
|
||||
local body="$3"
|
||||
|
||||
if [ "$SEND_NTFY" != "1" ]; then
|
||||
return
|
||||
fi
|
||||
|
||||
if command -v curl >/dev/null 2>&1; then
|
||||
printf '%s\n' "$body" | curl -fsS \
|
||||
-H "Title: KalliLab posture-check $severity" \
|
||||
-H "Priority: high" \
|
||||
--data-binary @- \
|
||||
"$NTFY_BASE_URL/$topic" >/dev/null || true
|
||||
fi
|
||||
}

write_json() {
  local timestamp
  local critical_count
  local warning_count
  local status
  local first=1

  timestamp="$(date -Iseconds)"
  critical_count="$(awk -F '\t' '$1 == "critical" { count++ } END { print count + 0 }' "$RESULTS_FILE")"
  warning_count="$(awk -F '\t' '$1 == "warning" { count++ } END { print count + 0 }' "$RESULTS_FILE")"

  if [ "$critical_count" -gt 0 ]; then
    status="critical"
  elif [ "$warning_count" -gt 0 ]; then
    status="warning"
  else
    status="ok"
  fi

  mkdir -p "$(dirname "$OUTPUT_PATH")"
  {
    printf '{\n'
    printf ' "timestamp": "%s",\n' "$(printf '%s' "$timestamp" | json_escape)"
    printf ' "status": "%s",\n' "$status"
    printf ' "critical_count": %s,\n' "$critical_count"
    printf ' "warning_count": %s,\n' "$warning_count"
    printf ' "checks": [\n'
    while IFS="$(printf '\t')" read -r severity name message; do
      if [ "$first" -eq 0 ]; then
        printf ',\n'
      fi
      first=0
      printf ' {"severity":"%s","name":"%s","message":"%s"}' \
        "$(printf '%s' "$severity" | json_escape)" \
        "$(printf '%s' "$name" | json_escape)" \
        "$(printf '%s' "$message" | json_escape)"
    done < "$RESULTS_FILE"
    printf '\n ]\n'
    printf '}\n'
  } > "$OUTPUT_PATH.tmp"
  mv "$OUTPUT_PATH.tmp" "$OUTPUT_PATH"

  cat "$OUTPUT_PATH"

  if [ "$status" = "critical" ]; then
    send_ntfy "critical" "$CRITICAL_TOPIC" "Posture-check critical: $critical_count critical, $warning_count warning. See $OUTPUT_PATH"
    return 2
  fi
  if [ "$status" = "warning" ]; then
    send_ntfy "warning" "$WARNING_TOPIC" "Posture-check warning: $warning_count warning. See $OUTPUT_PATH"
    return 1
  fi
}
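`write_json` derives one overall status from the tab-separated results file, with `critical` taking precedence over `warning`, and reuses it as its return code (2, 1, or 0 when neither branch fires). The sketch below isolates that logic. `add_result` here is a hypothetical stand-in that assumes the same three-column layout `write_json` reads back, `overall_status` is an illustrative helper name, and the check names and messages are sample data.

```bash
#!/usr/bin/env bash
set -euo pipefail

RESULTS_FILE="$(mktemp)"
trap 'rm -f "$RESULTS_FILE"' EXIT

# Hypothetical add_result: assumed to append "severity<TAB>name<TAB>message",
# the layout write_json reads back with IFS set to a tab.
add_result() {
  printf '%s\t%s\t%s\n' "$1" "$2" "$3" >> "$RESULTS_FILE"
}

# Same counting and precedence as write_json: any critical result wins,
# otherwise any warning; the return code mirrors the status (2/1/0).
overall_status() {
  local critical_count warning_count
  critical_count="$(awk -F '\t' '$1 == "critical" { count++ } END { print count + 0 }' "$RESULTS_FILE")"
  warning_count="$(awk -F '\t' '$1 == "warning" { count++ } END { print count + 0 }' "$RESULTS_FILE")"
  if [ "$critical_count" -gt 0 ]; then
    echo "critical"
    return 2
  elif [ "$warning_count" -gt 0 ]; then
    echo "warning"
    return 1
  fi
  echo "ok"
}

# Sample results (illustrative names and messages).
add_result "ok" "cache_fstype" "/mnt/cache is xfs"
add_result "warning" "nvme_media_errors" "nvme0 media_errors 2"

rc=0
status="$(overall_status)" || rc=$?
echo "status=$status rc=$rc"   # prints: status=warning rc=1
```

Because `main` ends with `write_json` and the script ends with `main "$@"`, the process exit status should carry the same 0/1/2 value (assuming nothing earlier in the file, such as an EXIT trap, overrides it), so a scheduler can react without parsing the JSON.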

main() {
  need_cmd findmnt || true
  need_cmd df || true
  need_cmd awk || true

  check_fstype "/mnt/cache" "xfs" "critical" "cache_fstype"
  check_fstype "/mnt/disk1" "xfs" "critical" "disk1_fstype"
  check_no_ntfs_on_core_mounts
  check_mover_drift
  check_inode_usage "/mnt/cache" 80 "cache_inode_usage"
  check_inode_usage "/mnt/disk1" 80 "disk1_inode_usage"
  check_filesystem_usage "/mnt/cache" 70 "cache_fill_level" "warning"

  for share in appdata system domains; do
    if [ -e "/mnt/user/$share" ]; then
      check_filesystem_usage "/mnt/user/$share" 70 "share_${share}_fill_level" "warning"
    else
      add_result "warning" "share_${share}_fill_level" "/mnt/user/$share missing"
    fi
  done

  check_nvme_smart
  write_json
}
main "$@"
Executable
+5
@@ -0,0 +1,5 @@
#!/usr/bin/env bash
set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
exec "$SCRIPT_DIR/posture-check.sh" "$@"
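The wrapper relies on two properties: `${BASH_SOURCE[0]}` resolves to the wrapper's own path, so `posture-check.sh` is found next to it regardless of the caller's working directory, and `exec` replaces the wrapper process, so the caller sees exactly the exit status of `posture-check.sh`. The sketch checks both with a throwaway stub. The file name `wrapper.sh` and the argument `sample-arg` are placeholders, since the wrapper's real file name is not shown in this diff.

```bash
#!/usr/bin/env bash
set -euo pipefail

workdir="$(mktemp -d)"
trap 'rm -rf "$workdir"' EXIT

# Stand-in for posture-check.sh: echoes its arguments and reports "critical" (exit 2).
cat > "$workdir/posture-check.sh" <<'EOF'
#!/usr/bin/env bash
echo "stub posture-check args: $*"
exit 2
EOF

# Same body as the wrapper above; "wrapper.sh" is a placeholder name.
cat > "$workdir/wrapper.sh" <<'EOF'
#!/usr/bin/env bash
set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
exec "$SCRIPT_DIR/posture-check.sh" "$@"
EOF
chmod +x "$workdir/posture-check.sh" "$workdir/wrapper.sh"

# Call from another directory: the sibling is still found, arguments pass
# through unchanged, and the stub's exit status reaches the caller.
rc=0
output="$(cd / && "$workdir/wrapper.sh" sample-arg)" || rc=$?
echo "$output (rc=$rc)"
```

If the wrapper is invoked through a symlink, `BASH_SOURCE` holds the link's path, so `posture-check.sh` is looked up next to the link rather than next to the link target.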