Add self-hosted Healthchecks stack for internal job monitoring (hybrid)

Add self-hosted Healthchecks (ops/healthchecks/) as the hub for internal
cron/job heartbeats. The three host-down/backup watchdogs (Borg pre-hook,
baerchen nearline pull, monitoring watchdog #8) deliberately stay on
healthchecks.io cloud, because a watcher running on the monitored host
cannot report that host's own outage.
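
For illustration only, a minimal sketch of how an internal job could report
a heartbeat to the self-hosted instance. It assumes the standard Healthchecks
ping convention (request the check's ping URL on success, append /start or
/fail to signal those states). The base URL in the usage note and the helper
names are hypothetical and not part of this commit.

```python
import subprocess
import urllib.request


def ping_url(base_url: str, signal: str = "") -> str:
    """Build a ping URL; signal is "" (success), "start", or "fail"."""
    base = base_url.rstrip("/")
    return f"{base}/{signal}" if signal else base


def send_ping(url: str, timeout: float = 10.0) -> bool:
    """Send one ping; never raise on network errors so the job itself is unaffected."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:  # URLError, HTTPError and timeouts are all OSError subclasses
        return False


def run_with_heartbeat(base_url: str, command: list[str], ping=send_ping) -> int:
    """Run command, signalling start first and success or failure afterwards."""
    ping(ping_url(base_url, "start"))
    exit_code = subprocess.run(command).returncode
    ping(ping_url(base_url, "" if exit_code == 0 else "fail"))
    return exit_code
```

A job wrapper might call `run_with_heartbeat(url, ["/path/to/job.sh"])`, with
`url` read from a secret file or ENV variable rather than hard-coded, since a
ping URL acts as a capability URL.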

- frontend_net + dedicated PostgreSQL 18 in healthchecks_internal
- native Healthchecks auth; ping/API exempt from Authelia (n8n/Komodo pattern)
- registered as middleware_exempt in ops/policy-checks/exceptions.json
- docs: DECISIONS, ARCHITECTURE (3.1/4.2/7.6/10), SERVICE_CATALOG,
  SECRETS_MAP, MASTER_TODO, README index

docker compose config validated (exit 0). Not yet deployed: host secret file,
appdata dir, Komodo stack + ENV and Gitea webhook remain operator steps.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 20:09:56 +02:00
parent ee0d450a27
commit cbfbb8ca4f
9 changed files with 291 additions and 0 deletions
@@ -27,6 +27,7 @@ Host reports (`/mnt/user/backups/restore-reports/`) and in the Git history.
| Home Assistant Tibber | Operator/Codex | Connect Tibber via the HA UI config flow. Afterwards, extend the Energy dashboard with real costs/a price source; SolarEdge PV, grid, and storage are already configured and validated | `docs/runbooks/smart-home-bootstrap.md`, `docs/DECISIONS.md` |
| Nearline-Pull Dead-Man's-Switch | Operator | **S4U root cause fixed + verified 2026-06-21:** Task `KalliLab H Drive Nearline Pull` switched from S4U to LogonType `Interactive` ("Run only when user is logged on"; no password needed, since `michi` is the permanent console user) -> confirmed via the scheduler with `0x0`. Mirror is fresh, exit-code leak fixed, heartbeat pings pushed. **Remaining (optional, low urgency):** create one Healthchecks check each + store the capability URL (baerchen ENV `HEALTHCHECKS_NEARLINE_URL`/file; Unraid `/mnt/user/appdata/secrets/healthchecks_borg_url`) | `ops/h-drive-nearline/README.md` |
| Monitoring Single-File-Bind-Mount Hardening | Operator/Claude | alertmanager/blackbox/loki/promtail + alertmanager-ntfy-bridge switched locally to directory mounts (grafana-provisioning was already a directory mount); `docker compose config` green. **Remaining:** push + Komodo redeploy of the monitoring stack with `--force-recreate` (mount paths change), then a reload/alert smoke test | `monitoring/docker-compose.yml` |
| Healthchecks self-hosted (internal jobs) | Operator | Stack prepared (`ops/healthchecks/`). Pre-deploy: appdata `/mnt/user/appdata/healthchecks/postgres18/` + file secret `healthchecks_postgres_password.txt` + 4 Komodo stack ENV variables. Then create the Komodo stack from Gitea + the mandatory Gitea webhook, and afterwards wire up internal jobs (posture-check, restore-tests, dumps) as checks. External backup/host-down watchdogs stay on healthchecks.io cloud | `ops/healthchecks/README.md` |
---