Backup audit hardening: dump freshness monitoring and scope consistency
Implements the findings from the backup/restore audit of 2026-06-18:

- Dump freshness as Prometheus metrics (homelab_borg_dump_present / homelab_borg_dump_age_seconds) in the host exporter; closes the blind spot where Borg keeps running and archives stale dumps without any job failure.
- New alerts HomelabBorgDumpMissing / HomelabBorgDumpStale (critical), plus ALERT_RULES.md.
- Freshness gate (.sh + .ps1) and H: nearline pull extended with n8n.sqlite.dump and postgresql17-globals.sql.
- Critical container watch extended with mail-archiver, n8n, homeassistant, smarthome-mosquitto.
- BACKUP_SCOPE: /mnt/user/projekte and other user shares outside the app scope documented as a deliberate open operator decision; Hermes data path clarified as parked.
- MASTER_TODO: added nearline pull monitoring, host pull follow-up, and the projekte scope decision.

Also contains the previously prepared scope extensions (nextcloud html+data, n8n, filebrowser, influxdb3) and the scope drift / retention / compact / check alerts.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
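The dump freshness metrics named in the commit message could be produced roughly as follows. This is a minimal sketch, not the repo's actual `export-prometheus-textfile.sh`: the helper name `emit_dump_metrics` and its argument order are hypothetical, GNU `stat` is assumed, and the `dump` label is taken from the `{{ $labels.dump }}` references in the alert annotations. The sketch deliberately emits no age sample for a missing file, on the assumption that the "missing" alert covers that case.

```shell
# Sketch: emit dump freshness gauges in Prometheus textfile format.
# emit_dump_metrics is a hypothetical helper (assumption, not from the repo).
# Usage: emit_dump_metrics DUMP_DIR NOW_EPOCH NAME...
emit_dump_metrics() {
  local dump_dir="$1" now="$2"
  shift 2
  local name mtime

  echo '# HELP homelab_borg_dump_present 1 if the expected dump artifact exists, else 0.'
  echo '# TYPE homelab_borg_dump_present gauge'
  for name in "$@"; do
    if [ -f "$dump_dir/$name" ]; then
      printf 'homelab_borg_dump_present{dump="%s"} 1\n' "$name"
    else
      printf 'homelab_borg_dump_present{dump="%s"} 0\n' "$name"
    fi
  done

  echo '# HELP homelab_borg_dump_age_seconds Seconds since the dump artifact was last modified.'
  echo '# TYPE homelab_borg_dump_age_seconds gauge'
  for name in "$@"; do
    # Missing files get no age sample; the "present" gauge reports them.
    [ -f "$dump_dir/$name" ] || continue
    mtime=$(stat -c %Y "$dump_dir/$name")  # GNU coreutils stat (assumption)
    printf 'homelab_borg_dump_age_seconds{dump="%s"} %d\n' "$name" "$((now - mtime))"
  done
}
```

A call such as `emit_dump_metrics /mnt/user/backups/borg/dumps/latest "$(date +%s)" n8n.sqlite.dump postgresql17-globals.sql` would print the samples to stdout; redirecting them into a textfile collector directory is left to the caller.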
+9
-1
@@ -1,6 +1,6 @@
 # Alert Rules
 
-As of: 2026-06-05
+As of: 2026-06-18
 
 This file describes the production alert paths and the most important rules. The
 configuration itself lives in `monitoring/prometheus/alerts.yml` and in the
@@ -36,6 +36,14 @@ scripts under `services/posture-check/`.
 | `HomelabBorgBackupStale` | last Borg backup >30h | warning | rerun/check the backup run |
 | `HomelabBorgLastJobFailed` | last Borg job failed | critical | check the Borg UI job log |
 | `HomelabBorgLastJobCompletedWithWarnings` | last Borg job completed with warnings | warning | read the warning in the Borg UI job |
+| `HomelabBorgDumpMissing` | expected dump artifact is missing from the current dump set | critical | check `pre-backup-dumps.sh`/User Script |
+| `HomelabBorgDumpStale` | dump artifact >30h old (Borg runs, dumps frozen) | critical | check `pre-backup-dumps.sh`/User Script, not just the Borg job |
+| `HomelabBorgScopeSourceListMissing` | repo source list for the Borg drift check is missing | critical | check the Borg UI mount `/local/services/homelab-infra` and the repo path |
+| `HomelabBorgScopeMissingSources` | Borg UI does not contain all paths from `ops/borg-ui/all-important-sources.txt` | critical | align the live Borg scope with the repo source list |
+| `HomelabBorgScopeExtraSources` | Borg UI contains paths outside the repo source list | warning | clean up the docs or the live scope |
+| `HomelabBorgRepositoryCheckStale` | last Borg check >14 days old | warning | run the Borg repository check or check the scheduler |
+| `HomelabBorgRetentionDisabled` | scheduled job does not run prune | warning | check the retention setting in Borg UI |
+| `HomelabBorgCompactDisabled` | scheduled job does not run compact | warning | check the compact setting in Borg UI |
 | `HomelabCriticalContainerDown` | critical container is missing | critical | check Komodo/Docker status |
 | `HomelabPrometheusTargetDown` | scrape target down | critical | check node-exporter/cadvisor/blackbox/traefik |
 
+9
-3
@@ -1,6 +1,6 @@
 # Master To-do - KalliLab CORE
 
-Type: Status/To-do · As of: 2026-06-17 · Status: active
+Type: Status/To-do · As of: 2026-06-18 · Status: active
 
 This list is the **only** working list for open operational items in the
 homelab. Detailed procedures are in the linked runbooks; decisions with
@@ -25,14 +25,19 @@ Host reports (`/mnt/user/backups/restore-reports/`) and in the Git history.
 | Restore test Tailscale | Operator | State validation + reconnect only on a throwaway host/VM, then remove the device in the Tailscale admin console | `ops/restore-tests/tailscale-runbook.md` |
 | Authelia OIDC for apps | Operator/Codex | Live: Grafana + Mealie login-verified; Paperless secret wired up and service smoke test green on 2026-06-17, final browser login with the operator account still open. Immich + Nextcloud deliberately parked until family onboarding (see `docs/DECISIONS.md` 2026-06-06) | `docs/AUTHELIA_OIDC_PLAN.md` |
 | Home Assistant Tibber | Operator/Codex | Connect Tibber via the HA UI config flow. Then extend the Energy dashboard with a real cost/price source; SolarEdge PV, grid, and storage are already configured and validated | `docs/runbooks/smart-home-bootstrap.md`, `docs/DECISIONS.md` |
+| Nearline pull monitoring | Operator | The H: pull was silently stopped from 2026-06-04 to 2026-06-18 (no Scheduled Task, no alarm). On 2026-06-18 the run was caught up manually and the task re-registered. **Next step:** external dead man's switch (Healthchecks.io ping at the end of `pull-critical-backups.ps1` and `ops/borg-ui/scripts/pre-borg.sh`), because Prometheus on Unraid does not see the baerchen pull | `ops/h-drive-nearline/README.md` |
+| Host pull after backup hardening | Operator | Run `git pull` in `/mnt/user/services/homelab-infra` so that the updated `export-prometheus-textfile.sh` (dump freshness metrics) and the freshness checks take effect live. Align the Borg UI live sources with the new paths (nextcloud/html, nextcloud/data, n8n, filebrowser, influxdb3) until `homelab_borg_scope_missing_sources_total` is 0 | `services/posture-check/export-prometheus-textfile.sh`, `ops/borg-ui/all-important-sources.txt` |
 
 ---
 
 ## Operator decision
 
-**As of 2026-06-11: no open operator decisions.**
 Decisions made, with rationale and review trigger: `docs/DECISIONS.md`.
 
+| Topic | Decision needed | Source |
+|---|---|---|
+| `/mnt/user/projekte` backup scope | Filebrowser serves `projekte` (and all of `documents`/`photos`), but only the app subfolders are in the Borg scope. Decide: add `projekte` as a read-only Borg UI mount + source list entry, or deliberately confirm it as "local only, not DR-relevant" | `ops/borg-ui/BACKUP_SCOPE.md`, section "User data shares outside the app scope" |
+
 ---
 
 ## Parked
@@ -52,6 +57,7 @@ Deliberately not now - rationale in `docs/DECISIONS.md`, here only topic and T
 | Filebrowser mount scope | next hardening sprint | `docs/SERVICE_CATALOG.md` |
 | Scrutiny privileged exception | change only with a clear rationale | `docs/SERVICE_CATALOG.md` |
 | Immich Redis named volume | suitable maintenance on the Immich stack | `docs/SERVICE_CATALOG.md` |
+| Komodo keys named volume | joint maintenance window with the operator | Migrate the live volume `komodo_komodo_keys` to `/mnt/user/appdata/komodo/keys`, adjust Compose, verify Periphery reconnect, then add it to the Borg scope |
 | Storage growth (second NVMe, second array disk, ZFS/BTRFS) | trigger from the capacity docs | `docs/STORAGE_LAYOUT.md`, `docs/CAPACITY_AND_LIFECYCLE.md` |
 | Recurring restore drills | ongoing per cadence, incl. quarterly freshness negative test (`run-restore-checks.sh freshness-negative`) | `docs/RESTORE_MATRIX.md`, `ops/restore-tests/schedule.md` |
 | Quarterly docs gardening (~15 min) | quarterly, first run with the Q3 review from 2026-07-01: archive dated material, trim done/review logs, check dead links | `docs/REPO_MAP.md` docs rules |
@@ -71,8 +77,8 @@ Deliberately not now - rationale in `docs/DECISIONS.md`, here only topic and T
 - **2026-06-17** Open TODOs reconciled against the live state: Paperless OIDC secret wired up and service smoke test green; old Tailscale Docker state moved to `_archive/tailscale-removed-2026-06-06/`; remaining tailnet item closed.
 - **2026-06-17** Repo hygiene completed: Glance widget tokens are set in the runtime, the audit PDF is stored externally under `H:\kallilab-recovery\audits`, worktree clean.
 - **2026-06-17** Komodo/Gitea webhooks normalized: active Komodo hooks for `Micha/homelab-infra` use the branch filter `master`; DB backup created before the host hotfix. Workflow rule updated accordingly.
+- **2026-06-18** Backup audit hardening: dump freshness metrics + alerts `HomelabBorgDumpMissing/Stale`, freshness checks + nearline pull extended with `n8n`/`globals`, 4 tier-2 containers added to the critical watch, scope docs for `projekte`/Hermes clarified. H: nearline (silent since 2026-06-04) caught up + task re-registered.
 - **2026-06-13** Home Assistant MQTT integration connected in production: config entry `smarthome-mosquitto` is `loaded`, Mosquitto sees the HA client `homeassistant`; `check_config` green.
-- **2026-06-13** HA Energy Dashboard configured: grid, PV, and storage set from SolarEdge Local, `energy/validate` without issues; HA backup created afterwards.
 
 ---
 
@@ -60,6 +60,7 @@ It is the functional complement to `docs/DISASTER_RECOVERY.md`.
 | Glance | Git / Borg repo | Repo configuration under `ops/glance/config/glance.yml`; no critical data persistence | none | `GLANCE_IMMICH_API_KEY`, `GLANCE_ADGUARD_USERNAME`, `GLANCE_ADGUARD_PASSWORD`, `GLANCE_SPEEDTEST_API_KEY` | Traefik, Authelia, optional internal API targets | Dashboard starts, widgets load, Docker status runs only via `glance-docker-socket-proxy` |
 | ntfy | Borg / Share | `/mnt/user/appdata/ntfy` | none | no special secret files documented | Traefik | UI and push endpoint reachable |
 | Paperless-GPT | Borg / Share | `/mnt/user/appdata/paperless-gpt` | no DB of its own | `PAPERLESS_API_TOKEN`, `OPENAI_API_KEY` | Traefik, Paperless, OpenAI API | UI starts, configuration present; LLM provider shows `openai` / `gpt-5.4-mini` |
+| n8n | Borg + Dump | `/mnt/user/appdata/n8n/data` | `n8n.sqlite.dump`; credentials can only be decrypted with the matching `N8N_ENCRYPTION_KEY` | `N8N_ENCRYPTION_KEY`, GMX/OpenAI/Gitea credentials in n8n | Traefik, GMX IMAP, OpenAI API, Gitea API | UI starts, owner login works, the critical Mail->LLM->Gitea workflow is present and disabled/active as before the restore |
 | Home Assistant | Borg + HA-native backups + Fachrepo | `/mnt/user/appdata/homeassistant` incl. `.storage`, `secrets.yaml`, `trusted_proxies.yaml`, `custom_components` (HACS, `solaredge_modbus_multi`); domain YAML from `/mnt/user/services/smart-home-kalli/home-assistant` | HA-native backup artifacts under `/mnt/user/appdata/homeassistant/backups`; first artifact created on 2026-06-13 and tar-readable (`backup.json`, `homeassistant.tar.gz`); backup after SolarEdge integration: `Custom_backup_2026.6.1_2026-06-13_14.59_48645373.tar`; backup after Energy dashboard configuration: `Custom_backup_2026.6.1_2026-06-13_15.59_25670583.tar`; no external DB in phase 1 | HA secrets in `secrets.yaml`, integration tokens in `.storage`, MQTT credentials, agent API tokens as host secrets `ha_token_codex`/`ha_token_claude` (usable only with a preserved `.storage` auth state), later Tibber/InfluxDB tokens | Traefik, `frontend_net`, `smarthome_net`, Mosquitto, Fachrepo clone, SolarEdge inverter `192.168.178.111:1502` | Restore test successful on 2026-06-13: HA-native backup + Mosquitto appdata + Fachrepo clone started in isolation, HA HTTP/API/check_config green; afterwards in production the HA MQTT config entry `smarthome-mosquitto` loaded, SolarEdge Local `solaredge_modbus_multi` loaded with 68 entities, and the Energy dashboard for grid/PV/storage passed `energy/validate` without issues; report `/mnt/user/backups/restore-reports/homeassistant-2026-06-13.md` |
 | Smart-Home MQTT / Mosquitto | Borg / Share | `/mnt/user/appdata/mosquitto/config`, `/mnt/user/appdata/mosquitto/data`, `/mnt/user/appdata/mosquitto/log` | Mosquitto persists retained messages/subscriptions to files | `passwordfile`, `aclfile`, later per-device users | `smarthome_net`, Home Assistant, later ESPHome/Zigbee2MQTT | Restore test successful on 2026-06-13: authenticated publish/subscribe smoke test with the `homeassistant` user and retained topic green after broker restart; in production HA connects as user `homeassistant` |
 | Smart-Home Fachrepo | Gitea + Borg repo clone | `/mnt/user/services/smart-home-kalli` | none | no real secrets in the repo; `secrets-template/` contains examples only | Gitea, Home Assistant mounts | `git status` clean, HA reads `configuration.yaml` and `packages/` from the clone |
@@ -104,6 +105,7 @@ Currently relevant dump artifacts under `/mnt/user/backups/borg/dumps/latest`:
 - `filebrowser.bolt.dump`
 - `borg-ui.sqlite`
 - `grafana.sqlite`
+- `n8n.sqlite.dump`
 - `unraid-flash-config.tar.gz` plus `unraid-flash-config.tar.gz.sha256` and manifest
 - Monitoring stack: no mandatory dump artifacts; Prometheus/Loki/Grafana named volumes are diagnostic/dashboard state, not a primary restore source.
 - `komodo-mongo.archive.gz` (still to be verified separately)
@@ -131,6 +131,78 @@ groups:
           summary: "Latest Borg backup completed with warnings"
           description: "The latest Borg UI job completed with warnings for archive {{ $labels.archive }}."
 
+      - alert: HomelabBorgScopeSourceListMissing
+        expr: homelab_borg_scope_expected_file_present != 1
+        for: 15m
+        labels:
+          severity: critical
+        annotations:
+          summary: "Borg expected source list is not visible"
+          description: "Borg UI cannot see the repo source list used for drift checks."
+
+      - alert: HomelabBorgScopeMissingSources
+        expr: homelab_borg_scope_missing_sources_total > 0
+        for: 15m
+        labels:
+          severity: critical
+        annotations:
+          summary: "Borg UI is missing expected backup sources"
+          description: "Borg UI is missing {{ $value }} source path(s) from ops/borg-ui/all-important-sources.txt."
+
+      - alert: HomelabBorgScopeExtraSources
+        expr: homelab_borg_scope_extra_sources_total > 0
+        for: 30m
+        labels:
+          severity: warning
+        annotations:
+          summary: "Borg UI has sources not tracked in the repo"
+          description: "Borg UI has {{ $value }} source path(s) that are not listed in ops/borg-ui/all-important-sources.txt."
+
+      - alert: HomelabBorgDumpMissing
+        expr: homelab_borg_dump_present == 0
+        for: 15m
+        labels:
+          severity: critical
+        annotations:
+          summary: "Borg pre-backup dump is missing: {{ $labels.dump }}"
+          description: "Expected dump artifact {{ $labels.dump }} is not present in the latest dump set. The pre-backup dump job may have failed or stopped."
+
+      - alert: HomelabBorgDumpStale
+        expr: homelab_borg_dump_age_seconds > 30 * 60 * 60
+        for: 15m
+        labels:
+          severity: critical
+        annotations:
+          summary: "Borg pre-backup dump is stale: {{ $labels.dump }}"
+          description: "Dump artifact {{ $labels.dump }} is older than 30 hours. pre-backup-dumps.sh may have stopped; Borg would keep archiving stale database content without a job failure."
+
+      - alert: HomelabBorgRepositoryCheckStale
+        expr: time() - homelab_borg_repository_last_check_timestamp_seconds > 14 * 24 * 60 * 60
+        for: 30m
+        labels:
+          severity: warning
+        annotations:
+          summary: "Borg repository check is stale"
+          description: "Borg repository {{ $labels.repository }} has not had a recorded check for more than 14 days."
+
+      - alert: HomelabBorgRetentionDisabled
+        expr: homelab_borg_schedule_prune_after_enabled != 1
+        for: 30m
+        labels:
+          severity: warning
+        annotations:
+          summary: "Borg retention pruning is disabled"
+          description: "Scheduled Borg job {{ $labels.schedule }} does not run prune after backup."
+
+      - alert: HomelabBorgCompactDisabled
+        expr: homelab_borg_schedule_compact_after_enabled != 1
+        for: 30m
+        labels:
+          severity: warning
+        annotations:
+          summary: "Borg compaction is disabled"
+          description: "Scheduled Borg job {{ $labels.schedule }} does not run compact after backup."
+
       - alert: HomelabCriticalContainerDown
         expr: homelab_critical_container_running == 0
         for: 5m
@@ -48,11 +48,12 @@ The Unraid flash configuration archive is intentional as well and must be treate
 | Grafana | SQLite dump from `monitoring_grafana_data` + provisioned config in Git | `/local/borg-dumps`, `monitoring/grafana/provisioning`, `monitoring/grafana/dashboards` |
 | Filebrowser | file-backed state dump + file data | `/local/borg-dumps`, `/local/appdata/filebrowser` |
 | InfluxDB 3 Core | file data | `/local/appdata/influxdb3/data`, `/local/appdata/influxdb3/plugins` |
+| n8n | SQLite dump + encrypted workflow/credential state | `/local/borg-dumps`, `/local/appdata/n8n/data` |
 | Home Assistant | HA-native backup + file state | `/local/appdata/homeassistant`, `/local/services/smart-home-kalli` |
 | Smart-Home MQTT / Mosquitto | file data | `/local/appdata/mosquitto/config`, `/local/appdata/mosquitto/data` |
 | Zigbee2MQTT (planned) | file data + coordinator state | `/local/appdata/zigbee2mqtt`, `/local/services/smart-home-kalli` |
 | ESPHome (planned) | Fachrepo + optional build/runtime cache | `/local/services/smart-home-kalli/esphome`, optional `/local/appdata/esphome` |
-| Hermes Agent | file data + SSH key | `/local/appdata/hermes-agent/data`, `/local/secrets/hermes_runner_id_ed25519` |
+| Hermes Agent | file data + SSH key | SSH key via `/local/secrets`; `/local/appdata/hermes-agent/data` is deliberately NOT in `all-important-sources.txt` because the stack is parked (review 2026-07-25). Add it to the source list when the stack is activated. |
 | BentoPDF | rebuildable | no critical persistence in compose |
 
 ## Open Decisions and Coverage Gaps
@@ -71,6 +72,17 @@ Option A implemented: `pre-backup-dumps.sh` writes `nextcloud.dump` from `nextclou
 
 The live Unraid User Scripts execute repo scripts from `/mnt/user/services/homelab-infra`, while Komodo keeps stack workspaces below `/mnt/user/services/stacks`. These paths are now mounted into Borg UI as `/local/services/...` and included explicitly so host-side script hotfixes, stack workspace state, and posture-check state are recoverable.
 
+### User data shares outside the app scope
+
+Filebrowser serves `/mnt/user/projekte`, `/mnt/user/documents`, and `/mnt/user/photos` in full (`ops/filebrowser/docker-compose.yml`). The Borg scope, however, deliberately covers only the app subfolders (`documents/paperless*`, `documents/nextcloud-data`, `documents/scans_inbox`, `photos/immich`, `photos/family_archive`).
+
+- **`/mnt/user/projekte`** is currently in **no** Borg scope. The same applies to ad-hoc files placed directly under `documents/` or `photos/` (outside the app folders listed above).
+- Operator decision open (entry in `docs/MASTER_TODO.md`): either add `projekte` as a dedicated read-only Borg UI mount + source list entry, or deliberately confirm it as "local only, not DR-relevant". Until the decision is made, original data stored there is **not** recoverable.
+
+### Komodo keys
+
+Production still stores Komodo Core/Periphery keys in the Docker named volume `komodo_komodo_keys`. This is a known open migration item and is not fixed by the Borg source list alone. Target state: move the keys to a host path such as `/mnt/user/appdata/komodo/keys` and mount that path into both Komodo containers, then include it in Borg. Do not treat this as solved until the live Compose stack has been migrated and Periphery reconnect has been verified.
+
 ## Database Dumps Required
 
 ### Shared PostgreSQL (`postgresql17`, runtime PostgreSQL 18)
@@ -89,6 +101,7 @@ The live Unraid User Scripts execute repo scripts from `/mnt/user/services/homel
 
 - Komodo MongoDB
 - SQLite: `gitea`, `vaultwarden`, `speedtest-tracker`, `borg-ui`, `grafana`
+- SQLite: `n8n` (`n8n.sqlite.dump`, credentials require the matching `N8N_ENCRYPTION_KEY`)
 - File-backed state: `filebrowser.bolt.dump`
 - Unraid flash config: `unraid-flash-config.tar.gz` plus `unraid-flash-config.tar.gz.sha256`
 - Home Assistant native backups: created by HA under `/mnt/user/appdata/homeassistant/backups` and captured as file state
@@ -18,6 +18,12 @@
 /local/appdata/borg-ui/data
 /local/appdata/komodo/periphery
 /local/appdata/komodo/core
+/local/appdata/nextcloud/html
+/local/nextcloud/data
+/local/appdata/n8n/data
+/local/appdata/filebrowser
+/local/appdata/influxdb3/data
+/local/appdata/influxdb3/plugins
 /local/services/homelab-infra
 /local/services/smart-home-kalli
 /local/services/stacks
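The scope drift alerts in this commit compare the sources configured in Borg UI against `ops/borg-ui/all-important-sources.txt`. A minimal sketch of that comparison, assuming both sides are available as plain text files with one path per line: the helper `count_scope_drift` and the live-sources file are hypothetical, and how the real exporter reads the live scope from Borg UI is not shown in the diff. Comment and blank lines are skipped on the assumption that the source list may contain them.

```shell
# Sketch: count paths missing from / extra in the live Borg scope.
# count_scope_drift is a hypothetical helper (assumption, not from the repo).
# Usage: count_scope_drift EXPECTED_LIST LIVE_LIST
count_scope_drift() {
  local expected="$1" live="$2" tmp missing extra
  tmp=$(mktemp -d)

  # Normalise both lists: drop comments and blank lines, sort, deduplicate.
  grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$expected" | sort -u > "$tmp/expected"
  grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$live" | sort -u > "$tmp/live"

  # comm -23: lines only in the expected list; comm -13: lines only in the live list.
  missing=$(( $(comm -23 "$tmp/expected" "$tmp/live" | wc -l) ))
  extra=$(( $(comm -13 "$tmp/expected" "$tmp/live" | wc -l) ))
  rm -rf "$tmp"

  printf 'homelab_borg_scope_missing_sources_total %d\n' "$missing"
  printf 'homelab_borg_scope_extra_sources_total %d\n' "$extra"
}
```

With a comparison like this, `HomelabBorgScopeMissingSources` would clear once every path in the repo list is also configured in Borg UI.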
@@ -325,6 +325,7 @@ main() {
   # Additional host-side SQLite dumps for admin tooling with appdata files.
   dump_sqlite_file "/mnt/user/appdata/borg-ui/data/borg.db" "$LATEST_DIR/borg-ui.sqlite" "borg-ui"
   dump_sqlite_file "/var/lib/docker/volumes/monitoring_grafana_data/_data/grafana.db" "$LATEST_DIR/grafana.sqlite" "grafana"
+  dump_sqlite_file "/mnt/user/appdata/n8n/data/database.sqlite" "$LATEST_DIR/n8n.sqlite.dump" "n8n"
 
   # MongoDB
   dump_mongo_container "komodo-mongo" "$LATEST_DIR/komodo-mongo.archive.gz"
@@ -25,6 +25,7 @@ $Jobs = @(
     "immich.dump",
     "komodo-mongo.archive.gz",
     "mealie.dump",
+    "n8n.sqlite.dump",
     "nextcloud.dump",
     "postgresql17-authelia.dump",
     "postgresql17-globals.sql",
@@ -6,6 +6,7 @@ param(
 )
 
 $checks = @(
+    @{ Name = "postgresql17-globals.sql"; Path = Join-Path $DumpRoot "postgresql17-globals.sql" },
     @{ Name = "postgresql17-paperless.dump"; Path = Join-Path $DumpRoot "postgresql17-paperless.dump" },
     @{ Name = "postgresql17-mailarchiver.dump"; Path = Join-Path $DumpRoot "postgresql17-mailarchiver.dump" },
     @{ Name = "mealie.dump"; Path = Join-Path $DumpRoot "mealie.dump" },
@@ -13,6 +14,7 @@ $checks = @(
     @{ Name = "nextcloud.dump"; Path = Join-Path $DumpRoot "nextcloud.dump" },
     @{ Name = "gitea.sqlite.dump"; Path = Join-Path $DumpRoot "gitea.sqlite.dump" },
     @{ Name = "vaultwarden.sqlite.dump"; Path = Join-Path $DumpRoot "vaultwarden.sqlite.dump" },
+    @{ Name = "n8n.sqlite.dump"; Path = Join-Path $DumpRoot "n8n.sqlite.dump" },
     @{ Name = "speedtest-tracker.sqlite.dump"; Path = Join-Path $DumpRoot "speedtest-tracker.sqlite.dump" },
     @{ Name = "filebrowser.bolt.dump"; Path = Join-Path $DumpRoot "filebrowser.bolt.dump" },
     @{ Name = "unraid-flash-config.tar.gz"; Path = Join-Path $DumpRoot "unraid-flash-config.tar.gz" }
@@ -89,6 +89,7 @@ check_pg_header() {
 }
 
 for dump in \
+  postgresql17-globals.sql \
   postgresql17-paperless.dump \
   postgresql17-mailarchiver.dump \
   mealie.dump \
@@ -96,6 +97,7 @@ for dump in \
   nextcloud.dump \
   gitea.sqlite.dump \
   vaultwarden.sqlite.dump \
+  n8n.sqlite.dump \
   speedtest-tracker.sqlite.dump \
   filebrowser.bolt.dump \
   unraid-flash-config.tar.gz; do
@@ -4,7 +4,11 @@ set -euo pipefail
 TEXTFILE_DIR="${TEXTFILE_DIR:-/mnt/user/services/posture-check/textfile}"
 OUTPUT_FILE="${OUTPUT_FILE:-$TEXTFILE_DIR/homelab.prom}"
 BORG_CONTAINER="${BORG_CONTAINER:-borg-ui}"
-CRITICAL_CONTAINERS="${CRITICAL_CONTAINERS:-traefik authelia postgresql17 gitea komodo-core komodo-mongo komodo-periphery vaultwarden borg-ui ntfy adguard unbound monitoring-alertmanager monitoring-alertmanager-ntfy-bridge monitoring-blackbox-exporter monitoring-cadvisor monitoring-grafana monitoring-loki monitoring-node-exporter monitoring-promtail immich_server immich_postgres immich_redis paperless-ngx nextcloud nextcloud-postgres nextcloud-redis mealie mealie-postgres}"
+BORG_EXPECTED_SOURCES_FILE="${BORG_EXPECTED_SOURCES_FILE:-/local/services/homelab-infra/ops/borg-ui/all-important-sources.txt}"
+# Host path of the current dump artifacts (pre-backup-dumps.sh writes here).
+# Checked with stat on the host side; the exporter runs as an Unraid User Script.
+BORG_DUMP_DIR="${BORG_DUMP_DIR:-/mnt/user/backups/borg/dumps/latest}"
+CRITICAL_CONTAINERS="${CRITICAL_CONTAINERS:-traefik authelia postgresql17 gitea komodo-core komodo-mongo komodo-periphery vaultwarden borg-ui ntfy adguard unbound monitoring-alertmanager monitoring-alertmanager-ntfy-bridge monitoring-blackbox-exporter monitoring-cadvisor monitoring-grafana monitoring-loki monitoring-node-exporter monitoring-promtail immich_server immich_postgres immich_redis paperless-ngx nextcloud nextcloud-postgres nextcloud-redis mealie mealie-postgres mail-archiver n8n homeassistant smarthome-mosquitto}"
 # Note: Tailscale runs as a native Unraid plugin (not a Docker container) and
 # is therefore deliberately NOT listed here as a critical container (as of 2026-06-06).
 
@@ -90,11 +94,32 @@ EOF
 # TYPE homelab_borg_last_success gauge
 # HELP homelab_borg_last_job_warning Whether the most recent Borg backup job completed with warnings.
 # TYPE homelab_borg_last_job_warning gauge
+# HELP homelab_borg_repository_last_check_timestamp_seconds Unix timestamp of the latest Borg repository check known to Borg UI.
+# TYPE homelab_borg_repository_last_check_timestamp_seconds gauge
+# HELP homelab_borg_scope_expected_file_present Whether the expected Borg source list file is visible inside Borg UI.
+# TYPE homelab_borg_scope_expected_file_present gauge
+# HELP homelab_borg_scope_expected_sources_total Number of expected Borg source paths from the repo source list.
+# TYPE homelab_borg_scope_expected_sources_total gauge
+# HELP homelab_borg_scope_configured_sources_total Number of Borg source paths configured in Borg UI.
+# TYPE homelab_borg_scope_configured_sources_total gauge
+# HELP homelab_borg_scope_missing_sources_total Number of expected Borg source paths missing from Borg UI.
+# TYPE homelab_borg_scope_missing_sources_total gauge
+# HELP homelab_borg_scope_extra_sources_total Number of Borg UI source paths not present in the repo source list.
+# TYPE homelab_borg_scope_extra_sources_total gauge
+# HELP homelab_borg_scope_source_configured Whether an expected Borg source path is configured in Borg UI.
+# TYPE homelab_borg_scope_source_configured gauge
+# HELP homelab_borg_schedule_prune_after_enabled Whether a Borg scheduled job runs prune after backup.
+# TYPE homelab_borg_schedule_prune_after_enabled gauge
+# HELP homelab_borg_schedule_compact_after_enabled Whether a Borg scheduled job runs compact after backup.
+# TYPE homelab_borg_schedule_compact_after_enabled gauge
 EOF
 
   if docker inspect "$BORG_CONTAINER" >/dev/null 2>&1; then
-    docker exec -i "$BORG_CONTAINER" python3 - <<'PY'
+    docker exec -i -e BORG_EXPECTED_SOURCES_FILE="$BORG_EXPECTED_SOURCES_FILE" "$BORG_CONTAINER" python3 - <<'PY'
 import datetime as dt
+import json
+import os
+from pathlib import Path
 import sqlite3
 
 conn = sqlite3.connect("/data/borg.db")
@@ -135,6 +160,9 @@ def parse_ts(value):
 def escape_label(value):
     return (value or "").replace("\\", "\\\\").replace('"', '\\"')
 
+def bool_metric(value):
+    return 1 if value else 0
+
 latest_status = latest["status"] if latest else "missing"
 latest_success = 1 if latest_status in ("completed", "completed_with_warnings") else 0
 latest_warning = 1 if latest_status == "completed_with_warnings" else 0
@@ -145,12 +173,107 @@ completed_archive = escape_label(completed["archive_name"] if completed else "")
 print(f'homelab_borg_last_success{{status="{latest_status}",archive="{latest_archive}"}} {latest_success}')
 print(f'homelab_borg_last_job_warning{{status="{latest_status}",archive="{latest_archive}"}} {latest_warning}')
 print(f'homelab_borg_last_completed_timestamp_seconds{{archive="{completed_archive}"}} {completed_ts}')
+
+repo = cur.execute("""
+    select id, name, source_directories, last_check
+    from repositories
+    order by id
+    limit 1
+""").fetchone()
+
+if repo:
+    repo_name = escape_label(repo["name"] or str(repo["id"]))
+    print(f'homelab_borg_repository_last_check_timestamp_seconds{{repository="{repo_name}"}} {parse_ts(repo["last_check"])}')
+
+    try:
+        configured_sources = json.loads(repo["source_directories"] or "[]")
+    except json.JSONDecodeError:
+        configured_sources = []
+else:
+    configured_sources = []
+
+expected_path = Path(os.environ.get("BORG_EXPECTED_SOURCES_FILE", ""))
+expected_file_present = expected_path.is_file()
+if expected_file_present:
+    expected_sources = [
+        line.strip()
+        for line in expected_path.read_text(encoding="utf-8").splitlines()
+        if line.strip() and not line.lstrip().startswith("#")
+    ]
+else:
+    expected_sources = []
+
+configured_set = set(configured_sources)
+expected_set = set(expected_sources)
+missing_sources = [source for source in expected_sources if source not in configured_set]
+extra_sources = [source for source in configured_sources if source not in expected_set]
+
+print(f"homelab_borg_scope_expected_file_present {bool_metric(expected_file_present)}")
+print(f"homelab_borg_scope_expected_sources_total {len(expected_sources)}")
+print(f"homelab_borg_scope_configured_sources_total {len(configured_sources)}")
+print(f"homelab_borg_scope_missing_sources_total {len(missing_sources)}")
+print(f"homelab_borg_scope_extra_sources_total {len(extra_sources)}")
+
+for source in expected_sources:
+    value = 1 if source in configured_set else 0
+    print(f'homelab_borg_scope_source_configured{{source="{escape_label(source)}"}} {value}')
+
+for source in extra_sources:
+    print(f'homelab_borg_scope_source_configured{{source="{escape_label(source)}",state="extra"}} 0')
+
+for schedule in cur.execute("""
+    select id, name, run_prune_after, run_compact_after
+    from scheduled_jobs
+    where enabled = 1
+    order by id
+"""):
+    schedule_name = escape_label(schedule["name"] or str(schedule["id"]))
+    print(f'homelab_borg_schedule_prune_after_enabled{{schedule="{schedule_name}"}} {bool_metric(schedule["run_prune_after"])}')
+    print(f'homelab_borg_schedule_compact_after_enabled{{schedule="{schedule_name}"}} {bool_metric(schedule["run_compact_after"])}')
 PY
   else
     printf 'homelab_borg_last_success{status="container_missing",archive=""} 0\n'
     printf 'homelab_borg_last_job_warning{status="container_missing",archive=""} 0\n'
     printf 'homelab_borg_last_completed_timestamp_seconds{archive=""} 0\n'
+    printf 'homelab_borg_repository_last_check_timestamp_seconds{repository=""} 0\n'
+    printf 'homelab_borg_scope_expected_file_present 0\n'
+    printf 'homelab_borg_scope_expected_sources_total 0\n'
+    printf 'homelab_borg_scope_configured_sources_total 0\n'
+    printf 'homelab_borg_scope_missing_sources_total 0\n'
+    printf 'homelab_borg_scope_extra_sources_total 0\n'
   fi
+
+  # Measure dump freshness on the host side. This closes the blind spot where
+  # Borg keeps running and archives stale dumps without any job failure
+  # (pre-backup-dumps.sh stopped). Runs outside the borg-ui container
+  # because the dumps live on the host under $BORG_DUMP_DIR.
+  cat <<'EOF'
+# HELP homelab_borg_dump_present Whether an expected Borg pre-backup dump artifact exists in the latest dump set.
+# TYPE homelab_borg_dump_present gauge
+# HELP homelab_borg_dump_age_seconds Age in seconds of an expected Borg pre-backup dump artifact.
+# TYPE homelab_borg_dump_age_seconds gauge
+EOF
+  for dump in \
+    postgresql17-globals.sql \
+    postgresql17-mailarchiver.dump \
+    postgresql17-paperless.dump \
+    mealie.dump \
+    immich.dump \
+    nextcloud.dump \
+    gitea.sqlite.dump \
+    vaultwarden.sqlite.dump \
+    n8n.sqlite.dump \
+    unraid-flash-config.tar.gz \
+    komodo-mongo.archive.gz; do
+    dump_path="$BORG_DUMP_DIR/$dump"
+    if [ -f "$dump_path" ]; then
+      dump_mtime="$(stat -c %Y "$dump_path" 2>/dev/null || echo 0)"
+      printf 'homelab_borg_dump_present{dump="%s"} 1\n' "$dump"
+      printf 'homelab_borg_dump_age_seconds{dump="%s"} %s\n' "$dump" "$(( now - dump_mtime ))"
+    else
+      printf 'homelab_borg_dump_present{dump="%s"} 0\n' "$dump"
+    fi
+  done
 } > "$tmp"
 
 # 0644 instead of the mktemp default 0600, so that the node-exporter textfile collector
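The dump-freshness loop above emits a presence sample for every expected artifact and an age sample only when the file exists. A minimal Python sketch of the same logic, assuming the metric and label names from this commit; `dump_freshness_lines` is a hypothetical helper, not code from the repository, and the two dump names in the demo are a subset of the real list:

```python
from __future__ import annotations

import time
from pathlib import Path


def escape_label(value: str) -> str:
    """Escape backslashes and double quotes for a Prometheus label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def dump_freshness_lines(dump_dir: Path, names: list[str], now: float) -> list[str]:
    """Render presence and age samples for each expected dump artifact."""
    lines = []
    for name in names:
        label = escape_label(name)
        path = dump_dir / name
        if path.is_file():
            age = int(now - path.stat().st_mtime)
            lines.append(f'homelab_borg_dump_present{{dump="{label}"}} 1')
            lines.append(f'homelab_borg_dump_age_seconds{{dump="{label}"}} {age}')
        else:
            # No age sample for a missing artifact, matching the shell loop.
            lines.append(f'homelab_borg_dump_present{{dump="{label}"}} 0')
    return lines


if __name__ == "__main__":
    # Directory is the exporter's default BORG_DUMP_DIR; names are a subset.
    expected = ["n8n.sqlite.dump", "postgresql17-globals.sql"]
    root = Path("/mnt/user/backups/borg/dumps/latest")
    print("\n".join(dump_freshness_lines(root, expected, time.time())))
```

Splitting presence and age into two series lets a "missing" alert and a "stale" alert be written independently, since a missing file simply has no age series to compare against a threshold.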