6 Commits

Author SHA1 Message Date
renovate dbc344d362 chore(deps): update minor-and-patch-updates 2026-06-19 10:20:48 +00:00
Micha c39ae5cdfa monitoring: Prometheus-Directory-Mount-Hardening als erledigt vermerkt
Prometheus laeuft jetzt mit stabilem Directory-Mount (recreated, 25 Regeln
aktiv). Verbleibendes Einzeldatei-Muster bei den uebrigen Monitoring-Services
als Folge-Item praezisiert.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-19 10:24:45 +02:00
Micha 80385c4560 monitoring: Prometheus-Config als Verzeichnis-Mount (FUSE-Stale-Handle-Fix)
Einzeldatei-Bind-Mounts von alerts.yml/prometheus.yml brechen auf dem
Unraid-FUSE-Share bei git/Komodo-Updates zu "Stale NFS file handle"
(Inode-Wechsel) -> Config-Reload laedt 0 Regeln, nur --force-recreate heilt.
Umgestellt auf stabilen Directory-Mount ./prometheus:/etc/prometheus/config:ro
plus angepasste --config.file und rule_files. Kuenftig reicht ein Reload.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-19 10:20:35 +02:00
Micha 7587ee4e77 Backup-Hardening: Live-Verifikation + Monitoring-Bind-Mount-Befund
Dump-Alerts live verifiziert (25 Regeln geladen, 0 feuern, Scope-Drift 0).
Stale-Handle der Prometheus-alerts.yml (FUSE-Einzeldatei-Mount) per
--force-recreate behoben; Directory-Mount-Hardening als TODO aufgenommen.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-19 10:17:58 +02:00
Micha a3c5610934 Backup-Hardening: TODO-Status an verifizierten Live-Stand angleichen
Host-Clone-Pull und Borg-UI-Scope sind erledigt/verifiziert (Live-Drift 0,
alle 33 Quellen konfiguriert). Offen bleiben nur der Prometheus-Config-Reload
fuer die neuen Dump-Alerts und der Nearline-Dead-Man's-Switch.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-19 08:08:04 +02:00
Micha bc9ace315a Backup-Audit-Hardening: Dump-Frische-Monitoring und Scope-Konsistenz
Findings aus dem Backup-/Restore-Audit 2026-06-18 umgesetzt:

- Dump-Frische als Prometheus-Metrik (homelab_borg_dump_present /
  homelab_borg_dump_age_seconds) im Host-Exporter; schliesst den
  Blindfleck, dass Borg weiterlaeuft und stale Dumps archiviert, ohne
  Job-Fehler.
- Neue Alerts HomelabBorgDumpMissing / HomelabBorgDumpStale (critical)
  plus ALERT_RULES.md.
- Freshness-Gate (.sh + .ps1) und H:-Nearline-Pull um n8n.sqlite.dump
  und postgresql17-globals.sql ergaenzt.
- Critical-Container-Watch um mail-archiver, n8n, homeassistant,
  smarthome-mosquitto erweitert.
- BACKUP_SCOPE: /mnt/user/projekte und sonstige User-Shares ausserhalb
  App-Scope als bewusste offene Operator-Entscheidung dokumentiert;
  Hermes-data-Pfad als geparkt klargestellt.
- MASTER_TODO: Nearline-Pull-Ueberwachung, Host-Pull-Nachzug und
  projekte-Scope-Entscheidung aufgenommen.

Enthaelt ausserdem die zuvor vorbereiteten Scope-Erweiterungen
(nextcloud html+data, n8n, filebrowser, influxdb3) und Scope-Drift-/
Retention-/Compact-/Check-Alerts.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-18 20:25:54 +02:00
22 changed files with 263 additions and 24 deletions
+1 -1
View File
@@ -1,6 +1,6 @@
services: services:
mail-archiver: mail-archiver:
image: s1t5/mailarchiver@sha256:4ea7ecc47ad1dd2c523b85c3967574b61e39def1b6fd26edf874e21733c4018c image: s1t5/mailarchiver@sha256:9860b170040dc096aeedc47bdbb09e2913aa8742eec205e29edd0ab79f9dda7e
container_name: mail-archiver container_name: mail-archiver
restart: unless-stopped restart: unless-stopped
environment: environment:
+1 -1
View File
@@ -1,6 +1,6 @@
services: services:
super-productivity: super-productivity:
image: johannesjo/super-productivity:v18.9.1@sha256:773760107344e739f4c29409f7842db66a1b167d50eb2c40248cb5b5b328652e image: johannesjo/super-productivity:v18.11.0@sha256:bbf510b159262a00ab133818909c2234c7fc71bc59907ab58a810a6d3c15cc43
container_name: super-productivity container_name: super-productivity
restart: unless-stopped restart: unless-stopped
+9 -1
View File
@@ -1,6 +1,6 @@
# Alert Rules # Alert Rules
Stand: 2026-06-05 Stand: 2026-06-18
Diese Datei beschreibt die produktiven Alarmwege und wichtigsten Regeln. Die Diese Datei beschreibt die produktiven Alarmwege und wichtigsten Regeln. Die
Konfiguration selbst liegt in `monitoring/prometheus/alerts.yml` und in den Konfiguration selbst liegt in `monitoring/prometheus/alerts.yml` und in den
@@ -36,6 +36,14 @@ Skripten unter `services/posture-check/`.
| `HomelabBorgBackupStale` | letztes Borg-Backup >30h | warning | Backup-Lauf nachholen/pruefen | | `HomelabBorgBackupStale` | letztes Borg-Backup >30h | warning | Backup-Lauf nachholen/pruefen |
| `HomelabBorgLastJobFailed` | letzter Borg-Job fehlgeschlagen | critical | Borg-UI-Job-Log pruefen | | `HomelabBorgLastJobFailed` | letzter Borg-Job fehlgeschlagen | critical | Borg-UI-Job-Log pruefen |
| `HomelabBorgLastJobCompletedWithWarnings` | letzter Borg-Job mit Warnungen | warning | Warnung im Borg-UI-Job lesen | | `HomelabBorgLastJobCompletedWithWarnings` | letzter Borg-Job mit Warnungen | warning | Warnung im Borg-UI-Job lesen |
| `HomelabBorgDumpMissing` | erwartetes Dump-Artefakt fehlt im aktuellen Dump-Set | critical | `pre-backup-dumps.sh`/User-Script pruefen |
| `HomelabBorgDumpStale` | Dump-Artefakt >30h alt (Borg laeuft, Dumps eingefroren) | critical | `pre-backup-dumps.sh`/User-Script pruefen, nicht nur den Borg-Job |
| `HomelabBorgScopeSourceListMissing` | Repo-Quellliste fuer Borg-Drift-Check fehlt | critical | Borg-UI-Mount `/local/services/homelab-infra` und Repo-Pfad pruefen |
| `HomelabBorgScopeMissingSources` | Borg UI enthaelt nicht alle Pfade aus `ops/borg-ui/all-important-sources.txt` | critical | Live-Borg-Scope an Repo-Quelle angleichen |
| `HomelabBorgScopeExtraSources` | Borg UI enthaelt Pfade ausserhalb der Repo-Quellliste | warning | Doku oder Live-Scope bereinigen |
| `HomelabBorgRepositoryCheckStale` | letzter Borg-Check >14 Tage alt | warning | Borg-Repository-Check ausfuehren oder Scheduler pruefen |
| `HomelabBorgRetentionDisabled` | Scheduled Job fuehrt kein Prune aus | warning | Retention-Einstellung in Borg UI pruefen |
| `HomelabBorgCompactDisabled` | Scheduled Job fuehrt kein Compact aus | warning | Compact-Einstellung in Borg UI pruefen |
| `HomelabCriticalContainerDown` | kritischer Container fehlt | critical | Komodo/Docker-Status pruefen | | `HomelabCriticalContainerDown` | kritischer Container fehlt | critical | Komodo/Docker-Status pruefen |
| `HomelabPrometheusTargetDown` | Scrape-Ziel down | critical | node-exporter/cadvisor/blackbox/traefik pruefen | | `HomelabPrometheusTargetDown` | Scrape-Ziel down | critical | node-exporter/cadvisor/blackbox/traefik pruefen |
+10 -4
View File
@@ -1,6 +1,6 @@
# Master To-do - KalliLab CORE # Master To-do - KalliLab CORE
Typ: Status/To-do · Stand: 2026-06-17 · Status: aktiv Typ: Status/To-do · Stand: 2026-06-18 · Status: aktiv
Diese Liste ist die **einzige** Arbeitsliste fuer offene operative Punkte im Diese Liste ist die **einzige** Arbeitsliste fuer offene operative Punkte im
Homelab. Detailablaeufe stehen in den verlinkten Runbooks; Entscheidungen mit Homelab. Detailablaeufe stehen in den verlinkten Runbooks; Entscheidungen mit
@@ -25,14 +25,19 @@ Host-Reports (`/mnt/user/backups/restore-reports/`) und in der Git-Historie.
| Restore-Test Tailscale | Operator | State-Validierung + Reconnect nur auf Wegwerf-Host/VM, danach Geraet in Tailscale-Admin entfernen | `ops/restore-tests/tailscale-runbook.md` | | Restore-Test Tailscale | Operator | State-Validierung + Reconnect nur auf Wegwerf-Host/VM, danach Geraet in Tailscale-Admin entfernen | `ops/restore-tests/tailscale-runbook.md` |
| Authelia OIDC fuer Apps | Operator/Codex | Live: Grafana + Mealie login-verifiziert; Paperless Secret verdrahtet und Service-Smoke am 2026-06-17 gruen, finaler Browser-Login mit Operator-Account offen. Immich + Nextcloud bewusst geparkt bis Family-Onboarding (siehe `docs/DECISIONS.md` 2026-06-06) | `docs/AUTHELIA_OIDC_PLAN.md` | | Authelia OIDC fuer Apps | Operator/Codex | Live: Grafana + Mealie login-verifiziert; Paperless Secret verdrahtet und Service-Smoke am 2026-06-17 gruen, finaler Browser-Login mit Operator-Account offen. Immich + Nextcloud bewusst geparkt bis Family-Onboarding (siehe `docs/DECISIONS.md` 2026-06-06) | `docs/AUTHELIA_OIDC_PLAN.md` |
| Home Assistant Tibber | Operator/Codex | Tibber per HA-UI-Config-Flow verbinden. Danach Energy-Dashboard um echte Kosten/Preisquelle ergaenzen; SolarEdge-PV, Netz und Speicher sind bereits konfiguriert und validiert | `docs/runbooks/smart-home-bootstrap.md`, `docs/DECISIONS.md` | | Home Assistant Tibber | Operator/Codex | Tibber per HA-UI-Config-Flow verbinden. Danach Energy-Dashboard um echte Kosten/Preisquelle ergaenzen; SolarEdge-PV, Netz und Speicher sind bereits konfiguriert und validiert | `docs/runbooks/smart-home-bootstrap.md`, `docs/DECISIONS.md` |
| Nearline-Pull Dead-Man's-Switch | Operator | H:-Pull war ~2026-06-04 bis 2026-06-18 still gestoppt (Task fehlte, kein Alarm). Lauf nachgeholt + Scheduled Task `KalliLab H Drive Nearline Pull` neu registriert (State Ready). **Verbleibt:** externer Dead-Man's-Switch (Healthchecks.io-Ping am Ende von `pull-critical-backups.ps1` und `ops/borg-ui/scripts/pre-borg.sh`), da Prometheus auf Unraid den baerchen-Pull nicht sieht | `ops/h-drive-nearline/README.md` |
| Monitoring Single-File-Bind-Mount Hardening | Operator/Claude | Prometheus am 2026-06-19 auf stabilen Directory-Mount (`./prometheus:/etc/prometheus/config:ro`) umgestellt + recreated -> Stale-Handle-Footgun dort beseitigt, Reload reicht wieder. **Verbleibt:** gleiches Einzeldatei-Muster bei alertmanager/blackbox/loki/promtail/grafana-provisioning praeventiv auf Directory-Mount umstellen | `monitoring/docker-compose.yml` |
--- ---
## Operator-Entscheidung ## Operator-Entscheidung
**Stand 2026-06-11: keine offenen Operator-Entscheidungen.**
Getroffene Entscheidungen mit Begruendung und Review-Trigger: `docs/DECISIONS.md`. Getroffene Entscheidungen mit Begruendung und Review-Trigger: `docs/DECISIONS.md`.
| Thema | Entscheidung noetig | Quelle |
|---|---|---|
| `/mnt/user/projekte` Backup-Scope | Filebrowser serviert `projekte` (und ganze `documents`/`photos`), aber nur App-Unterordner sind im Borg-Scope. Entscheiden: `projekte` als read-only Borg-UI-Mount + Quelllisten-Eintrag aufnehmen, oder bewusst als "nur lokal, nicht DR-relevant" bestaetigen | `ops/borg-ui/BACKUP_SCOPE.md` Abschnitt "User-Daten-Shares ausserhalb des App-Scope" |
--- ---
## Geparkt ## Geparkt
@@ -52,6 +57,7 @@ Bewusst nicht jetzt - Begruendungen in `docs/DECISIONS.md`, hier nur Thema und T
| Filebrowser-Mount-Scope | naechster Hardening-Sprint | `docs/SERVICE_CATALOG.md` | | Filebrowser-Mount-Scope | naechster Hardening-Sprint | `docs/SERVICE_CATALOG.md` |
| Scrutiny Privileged-Ausnahme | nur mit klarer Begruendung aendern | `docs/SERVICE_CATALOG.md` | | Scrutiny Privileged-Ausnahme | nur mit klarer Begruendung aendern | `docs/SERVICE_CATALOG.md` |
| Immich Redis named volume | passende Wartung am Immich-Stack | `docs/SERVICE_CATALOG.md` | | Immich Redis named volume | passende Wartung am Immich-Stack | `docs/SERVICE_CATALOG.md` |
| Komodo keys named volume | gemeinsames Wartungsfenster mit Operator | Live-Volume `komodo_komodo_keys` nach `/mnt/user/appdata/komodo/keys` migrieren, Compose anpassen, Periphery-Reconnect pruefen, dann in Borg-Scope aufnehmen |
| Storage-Wachstum (zweite NVMe, zweite Array-Disk, ZFS/BTRFS) | Trigger aus Capacity-Doku | `docs/STORAGE_LAYOUT.md`, `docs/CAPACITY_AND_LIFECYCLE.md` | | Storage-Wachstum (zweite NVMe, zweite Array-Disk, ZFS/BTRFS) | Trigger aus Capacity-Doku | `docs/STORAGE_LAYOUT.md`, `docs/CAPACITY_AND_LIFECYCLE.md` |
| Wiederkehrende Restore-Drills | laufend nach Kadenz, inkl. quartalsweisem Frische-Negativtest (`run-restore-checks.sh freshness-negative`) | `docs/RESTORE_MATRIX.md`, `ops/restore-tests/schedule.md` | | Wiederkehrende Restore-Drills | laufend nach Kadenz, inkl. quartalsweisem Frische-Negativtest (`run-restore-checks.sh freshness-negative`) | `docs/RESTORE_MATRIX.md`, `ops/restore-tests/schedule.md` |
| Doku-Quartals-Gaertnern (~15 min) | quartalsweise, erster Lauf mit Q3-Review ab 2026-07-01: Datiertes archivieren, Done-/Review-Logs kuerzen, tote Links pruefen | `docs/REPO_MAP.md` Doku-Regeln | | Doku-Quartals-Gaertnern (~15 min) | quartalsweise, erster Lauf mit Q3-Review ab 2026-07-01: Datiertes archivieren, Done-/Review-Logs kuerzen, tote Links pruefen | `docs/REPO_MAP.md` Doku-Regeln |
@@ -71,8 +77,8 @@ Bewusst nicht jetzt - Begruendungen in `docs/DECISIONS.md`, hier nur Thema und T
- **2026-06-17** Offene TODOs gegen Live-Stand abgeglichen: Paperless-OIDC-Secret verdrahtet und Service-Smoke gruen; alter Tailscale-Docker-State nach `_archive/tailscale-removed-2026-06-06/` verschoben; Tailnet-Restpunkt geschlossen. - **2026-06-17** Offene TODOs gegen Live-Stand abgeglichen: Paperless-OIDC-Secret verdrahtet und Service-Smoke gruen; alter Tailscale-Docker-State nach `_archive/tailscale-removed-2026-06-06/` verschoben; Tailnet-Restpunkt geschlossen.
- **2026-06-17** Repo-Hygiene abgeschlossen: Glance-Widget-Tokens sind in Runtime gesetzt, Audit-PDF liegt extern unter `H:\kallilab-recovery\audits`, Worktree clean. - **2026-06-17** Repo-Hygiene abgeschlossen: Glance-Widget-Tokens sind in Runtime gesetzt, Audit-PDF liegt extern unter `H:\kallilab-recovery\audits`, Worktree clean.
- **2026-06-17** Komodo/Gitea-Webhooks normalisiert: aktive Komodo-Hooks fuer `Micha/homelab-infra` nutzen Branch-Filter `master`; DB-Backup vor Host-Hotfix erstellt. Workflow-Regel nachgezogen. - **2026-06-17** Komodo/Gitea-Webhooks normalisiert: aktive Komodo-Hooks fuer `Micha/homelab-infra` nutzen Branch-Filter `master`; DB-Backup vor Host-Hotfix erstellt. Workflow-Regel nachgezogen.
- **2026-06-13** Home Assistant MQTT-Integration produktiv verbunden: Config-Entry `smarthome-mosquitto` ist `loaded`, Mosquitto sieht den HA-Client `homeassistant`; `check_config` gruen. - **2026-06-19** Backup-Hardening live verifiziert: Borg-Scope-Drift 0 (alle 33 Quellen konfiguriert), Dumps frisch (11/11 present), neue Dump-Alerts aktiv (25 Regeln, 0 feuern). Prometheus-`alerts.yml`-Stale-Handle (FUSE-Einzeldatei-Mount) per `--force-recreate` behoben und anschliessend dauerhaft auf Directory-Mount umgestellt (recreated, 25 Regeln aktiv).
- **2026-06-13** HA Energy Dashboard konfiguriert: Netz, PV und Speicher aus SolarEdge Local gesetzt, `energy/validate` ohne Issues; HA-Backup danach erzeugt. - **2026-06-18** Backup-Audit-Hardening: Dump-Frische-Metriken + Alerts `HomelabBorgDumpMissing/Stale`, Freshness-Checks + Nearline-Pull um `n8n`/`globals` ergaenzt, 4 Tier-2-Container in Critical-Watch, Scope-Doku fuer `projekte`/Hermes praezisiert. H:-Nearline (still seit 2026-06-04) nachgeholt + Task neu registriert.
--- ---
+2
View File
@@ -60,6 +60,7 @@ Sie ist die fachliche Ergaenzung zu `docs/DISASTER_RECOVERY.md`.
| Glance | Git / Borg-Repo | Repo-Konfiguration unter `ops/glance/config/glance.yml`; keine kritische Datenpersistenz | keine | `GLANCE_IMMICH_API_KEY`, `GLANCE_ADGUARD_USERNAME`, `GLANCE_ADGUARD_PASSWORD`, `GLANCE_SPEEDTEST_API_KEY` | Traefik, Authelia, optional interne API-Ziele | Dashboard startet, Widgets laden, Docker-Status laeuft nur ueber `glance-docker-socket-proxy` | | Glance | Git / Borg-Repo | Repo-Konfiguration unter `ops/glance/config/glance.yml`; keine kritische Datenpersistenz | keine | `GLANCE_IMMICH_API_KEY`, `GLANCE_ADGUARD_USERNAME`, `GLANCE_ADGUARD_PASSWORD`, `GLANCE_SPEEDTEST_API_KEY` | Traefik, Authelia, optional interne API-Ziele | Dashboard startet, Widgets laden, Docker-Status laeuft nur ueber `glance-docker-socket-proxy` |
| ntfy | Borg / Share | `/mnt/user/appdata/ntfy` | keine | keine besonderen Secret-Dateien dokumentiert | Traefik | UI und Push-Endpunkt erreichbar | | ntfy | Borg / Share | `/mnt/user/appdata/ntfy` | keine | keine besonderen Secret-Dateien dokumentiert | Traefik | UI und Push-Endpunkt erreichbar |
| Paperless-GPT | Borg / Share | `/mnt/user/appdata/paperless-gpt` | keine eigene DB | `PAPERLESS_API_TOKEN`, `OPENAI_API_KEY` | Traefik, Paperless, OpenAI API | UI startet, Konfiguration vorhanden; LLM-Provider zeigt `openai` / `gpt-5.4-mini` | | Paperless-GPT | Borg / Share | `/mnt/user/appdata/paperless-gpt` | keine eigene DB | `PAPERLESS_API_TOKEN`, `OPENAI_API_KEY` | Traefik, Paperless, OpenAI API | UI startet, Konfiguration vorhanden; LLM-Provider zeigt `openai` / `gpt-5.4-mini` |
| n8n | Borg + Dump | `/mnt/user/appdata/n8n/data` | `n8n.sqlite.dump`; Credentials sind nur mit dem passenden `N8N_ENCRYPTION_KEY` entschluesselbar | `N8N_ENCRYPTION_KEY`, GMX/OpenAI/Gitea-Credentials in n8n | Traefik, GMX IMAP, OpenAI API, Gitea API | UI startet, Owner-Login funktioniert, kritischer Mail->LLM->Gitea-Workflow ist vorhanden und deaktiviert/aktiv wie vor Restore |
| Home Assistant | Borg + HA-native Backups + Fachrepo | `/mnt/user/appdata/homeassistant` inkl. `.storage`, `secrets.yaml`, `trusted_proxies.yaml`, `custom_components` (HACS, `solaredge_modbus_multi`); Fach-YAML aus `/mnt/user/services/smart-home-kalli/home-assistant` | HA-native Backup-Artefakte unter `/mnt/user/appdata/homeassistant/backups`; erstes Artefakt 2026-06-13 erzeugt und tar-lesbar (`backup.json`, `homeassistant.tar.gz`); Backup nach SolarEdge-Integration: `Custom_backup_2026.6.1_2026-06-13_14.59_48645373.tar`; Backup nach Energy-Dashboard-Konfiguration: `Custom_backup_2026.6.1_2026-06-13_15.59_25670583.tar`; keine externe DB in Phase 1 | HA-Secrets in `secrets.yaml`, Integrations-Tokens in `.storage`, MQTT-Credentials, Agent-API-Tokens als Host-Secrets `ha_token_codex`/`ha_token_claude` (nur mit erhaltenem `.storage`-Auth-State nutzbar), spaeter Tibber/InfluxDB-Tokens | Traefik, `frontend_net`, `smarthome_net`, Mosquitto, Fachrepo-Clone, SolarEdge-Wechselrichter `192.168.178.111:1502` | Restore-Test am 2026-06-13 erfolgreich: HA-native Backup + Mosquitto-Appdata + Fachrepo-Clone isoliert gestartet, HA HTTP/API/check_config gruen; produktiv danach HA-MQTT-Config-Entry `smarthome-mosquitto` geladen, SolarEdge Local `solaredge_modbus_multi` loaded mit 68 Entitaeten und Energy Dashboard fuer Netz/PV/Speicher per `energy/validate` ohne Issues; Report `/mnt/user/backups/restore-reports/homeassistant-2026-06-13.md` | | Home Assistant | Borg + HA-native Backups + Fachrepo | `/mnt/user/appdata/homeassistant` inkl. `.storage`, `secrets.yaml`, `trusted_proxies.yaml`, `custom_components` (HACS, `solaredge_modbus_multi`); Fach-YAML aus `/mnt/user/services/smart-home-kalli/home-assistant` | HA-native Backup-Artefakte unter `/mnt/user/appdata/homeassistant/backups`; erstes Artefakt 2026-06-13 erzeugt und tar-lesbar (`backup.json`, `homeassistant.tar.gz`); Backup nach SolarEdge-Integration: `Custom_backup_2026.6.1_2026-06-13_14.59_48645373.tar`; Backup nach Energy-Dashboard-Konfiguration: `Custom_backup_2026.6.1_2026-06-13_15.59_25670583.tar`; keine externe DB in Phase 1 | HA-Secrets in `secrets.yaml`, Integrations-Tokens in `.storage`, MQTT-Credentials, Agent-API-Tokens als Host-Secrets `ha_token_codex`/`ha_token_claude` (nur mit erhaltenem `.storage`-Auth-State nutzbar), spaeter Tibber/InfluxDB-Tokens | Traefik, `frontend_net`, `smarthome_net`, Mosquitto, Fachrepo-Clone, SolarEdge-Wechselrichter `192.168.178.111:1502` | Restore-Test am 2026-06-13 erfolgreich: HA-native Backup + Mosquitto-Appdata + Fachrepo-Clone isoliert gestartet, HA HTTP/API/check_config gruen; produktiv danach HA-MQTT-Config-Entry `smarthome-mosquitto` geladen, SolarEdge Local `solaredge_modbus_multi` loaded mit 68 Entitaeten und Energy Dashboard fuer Netz/PV/Speicher per `energy/validate` ohne Issues; Report `/mnt/user/backups/restore-reports/homeassistant-2026-06-13.md` |
| Smart-Home MQTT / Mosquitto | Borg / Share | `/mnt/user/appdata/mosquitto/config`, `/mnt/user/appdata/mosquitto/data`, `/mnt/user/appdata/mosquitto/log` | Mosquitto persistiert retained messages/subscriptions dateibasiert | `passwordfile`, `aclfile`, spaeter per-Device-User | `smarthome_net`, Home Assistant, spaeter ESPHome/Zigbee2MQTT | Restore-Test am 2026-06-13 erfolgreich: authentifizierter Publish/Subscribe-Smoke mit `homeassistant`-User und retained Topic nach Broker-Restart gruen; produktiv verbindet sich HA als User `homeassistant` | | Smart-Home MQTT / Mosquitto | Borg / Share | `/mnt/user/appdata/mosquitto/config`, `/mnt/user/appdata/mosquitto/data`, `/mnt/user/appdata/mosquitto/log` | Mosquitto persistiert retained messages/subscriptions dateibasiert | `passwordfile`, `aclfile`, spaeter per-Device-User | `smarthome_net`, Home Assistant, spaeter ESPHome/Zigbee2MQTT | Restore-Test am 2026-06-13 erfolgreich: authentifizierter Publish/Subscribe-Smoke mit `homeassistant`-User und retained Topic nach Broker-Restart gruen; produktiv verbindet sich HA als User `homeassistant` |
| Smart-Home Fachrepo | Gitea + Borg-Repo-Clone | `/mnt/user/services/smart-home-kalli` | keine | keine echten Secrets im Repo; `secrets-template/` nur Beispiele | Gitea, Home Assistant Mounts | `git status` sauber, HA liest `configuration.yaml` und `packages/` aus dem Clone | | Smart-Home Fachrepo | Gitea + Borg-Repo-Clone | `/mnt/user/services/smart-home-kalli` | keine | keine echten Secrets im Repo; `secrets-template/` nur Beispiele | Gitea, Home Assistant Mounts | `git status` sauber, HA liest `configuration.yaml` und `packages/` aus dem Clone |
@@ -104,6 +105,7 @@ Aktuell relevante Dump-Artefakte unter `/mnt/user/backups/borg/dumps/latest`:
- `filebrowser.bolt.dump` - `filebrowser.bolt.dump`
- `borg-ui.sqlite` - `borg-ui.sqlite`
- `grafana.sqlite` - `grafana.sqlite`
- `n8n.sqlite.dump`
- `unraid-flash-config.tar.gz` plus `unraid-flash-config.tar.gz.sha256` und Manifest - `unraid-flash-config.tar.gz` plus `unraid-flash-config.tar.gz.sha256` und Manifest
- Monitoring-Stack: keine verpflichtenden Dump-Artefakte; Prometheus/Loki/Grafana named volumes sind Diagnose-/Dashboard-Zustand, keine primaere Restore-Quelle. - Monitoring-Stack: keine verpflichtenden Dump-Artefakte; Prometheus/Loki/Grafana named volumes sind Diagnose-/Dashboard-Zustand, keine primaere Restore-Quelle.
- `komodo-mongo.archive.gz` (noch gesondert verifizieren) - `komodo-mongo.archive.gz` (noch gesondert verifizieren)
+9 -6
View File
@@ -4,13 +4,16 @@ services:
container_name: monitoring-prometheus container_name: monitoring-prometheus
restart: unless-stopped restart: unless-stopped
command: command:
- --config.file=/etc/prometheus/prometheus.yml - --config.file=/etc/prometheus/config/prometheus.yml
- --storage.tsdb.path=/prometheus - --storage.tsdb.path=/prometheus
- --storage.tsdb.retention.time=30d - --storage.tsdb.retention.time=30d
- --web.enable-lifecycle - --web.enable-lifecycle
volumes: volumes:
- ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro # Verzeichnis-Mount statt Einzeldatei: auf dem Unraid-FUSE-Share (/mnt/user)
- ./prometheus/alerts.yml:/etc/prometheus/alerts.yml:ro # bricht ein Einzeldatei-Bind-Mount bei git/Komodo-Updates zu
# "Stale NFS file handle" (Inode-Wechsel) -> Reload laedt 0 Regeln, nur
# --force-recreate heilt. Directory-Inode ist stabil, Reload reicht wieder.
- ./prometheus:/etc/prometheus/config:ro
- prometheus_data:/prometheus - prometheus_data:/prometheus
networks: networks:
- monitoring_net - monitoring_net
@@ -25,7 +28,7 @@ services:
- cadvisor - cadvisor
alertmanager: alertmanager:
image: prom/alertmanager:v0.32.2@sha256:b85533a2eb45865835315810315f6951331b2dbc8c93a6cf9a51e156a006a706 image: prom/alertmanager:v0.33.0@sha256:af26fbe4dd1886ac0efd7bd55cd9027da262e105b137a376522b7c14c3626e4a
container_name: monitoring-alertmanager container_name: monitoring-alertmanager
restart: unless-stopped restart: unless-stopped
command: command:
@@ -42,7 +45,7 @@ services:
- no-new-privileges:true - no-new-privileges:true
alertmanager-ntfy-bridge: alertmanager-ntfy-bridge:
image: python:3.14-alpine@sha256:5a824eb82cc75361f98611f3cfc5091ea33f10a6ccea4d4ebdabbc523b9a1614 image: python:3.14-alpine@sha256:26730869004e2b9c4b9ad09cab8625e81d256d1ce97e72df5520e806b1709f92
container_name: monitoring-alertmanager-ntfy-bridge container_name: monitoring-alertmanager-ntfy-bridge
restart: unless-stopped restart: unless-stopped
dns: dns:
@@ -337,7 +340,7 @@ services:
- no-new-privileges:true - no-new-privileges:true
influxdb3-core: influxdb3-core:
image: influxdb:3.9.3-core@sha256:c27c9b2ca2625b5b6966f0b09baa448102310e63a471fd60dff22646a2522e29 image: influxdb:3.10.0-core@sha256:b3e577f38c19963597170d8850a3a7f77af8f0cfa866c64cd13e5de0f238e114
container_name: monitoring-influxdb3-core container_name: monitoring-influxdb3-core
user: "0" user: "0"
restart: unless-stopped restart: unless-stopped
+72
View File
@@ -131,6 +131,78 @@ groups:
summary: "Latest Borg backup completed with warnings" summary: "Latest Borg backup completed with warnings"
description: "The latest Borg UI job completed with warnings for archive {{ $labels.archive }}." description: "The latest Borg UI job completed with warnings for archive {{ $labels.archive }}."
- alert: HomelabBorgScopeSourceListMissing
expr: homelab_borg_scope_expected_file_present != 1
for: 15m
labels:
severity: critical
annotations:
summary: "Borg expected source list is not visible"
description: "Borg UI cannot see the repo source list used for drift checks."
- alert: HomelabBorgScopeMissingSources
expr: homelab_borg_scope_missing_sources_total > 0
for: 15m
labels:
severity: critical
annotations:
summary: "Borg UI is missing expected backup sources"
description: "Borg UI is missing {{ $value }} source path(s) from ops/borg-ui/all-important-sources.txt."
- alert: HomelabBorgScopeExtraSources
expr: homelab_borg_scope_extra_sources_total > 0
for: 30m
labels:
severity: warning
annotations:
summary: "Borg UI has sources not tracked in the repo"
description: "Borg UI has {{ $value }} source path(s) that are not listed in ops/borg-ui/all-important-sources.txt."
- alert: HomelabBorgDumpMissing
expr: homelab_borg_dump_present == 0
for: 15m
labels:
severity: critical
annotations:
summary: "Borg pre-backup dump is missing: {{ $labels.dump }}"
description: "Expected dump artifact {{ $labels.dump }} is not present in the latest dump set. The pre-backup dump job may have failed or stopped."
- alert: HomelabBorgDumpStale
expr: homelab_borg_dump_age_seconds > 30 * 60 * 60
for: 15m
labels:
severity: critical
annotations:
summary: "Borg pre-backup dump is stale: {{ $labels.dump }}"
description: "Dump artifact {{ $labels.dump }} is older than 30 hours. pre-backup-dumps.sh may have stopped; Borg would keep archiving stale database content without a job failure."
- alert: HomelabBorgRepositoryCheckStale
expr: time() - homelab_borg_repository_last_check_timestamp_seconds > 14 * 24 * 60 * 60
for: 30m
labels:
severity: warning
annotations:
summary: "Borg repository check is stale"
description: "Borg repository {{ $labels.repository }} has not had a recorded check for more than 14 days."
- alert: HomelabBorgRetentionDisabled
expr: homelab_borg_schedule_prune_after_enabled != 1
for: 30m
labels:
severity: warning
annotations:
summary: "Borg retention pruning is disabled"
description: "Scheduled Borg job {{ $labels.schedule }} does not run prune after backup."
- alert: HomelabBorgCompactDisabled
expr: homelab_borg_schedule_compact_after_enabled != 1
for: 30m
labels:
severity: warning
annotations:
summary: "Borg compaction is disabled"
description: "Scheduled Borg job {{ $labels.schedule }} does not run compact after backup."
- alert: HomelabCriticalContainerDown - alert: HomelabCriticalContainerDown
expr: homelab_critical_container_running == 0 expr: homelab_critical_container_running == 0
for: 5m for: 5m
+1 -1
View File
@@ -5,7 +5,7 @@ global:
site: kallilabcore site: kallilabcore
rule_files: rule_files:
- /etc/prometheus/alerts.yml - /etc/prometheus/config/alerts.yml
alerting: alerting:
alertmanagers: alertmanagers:
+14 -1
View File
@@ -48,11 +48,12 @@ The Unraid flash configuration archive is intentional as well and must be treate
| Grafana | SQLite dump from `monitoring_grafana_data` + provisioned config in Git | `/local/borg-dumps`, `monitoring/grafana/provisioning`, `monitoring/grafana/dashboards` | | Grafana | SQLite dump from `monitoring_grafana_data` + provisioned config in Git | `/local/borg-dumps`, `monitoring/grafana/provisioning`, `monitoring/grafana/dashboards` |
| Filebrowser | file-backed state dump + file data | `/local/borg-dumps`, `/local/appdata/filebrowser` | | Filebrowser | file-backed state dump + file data | `/local/borg-dumps`, `/local/appdata/filebrowser` |
| InfluxDB 3 Core | file data | `/local/appdata/influxdb3/data`, `/local/appdata/influxdb3/plugins` | | InfluxDB 3 Core | file data | `/local/appdata/influxdb3/data`, `/local/appdata/influxdb3/plugins` |
| n8n | SQLite dump + encrypted workflow/credential state | `/local/borg-dumps`, `/local/appdata/n8n/data` |
| Home Assistant | HA-native backup + file state | `/local/appdata/homeassistant`, `/local/services/smart-home-kalli` | | Home Assistant | HA-native backup + file state | `/local/appdata/homeassistant`, `/local/services/smart-home-kalli` |
| Smart-Home MQTT / Mosquitto | file data | `/local/appdata/mosquitto/config`, `/local/appdata/mosquitto/data` | | Smart-Home MQTT / Mosquitto | file data | `/local/appdata/mosquitto/config`, `/local/appdata/mosquitto/data` |
| Zigbee2MQTT (planned) | file data + coordinator state | `/local/appdata/zigbee2mqtt`, `/local/services/smart-home-kalli` | | Zigbee2MQTT (planned) | file data + coordinator state | `/local/appdata/zigbee2mqtt`, `/local/services/smart-home-kalli` |
| ESPHome (planned) | Fachrepo + optional build/runtime cache | `/local/services/smart-home-kalli/esphome`, optional `/local/appdata/esphome` | | ESPHome (planned) | Fachrepo + optional build/runtime cache | `/local/services/smart-home-kalli/esphome`, optional `/local/appdata/esphome` |
| Hermes Agent | file data + SSH key | `/local/appdata/hermes-agent/data`, `/local/secrets/hermes_runner_id_ed25519` | | Hermes Agent | file data + SSH key | SSH-Key via `/local/secrets`; `/local/appdata/hermes-agent/data` ist bewusst NICHT in `all-important-sources.txt`, weil der Stack geparkt ist (Review 2026-07-25). Beim Aktivieren des Stacks in die Quellliste aufnehmen. |
| BentoPDF | rebuildable | no critical persistence in compose | | BentoPDF | rebuildable | no critical persistence in compose |
## Open Decisions and Coverage Gaps ## Open Decisions and Coverage Gaps
@@ -71,6 +72,17 @@ Option A umgesetzt: `pre-backup-dumps.sh` writes `nextcloud.dump` from `nextclou
The live Unraid User Scripts execute repo scripts from `/mnt/user/services/homelab-infra`, while Komodo keeps stack workspaces below `/mnt/user/services/stacks`. These paths are now mounted into Borg UI as `/local/services/...` and included explicitly so host-side script hotfixes, stack workspace state, and posture-check state are recoverable. The live Unraid User Scripts execute repo scripts from `/mnt/user/services/homelab-infra`, while Komodo keeps stack workspaces below `/mnt/user/services/stacks`. These paths are now mounted into Borg UI as `/local/services/...` and included explicitly so host-side script hotfixes, stack workspace state, and posture-check state are recoverable.
### User-Daten-Shares ausserhalb des App-Scope
Filebrowser serviert `/mnt/user/projekte`, `/mnt/user/documents` und `/mnt/user/photos` komplett (`ops/filebrowser/docker-compose.yml`). Der Borg-Scope deckt aber bewusst nur die App-Unterordner ab (`documents/paperless*`, `documents/nextcloud-data`, `documents/scans_inbox`, `photos/immich`, `photos/family_archive`).
- **`/mnt/user/projekte`** ist aktuell in **keinem** Borg-Scope. Ad-hoc-Dateien, die direkt unter `documents/` oder `photos/` (ausserhalb der genannten App-Ordner) abgelegt werden, ebenfalls nicht.
- Entscheidung Operator offen (Eintrag in `docs/MASTER_TODO.md`): Entweder `projekte` als eigenen read-only Borg-UI-Mount + Quelllisten-Eintrag aufnehmen, oder bewusst als "nur lokal, nicht DR-relevant" bestaetigen. Bis zur Entscheidung gilt: dort liegende Originaldaten sind **nicht** wiederherstellbar.
### Komodo keys
Production still stores Komodo Core/Periphery keys in the Docker named volume `komodo_komodo_keys`. This is a known open migration item and is not fixed by the Borg source list alone. Target state: move the keys to a host path such as `/mnt/user/appdata/komodo/keys` and mount that path into both Komodo containers, then include it in Borg. Do not treat this as solved until the live Compose stack has been migrated and Periphery reconnect has been verified.
## Database Dumps Required ## Database Dumps Required
### Shared PostgreSQL (`postgresql17`, runtime PostgreSQL 18) ### Shared PostgreSQL (`postgresql17`, runtime PostgreSQL 18)
@@ -89,6 +101,7 @@ The live Unraid User Scripts execute repo scripts from `/mnt/user/services/homel
- Komodo MongoDB - Komodo MongoDB
- SQLite: `gitea`, `vaultwarden`, `speedtest-tracker`, `borg-ui`, `grafana` - SQLite: `gitea`, `vaultwarden`, `speedtest-tracker`, `borg-ui`, `grafana`
- SQLite: `n8n` (`n8n.sqlite.dump`, credentials require the matching `N8N_ENCRYPTION_KEY`)
- File-backed state: `filebrowser.bolt.dump` - File-backed state: `filebrowser.bolt.dump`
- Unraid flash config: `unraid-flash-config.tar.gz` plus `unraid-flash-config.tar.gz.sha256` - Unraid flash config: `unraid-flash-config.tar.gz` plus `unraid-flash-config.tar.gz.sha256`
- Home Assistant native backups: created by HA under `/mnt/user/appdata/homeassistant/backups` and captured as file state - Home Assistant native backups: created by HA under `/mnt/user/appdata/homeassistant/backups` and captured as file state
+6
View File
@@ -18,6 +18,12 @@
/local/appdata/borg-ui/data /local/appdata/borg-ui/data
/local/appdata/komodo/periphery /local/appdata/komodo/periphery
/local/appdata/komodo/core /local/appdata/komodo/core
/local/appdata/nextcloud/html
/local/nextcloud/data
/local/appdata/n8n/data
/local/appdata/filebrowser
/local/appdata/influxdb3/data
/local/appdata/influxdb3/plugins
/local/services/homelab-infra /local/services/homelab-infra
/local/services/smart-home-kalli /local/services/smart-home-kalli
/local/services/stacks /local/services/stacks
+1 -1
View File
@@ -1,6 +1,6 @@
services: services:
borg-ui: borg-ui:
image: ainullcode/borg-ui@sha256:0922157e8f77a1b2bd23cd09366a458ea6de07fd9306aa1485f9cfe623eca17f image: ainullcode/borg-ui@sha256:e51b3d2e6cb38d1ba127ef60ba442c1e157965327196e6f7afb69f30c0ba99d1
container_name: borg-ui container_name: borg-ui
restart: unless-stopped restart: unless-stopped
security_opt: security_opt:
+1
View File
@@ -325,6 +325,7 @@ main() {
# Additional host-side SQLite dumps for admin tooling with appdata files. # Additional host-side SQLite dumps for admin tooling with appdata files.
dump_sqlite_file "/mnt/user/appdata/borg-ui/data/borg.db" "$LATEST_DIR/borg-ui.sqlite" "borg-ui" dump_sqlite_file "/mnt/user/appdata/borg-ui/data/borg.db" "$LATEST_DIR/borg-ui.sqlite" "borg-ui"
dump_sqlite_file "/var/lib/docker/volumes/monitoring_grafana_data/_data/grafana.db" "$LATEST_DIR/grafana.sqlite" "grafana" dump_sqlite_file "/var/lib/docker/volumes/monitoring_grafana_data/_data/grafana.db" "$LATEST_DIR/grafana.sqlite" "grafana"
dump_sqlite_file "/mnt/user/appdata/n8n/data/database.sqlite" "$LATEST_DIR/n8n.sqlite.dump" "n8n"
# MongoDB # MongoDB
dump_mongo_container "komodo-mongo" "$LATEST_DIR/komodo-mongo.archive.gz" dump_mongo_container "komodo-mongo" "$LATEST_DIR/komodo-mongo.archive.gz"
+1 -1
View File
@@ -1,6 +1,6 @@
services: services:
code-server: code-server:
image: lscr.io/linuxserver/code-server:4.123.0@sha256:cb261a7f87674b445e0fd66d87d55900c1b823d276c727ab0d168a75e69e9992 image: lscr.io/linuxserver/code-server:4.125.0@sha256:7e9523734c003b6336781942df7b48aa6936a9df6931c12a19a1f7ad7858eeba
container_name: code-server container_name: code-server
restart: unless-stopped restart: unless-stopped
security_opt: security_opt:
+1 -1
View File
@@ -1,6 +1,6 @@
services: services:
filebrowser: filebrowser:
image: filebrowser/filebrowser:v2.63.14@sha256:1ec9b0c68297550c92f4a93feed432850c2993b261706cc3cc2e808f94a95e76 image: filebrowser/filebrowser:v2.63.15@sha256:9805b21cf910f3ef6f4a1c8f441f1dd6cc4197136f9541fe2a1ab6d050706e4b
container_name: filebrowser container_name: filebrowser
restart: unless-stopped restart: unless-stopped
security_opt: security_opt:
+1 -1
View File
@@ -1,6 +1,6 @@
services: services:
glances: glances:
image: nicolargo/glances:latest-full@sha256:60872a1af0e40a3150975617c7e811ad7ad48f95bc45d033fb0c1737a037e4d2 image: nicolargo/glances:latest-full@sha256:58651aabedf62db8bfc1d252f8d3889675dfcdb5d0ad1c177ae5879c21626f3a
container_name: glances container_name: glances
restart: unless-stopped restart: unless-stopped
pid: host pid: host
@@ -25,6 +25,7 @@ $Jobs = @(
"immich.dump", "immich.dump",
"komodo-mongo.archive.gz", "komodo-mongo.archive.gz",
"mealie.dump", "mealie.dump",
"n8n.sqlite.dump",
"nextcloud.dump", "nextcloud.dump",
"postgresql17-authelia.dump", "postgresql17-authelia.dump",
"postgresql17-globals.sql", "postgresql17-globals.sql",
@@ -6,6 +6,7 @@ param(
) )
$checks = @( $checks = @(
@{ Name = "postgresql17-globals.sql"; Path = Join-Path $DumpRoot "postgresql17-globals.sql" },
@{ Name = "postgresql17-paperless.dump"; Path = Join-Path $DumpRoot "postgresql17-paperless.dump" }, @{ Name = "postgresql17-paperless.dump"; Path = Join-Path $DumpRoot "postgresql17-paperless.dump" },
@{ Name = "postgresql17-mailarchiver.dump"; Path = Join-Path $DumpRoot "postgresql17-mailarchiver.dump" }, @{ Name = "postgresql17-mailarchiver.dump"; Path = Join-Path $DumpRoot "postgresql17-mailarchiver.dump" },
@{ Name = "mealie.dump"; Path = Join-Path $DumpRoot "mealie.dump" }, @{ Name = "mealie.dump"; Path = Join-Path $DumpRoot "mealie.dump" },
@@ -13,6 +14,7 @@ $checks = @(
@{ Name = "nextcloud.dump"; Path = Join-Path $DumpRoot "nextcloud.dump" }, @{ Name = "nextcloud.dump"; Path = Join-Path $DumpRoot "nextcloud.dump" },
@{ Name = "gitea.sqlite.dump"; Path = Join-Path $DumpRoot "gitea.sqlite.dump" }, @{ Name = "gitea.sqlite.dump"; Path = Join-Path $DumpRoot "gitea.sqlite.dump" },
@{ Name = "vaultwarden.sqlite.dump"; Path = Join-Path $DumpRoot "vaultwarden.sqlite.dump" }, @{ Name = "vaultwarden.sqlite.dump"; Path = Join-Path $DumpRoot "vaultwarden.sqlite.dump" },
@{ Name = "n8n.sqlite.dump"; Path = Join-Path $DumpRoot "n8n.sqlite.dump" },
@{ Name = "speedtest-tracker.sqlite.dump"; Path = Join-Path $DumpRoot "speedtest-tracker.sqlite.dump" }, @{ Name = "speedtest-tracker.sqlite.dump"; Path = Join-Path $DumpRoot "speedtest-tracker.sqlite.dump" },
@{ Name = "filebrowser.bolt.dump"; Path = Join-Path $DumpRoot "filebrowser.bolt.dump" }, @{ Name = "filebrowser.bolt.dump"; Path = Join-Path $DumpRoot "filebrowser.bolt.dump" },
@{ Name = "unraid-flash-config.tar.gz"; Path = Join-Path $DumpRoot "unraid-flash-config.tar.gz" } @{ Name = "unraid-flash-config.tar.gz"; Path = Join-Path $DumpRoot "unraid-flash-config.tar.gz" }
@@ -89,6 +89,7 @@ check_pg_header() {
} }
for dump in \ for dump in \
postgresql17-globals.sql \
postgresql17-paperless.dump \ postgresql17-paperless.dump \
postgresql17-mailarchiver.dump \ postgresql17-mailarchiver.dump \
mealie.dump \ mealie.dump \
@@ -96,6 +97,7 @@ for dump in \
nextcloud.dump \ nextcloud.dump \
gitea.sqlite.dump \ gitea.sqlite.dump \
vaultwarden.sqlite.dump \ vaultwarden.sqlite.dump \
n8n.sqlite.dump \
speedtest-tracker.sqlite.dump \ speedtest-tracker.sqlite.dump \
filebrowser.bolt.dump \ filebrowser.bolt.dump \
unraid-flash-config.tar.gz; do unraid-flash-config.tar.gz; do
+1 -1
View File
@@ -1,6 +1,6 @@
services: services:
scrutiny: scrutiny:
image: ghcr.io/starosdev/scrutiny:latest-omnibus@sha256:228483f16a6236d2fa9b2fbfca2e76dc861e648fbc6ae6e680d23e5d00211a5d image: ghcr.io/starosdev/scrutiny:latest-omnibus@sha256:d79e6f1bc299ab28fbd95c9e05fa5a8c565332d2cb9091a91e42d84d4d939989
container_name: scrutiny container_name: scrutiny
restart: unless-stopped restart: unless-stopped
privileged: true privileged: true
+1 -1
View File
@@ -1,6 +1,6 @@
services: services:
speedtest-tracker: speedtest-tracker:
image: lscr.io/linuxserver/speedtest-tracker:1.14.3@sha256:c3750c40948a9360000ce62d694da92e85584b4ab6d3d9a9d1432d76fa5e0726 image: lscr.io/linuxserver/speedtest-tracker:1.14.4@sha256:f99dfd097709016dfb4387d65bfdc0419bde99cf1dce7e26e70ca616c86f1281
container_name: speedtest-tracker container_name: speedtest-tracker
restart: unless-stopped restart: unless-stopped
security_opt: security_opt:
@@ -4,7 +4,11 @@ set -euo pipefail
TEXTFILE_DIR="${TEXTFILE_DIR:-/mnt/user/services/posture-check/textfile}" TEXTFILE_DIR="${TEXTFILE_DIR:-/mnt/user/services/posture-check/textfile}"
OUTPUT_FILE="${OUTPUT_FILE:-$TEXTFILE_DIR/homelab.prom}" OUTPUT_FILE="${OUTPUT_FILE:-$TEXTFILE_DIR/homelab.prom}"
BORG_CONTAINER="${BORG_CONTAINER:-borg-ui}" BORG_CONTAINER="${BORG_CONTAINER:-borg-ui}"
CRITICAL_CONTAINERS="${CRITICAL_CONTAINERS:-traefik authelia postgresql17 gitea komodo-core komodo-mongo komodo-periphery vaultwarden borg-ui ntfy adguard unbound monitoring-alertmanager monitoring-alertmanager-ntfy-bridge monitoring-blackbox-exporter monitoring-cadvisor monitoring-grafana monitoring-loki monitoring-node-exporter monitoring-promtail immich_server immich_postgres immich_redis paperless-ngx nextcloud nextcloud-postgres nextcloud-redis mealie mealie-postgres}" BORG_EXPECTED_SOURCES_FILE="${BORG_EXPECTED_SOURCES_FILE:-/local/services/homelab-infra/ops/borg-ui/all-important-sources.txt}"
# Host-Pfad der aktuellen Dump-Artefakte (pre-backup-dumps.sh schreibt hierhin).
# Wird host-seitig gestattet; der Exporter laeuft als Unraid User Script.
BORG_DUMP_DIR="${BORG_DUMP_DIR:-/mnt/user/backups/borg/dumps/latest}"
CRITICAL_CONTAINERS="${CRITICAL_CONTAINERS:-traefik authelia postgresql17 gitea komodo-core komodo-mongo komodo-periphery vaultwarden borg-ui ntfy adguard unbound monitoring-alertmanager monitoring-alertmanager-ntfy-bridge monitoring-blackbox-exporter monitoring-cadvisor monitoring-grafana monitoring-loki monitoring-node-exporter monitoring-promtail immich_server immich_postgres immich_redis paperless-ngx nextcloud nextcloud-postgres nextcloud-redis mealie mealie-postgres mail-archiver n8n homeassistant smarthome-mosquitto}"
# Hinweis: Tailscale laeuft als natives Unraid-Plugin (kein Docker-Container) und # Hinweis: Tailscale laeuft als natives Unraid-Plugin (kein Docker-Container) und
# wird daher hier bewusst NICHT als kritischer Container gefuehrt (Stand 2026-06-06). # wird daher hier bewusst NICHT als kritischer Container gefuehrt (Stand 2026-06-06).
@@ -90,11 +94,32 @@ EOF
# TYPE homelab_borg_last_success gauge # TYPE homelab_borg_last_success gauge
# HELP homelab_borg_last_job_warning Whether the most recent Borg backup job completed with warnings. # HELP homelab_borg_last_job_warning Whether the most recent Borg backup job completed with warnings.
# TYPE homelab_borg_last_job_warning gauge # TYPE homelab_borg_last_job_warning gauge
# HELP homelab_borg_repository_last_check_timestamp_seconds Unix timestamp of the latest Borg repository check known to Borg UI.
# TYPE homelab_borg_repository_last_check_timestamp_seconds gauge
# HELP homelab_borg_scope_expected_file_present Whether the expected Borg source list file is visible inside Borg UI.
# TYPE homelab_borg_scope_expected_file_present gauge
# HELP homelab_borg_scope_expected_sources_total Number of expected Borg source paths from the repo source list.
# TYPE homelab_borg_scope_expected_sources_total gauge
# HELP homelab_borg_scope_configured_sources_total Number of Borg source paths configured in Borg UI.
# TYPE homelab_borg_scope_configured_sources_total gauge
# HELP homelab_borg_scope_missing_sources_total Number of expected Borg source paths missing from Borg UI.
# TYPE homelab_borg_scope_missing_sources_total gauge
# HELP homelab_borg_scope_extra_sources_total Number of Borg UI source paths not present in the repo source list.
# TYPE homelab_borg_scope_extra_sources_total gauge
# HELP homelab_borg_scope_source_configured Whether an expected Borg source path is configured in Borg UI.
# TYPE homelab_borg_scope_source_configured gauge
# HELP homelab_borg_schedule_prune_after_enabled Whether a Borg scheduled job runs prune after backup.
# TYPE homelab_borg_schedule_prune_after_enabled gauge
# HELP homelab_borg_schedule_compact_after_enabled Whether a Borg scheduled job runs compact after backup.
# TYPE homelab_borg_schedule_compact_after_enabled gauge
EOF EOF
if docker inspect "$BORG_CONTAINER" >/dev/null 2>&1; then if docker inspect "$BORG_CONTAINER" >/dev/null 2>&1; then
docker exec -i "$BORG_CONTAINER" python3 - <<'PY' docker exec -i -e BORG_EXPECTED_SOURCES_FILE="$BORG_EXPECTED_SOURCES_FILE" "$BORG_CONTAINER" python3 - <<'PY'
import datetime as dt import datetime as dt
import json
import os
from pathlib import Path
import sqlite3 import sqlite3
conn = sqlite3.connect("/data/borg.db") conn = sqlite3.connect("/data/borg.db")
@@ -135,6 +160,9 @@ def parse_ts(value):
def escape_label(value): def escape_label(value):
return (value or "").replace("\\", "\\\\").replace('"', '\\"') return (value or "").replace("\\", "\\\\").replace('"', '\\"')
def bool_metric(value):
return 1 if value else 0
latest_status = latest["status"] if latest else "missing" latest_status = latest["status"] if latest else "missing"
latest_success = 1 if latest_status in ("completed", "completed_with_warnings") else 0 latest_success = 1 if latest_status in ("completed", "completed_with_warnings") else 0
latest_warning = 1 if latest_status == "completed_with_warnings" else 0 latest_warning = 1 if latest_status == "completed_with_warnings" else 0
@@ -145,12 +173,107 @@ completed_archive = escape_label(completed["archive_name"] if completed else "")
print(f'homelab_borg_last_success{{status="{latest_status}",archive="{latest_archive}"}} {latest_success}') print(f'homelab_borg_last_success{{status="{latest_status}",archive="{latest_archive}"}} {latest_success}')
print(f'homelab_borg_last_job_warning{{status="{latest_status}",archive="{latest_archive}"}} {latest_warning}') print(f'homelab_borg_last_job_warning{{status="{latest_status}",archive="{latest_archive}"}} {latest_warning}')
print(f'homelab_borg_last_completed_timestamp_seconds{{archive="{completed_archive}"}} {completed_ts}') print(f'homelab_borg_last_completed_timestamp_seconds{{archive="{completed_archive}"}} {completed_ts}')
repo = cur.execute("""
select id, name, source_directories, last_check
from repositories
order by id
limit 1
""").fetchone()
if repo:
repo_name = escape_label(repo["name"] or str(repo["id"]))
print(f'homelab_borg_repository_last_check_timestamp_seconds{{repository="{repo_name}"}} {parse_ts(repo["last_check"])}')
try:
configured_sources = json.loads(repo["source_directories"] or "[]")
except json.JSONDecodeError:
configured_sources = []
else:
configured_sources = []
expected_path = Path(os.environ.get("BORG_EXPECTED_SOURCES_FILE", ""))
expected_file_present = expected_path.is_file()
if expected_file_present:
expected_sources = [
line.strip()
for line in expected_path.read_text(encoding="utf-8").splitlines()
if line.strip() and not line.lstrip().startswith("#")
]
else:
expected_sources = []
configured_set = set(configured_sources)
expected_set = set(expected_sources)
missing_sources = [source for source in expected_sources if source not in configured_set]
extra_sources = [source for source in configured_sources if source not in expected_set]
print(f"homelab_borg_scope_expected_file_present {bool_metric(expected_file_present)}")
print(f"homelab_borg_scope_expected_sources_total {len(expected_sources)}")
print(f"homelab_borg_scope_configured_sources_total {len(configured_sources)}")
print(f"homelab_borg_scope_missing_sources_total {len(missing_sources)}")
print(f"homelab_borg_scope_extra_sources_total {len(extra_sources)}")
for source in expected_sources:
value = 1 if source in configured_set else 0
print(f'homelab_borg_scope_source_configured{{source="{escape_label(source)}"}} {value}')
for source in extra_sources:
print(f'homelab_borg_scope_source_configured{{source="{escape_label(source)}",state="extra"}} 0')
for schedule in cur.execute("""
select id, name, run_prune_after, run_compact_after
from scheduled_jobs
where enabled = 1
order by id
"""):
schedule_name = escape_label(schedule["name"] or str(schedule["id"]))
print(f'homelab_borg_schedule_prune_after_enabled{{schedule="{schedule_name}"}} {bool_metric(schedule["run_prune_after"])}')
print(f'homelab_borg_schedule_compact_after_enabled{{schedule="{schedule_name}"}} {bool_metric(schedule["run_compact_after"])}')
PY PY
else else
printf 'homelab_borg_last_success{status="container_missing",archive=""} 0\n' printf 'homelab_borg_last_success{status="container_missing",archive=""} 0\n'
printf 'homelab_borg_last_job_warning{status="container_missing",archive=""} 0\n' printf 'homelab_borg_last_job_warning{status="container_missing",archive=""} 0\n'
printf 'homelab_borg_last_completed_timestamp_seconds{archive=""} 0\n' printf 'homelab_borg_last_completed_timestamp_seconds{archive=""} 0\n'
printf 'homelab_borg_repository_last_check_timestamp_seconds{repository=""} 0\n'
printf 'homelab_borg_scope_expected_file_present 0\n'
printf 'homelab_borg_scope_expected_sources_total 0\n'
printf 'homelab_borg_scope_configured_sources_total 0\n'
printf 'homelab_borg_scope_missing_sources_total 0\n'
printf 'homelab_borg_scope_extra_sources_total 0\n'
fi fi
# Dump-Frische host-seitig messen. Schliesst den Blindfleck, dass Borg
# weiterlaeuft und stale Dumps archiviert, ohne dass ein Job-Fehler entsteht
# (pre-backup-dumps.sh gestoppt). Laeuft ausserhalb des borg-ui-Containers,
# weil die Dumps host-seitig unter $BORG_DUMP_DIR liegen.
cat <<'EOF'
# HELP homelab_borg_dump_present Whether an expected Borg pre-backup dump artifact exists in the latest dump set.
# TYPE homelab_borg_dump_present gauge
# HELP homelab_borg_dump_age_seconds Age in seconds of an expected Borg pre-backup dump artifact.
# TYPE homelab_borg_dump_age_seconds gauge
EOF
for dump in \
postgresql17-globals.sql \
postgresql17-mailarchiver.dump \
postgresql17-paperless.dump \
mealie.dump \
immich.dump \
nextcloud.dump \
gitea.sqlite.dump \
vaultwarden.sqlite.dump \
n8n.sqlite.dump \
unraid-flash-config.tar.gz \
komodo-mongo.archive.gz; do
dump_path="$BORG_DUMP_DIR/$dump"
if [ -f "$dump_path" ]; then
dump_mtime="$(stat -c %Y "$dump_path" 2>/dev/null || echo 0)"
printf 'homelab_borg_dump_present{dump="%s"} 1\n' "$dump"
printf 'homelab_borg_dump_age_seconds{dump="%s"} %s\n' "$dump" "$(( now - dump_mtime ))"
else
printf 'homelab_borg_dump_present{dump="%s"} 0\n' "$dump"
fi
done
} > "$tmp" } > "$tmp"
# 0644 statt mktemp-default 0600, damit der node-exporter-Textfile-Collector # 0644 statt mktemp-default 0600, damit der node-exporter-Textfile-Collector
+1 -1
View File
@@ -1,6 +1,6 @@
services: services:
homeassistant: homeassistant:
image: ghcr.io/home-assistant/home-assistant:2026.6.1@sha256:59aa8824955c9db491b75d2eebe42bd68494f80c2ec69ec0d66d9dae37d37514 image: ghcr.io/home-assistant/home-assistant:2026.6.3@sha256:aed891b8f801072302815b4b0fab5adb714182967e9d2e2d4a2be558241c73ad
container_name: homeassistant container_name: homeassistant
restart: unless-stopped restart: unless-stopped
environment: environment: