1 Commits

Author SHA1 Message Date
renovate e44ce38cf2 chore(deps): update docker.n8n.io/n8nio/n8n docker tag to v2.27.2 2026-06-18 10:20:57 +00:00
14 changed files with 13 additions and 252 deletions
+1 -1
View File
@@ -1,6 +1,6 @@
services:
n8n:
image: docker.n8n.io/n8nio/n8n:2.26.2@sha256:61ba01bc5e39304bbc928c9dbecd938c3a5cc1331b68affba6a34d0f654c43d9
image: docker.n8n.io/n8nio/n8n:2.27.2@sha256:77a649a752e07c417c3dad8ac094611f0e4aa79674964ad338ef702752a5988c
container_name: n8n
restart: unless-stopped
+1 -9
View File
@@ -1,6 +1,6 @@
# Alert Rules
Stand: 2026-06-18
Stand: 2026-06-05
Diese Datei beschreibt die produktiven Alarmwege und wichtigsten Regeln. Die
Konfiguration selbst liegt in `monitoring/prometheus/alerts.yml` und in den
@@ -36,14 +36,6 @@ Skripten unter `services/posture-check/`.
| `HomelabBorgBackupStale` | letztes Borg-Backup >30h | warning | Backup-Lauf nachholen/pruefen |
| `HomelabBorgLastJobFailed` | letzter Borg-Job fehlgeschlagen | critical | Borg-UI-Job-Log pruefen |
| `HomelabBorgLastJobCompletedWithWarnings` | letzter Borg-Job mit Warnungen | warning | Warnung im Borg-UI-Job lesen |
| `HomelabBorgDumpMissing` | erwartetes Dump-Artefakt fehlt im aktuellen Dump-Set | critical | `pre-backup-dumps.sh`/User-Script pruefen |
| `HomelabBorgDumpStale` | Dump-Artefakt >30h alt (Borg laeuft, Dumps eingefroren) | critical | `pre-backup-dumps.sh`/User-Script pruefen, nicht nur den Borg-Job |
| `HomelabBorgScopeSourceListMissing` | Repo-Quellliste fuer Borg-Drift-Check fehlt | critical | Borg-UI-Mount `/local/services/homelab-infra` und Repo-Pfad pruefen |
| `HomelabBorgScopeMissingSources` | Borg UI enthaelt nicht alle Pfade aus `ops/borg-ui/all-important-sources.txt` | critical | Live-Borg-Scope an Repo-Quelle angleichen |
| `HomelabBorgScopeExtraSources` | Borg UI enthaelt Pfade ausserhalb der Repo-Quellliste | warning | Doku oder Live-Scope bereinigen |
| `HomelabBorgRepositoryCheckStale` | letzter Borg-Check >14 Tage alt | warning | Borg-Repository-Check ausfuehren oder Scheduler pruefen |
| `HomelabBorgRetentionDisabled` | Scheduled Job fuehrt kein Prune aus | warning | Retention-Einstellung in Borg UI pruefen |
| `HomelabBorgCompactDisabled` | Scheduled Job fuehrt kein Compact aus | warning | Compact-Einstellung in Borg UI pruefen |
| `HomelabCriticalContainerDown` | kritischer Container fehlt | critical | Komodo/Docker-Status pruefen |
| `HomelabPrometheusTargetDown` | Scrape-Ziel down | critical | node-exporter/cadvisor/blackbox/traefik pruefen |
+4 -10
View File
@@ -1,6 +1,6 @@
# Master To-do - KalliLab CORE
Typ: Status/To-do · Stand: 2026-06-18 · Status: aktiv
Typ: Status/To-do · Stand: 2026-06-17 · Status: aktiv
Diese Liste ist die **einzige** Arbeitsliste fuer offene operative Punkte im
Homelab. Detailablaeufe stehen in den verlinkten Runbooks; Entscheidungen mit
@@ -25,19 +25,14 @@ Host-Reports (`/mnt/user/backups/restore-reports/`) und in der Git-Historie.
| Restore-Test Tailscale | Operator | State-Validierung + Reconnect nur auf Wegwerf-Host/VM, danach Geraet in Tailscale-Admin entfernen | `ops/restore-tests/tailscale-runbook.md` |
| Authelia OIDC fuer Apps | Operator/Codex | Live: Grafana + Mealie login-verifiziert; Paperless Secret verdrahtet und Service-Smoke am 2026-06-17 gruen, finaler Browser-Login mit Operator-Account offen. Immich + Nextcloud bewusst geparkt bis Family-Onboarding (siehe `docs/DECISIONS.md` 2026-06-06) | `docs/AUTHELIA_OIDC_PLAN.md` |
| Home Assistant Tibber | Operator/Codex | Tibber per HA-UI-Config-Flow verbinden. Danach Energy-Dashboard um echte Kosten/Preisquelle ergaenzen; SolarEdge-PV, Netz und Speicher sind bereits konfiguriert und validiert | `docs/runbooks/smart-home-bootstrap.md`, `docs/DECISIONS.md` |
| Nearline-Pull Dead-Man's-Switch | Operator | H:-Pull war ~2026-06-04 bis 2026-06-18 still gestoppt (Task fehlte, kein Alarm). Lauf nachgeholt + Scheduled Task `KalliLab H Drive Nearline Pull` neu registriert (State Ready). **Verbleibt:** externer Dead-Man's-Switch (Healthchecks.io-Ping am Ende von `pull-critical-backups.ps1` und `ops/borg-ui/scripts/pre-borg.sh`), da Prometheus auf Unraid den baerchen-Pull nicht sieht | `ops/h-drive-nearline/README.md` |
| Monitoring Single-File-Bind-Mount Hardening | Operator/Claude | Prometheus am 2026-06-19 auf stabilen Directory-Mount (`./prometheus:/etc/prometheus/config:ro`) umgestellt + recreated -> Stale-Handle-Footgun dort beseitigt, Reload reicht wieder. **Verbleibt:** gleiches Einzeldatei-Muster bei alertmanager/blackbox/loki/promtail/grafana-provisioning praeventiv auf Directory-Mount umstellen | `monitoring/docker-compose.yml` |
---
## Operator-Entscheidung
**Stand 2026-06-11: keine offenen Operator-Entscheidungen.**
Getroffene Entscheidungen mit Begruendung und Review-Trigger: `docs/DECISIONS.md`.
| Thema | Entscheidung noetig | Quelle |
|---|---|---|
| `/mnt/user/projekte` Backup-Scope | Filebrowser serviert `projekte` (und ganze `documents`/`photos`), aber nur App-Unterordner sind im Borg-Scope. Entscheiden: `projekte` als read-only Borg-UI-Mount + Quelllisten-Eintrag aufnehmen, oder bewusst als "nur lokal, nicht DR-relevant" bestaetigen | `ops/borg-ui/BACKUP_SCOPE.md` Abschnitt "User-Daten-Shares ausserhalb des App-Scope" |
---
## Geparkt
@@ -57,7 +52,6 @@ Bewusst nicht jetzt - Begruendungen in `docs/DECISIONS.md`, hier nur Thema und T
| Filebrowser-Mount-Scope | naechster Hardening-Sprint | `docs/SERVICE_CATALOG.md` |
| Scrutiny Privileged-Ausnahme | nur mit klarer Begruendung aendern | `docs/SERVICE_CATALOG.md` |
| Immich Redis named volume | passende Wartung am Immich-Stack | `docs/SERVICE_CATALOG.md` |
| Komodo keys named volume | gemeinsames Wartungsfenster mit Operator | Live-Volume `komodo_komodo_keys` nach `/mnt/user/appdata/komodo/keys` migrieren, Compose anpassen, Periphery-Reconnect pruefen, dann in Borg-Scope aufnehmen |
| Storage-Wachstum (zweite NVMe, zweite Array-Disk, ZFS/BTRFS) | Trigger aus Capacity-Doku | `docs/STORAGE_LAYOUT.md`, `docs/CAPACITY_AND_LIFECYCLE.md` |
| Wiederkehrende Restore-Drills | laufend nach Kadenz, inkl. quartalsweisem Frische-Negativtest (`run-restore-checks.sh freshness-negative`) | `docs/RESTORE_MATRIX.md`, `ops/restore-tests/schedule.md` |
| Doku-Quartals-Gaertnern (~15 min) | quartalsweise, erster Lauf mit Q3-Review ab 2026-07-01: Datiertes archivieren, Done-/Review-Logs kuerzen, tote Links pruefen | `docs/REPO_MAP.md` Doku-Regeln |
@@ -77,8 +71,8 @@ Bewusst nicht jetzt - Begruendungen in `docs/DECISIONS.md`, hier nur Thema und T
- **2026-06-17** Offene TODOs gegen Live-Stand abgeglichen: Paperless-OIDC-Secret verdrahtet und Service-Smoke gruen; alter Tailscale-Docker-State nach `_archive/tailscale-removed-2026-06-06/` verschoben; Tailnet-Restpunkt geschlossen.
- **2026-06-17** Repo-Hygiene abgeschlossen: Glance-Widget-Tokens sind in Runtime gesetzt, Audit-PDF liegt extern unter `H:\kallilab-recovery\audits`, Worktree clean.
- **2026-06-17** Komodo/Gitea-Webhooks normalisiert: aktive Komodo-Hooks fuer `Micha/homelab-infra` nutzen Branch-Filter `master`; DB-Backup vor Host-Hotfix erstellt. Workflow-Regel nachgezogen.
- **2026-06-19** Backup-Hardening live verifiziert: Borg-Scope-Drift 0 (alle 33 Quellen konfiguriert), Dumps frisch (11/11 present), neue Dump-Alerts aktiv (25 Regeln, 0 feuern). Prometheus-`alerts.yml`-Stale-Handle (FUSE-Einzeldatei-Mount) per `--force-recreate` behoben und anschliessend dauerhaft auf Directory-Mount umgestellt (recreated, 25 Regeln aktiv).
- **2026-06-18** Backup-Audit-Hardening: Dump-Frische-Metriken + Alerts `HomelabBorgDumpMissing/Stale`, Freshness-Checks + Nearline-Pull um `n8n`/`globals` ergaenzt, 4 Tier-2-Container in Critical-Watch, Scope-Doku fuer `projekte`/Hermes praezisiert. H:-Nearline (still seit 2026-06-04) nachgeholt + Task neu registriert.
- **2026-06-13** Home Assistant MQTT-Integration produktiv verbunden: Config-Entry `smarthome-mosquitto` ist `loaded`, Mosquitto sieht den HA-Client `homeassistant`; `check_config` gruen.
- **2026-06-13** HA Energy Dashboard konfiguriert: Netz, PV und Speicher aus SolarEdge Local gesetzt, `energy/validate` ohne Issues; HA-Backup danach erzeugt.
---
-2
View File
@@ -60,7 +60,6 @@ Sie ist die fachliche Ergaenzung zu `docs/DISASTER_RECOVERY.md`.
| Glance | Git / Borg-Repo | Repo-Konfiguration unter `ops/glance/config/glance.yml`; keine kritische Datenpersistenz | keine | `GLANCE_IMMICH_API_KEY`, `GLANCE_ADGUARD_USERNAME`, `GLANCE_ADGUARD_PASSWORD`, `GLANCE_SPEEDTEST_API_KEY` | Traefik, Authelia, optional interne API-Ziele | Dashboard startet, Widgets laden, Docker-Status laeuft nur ueber `glance-docker-socket-proxy` |
| ntfy | Borg / Share | `/mnt/user/appdata/ntfy` | keine | keine besonderen Secret-Dateien dokumentiert | Traefik | UI und Push-Endpunkt erreichbar |
| Paperless-GPT | Borg / Share | `/mnt/user/appdata/paperless-gpt` | keine eigene DB | `PAPERLESS_API_TOKEN`, `OPENAI_API_KEY` | Traefik, Paperless, OpenAI API | UI startet, Konfiguration vorhanden; LLM-Provider zeigt `openai` / `gpt-5.4-mini` |
| n8n | Borg + Dump | `/mnt/user/appdata/n8n/data` | `n8n.sqlite.dump`; Credentials sind nur mit dem passenden `N8N_ENCRYPTION_KEY` entschluesselbar | `N8N_ENCRYPTION_KEY`, GMX/OpenAI/Gitea-Credentials in n8n | Traefik, GMX IMAP, OpenAI API, Gitea API | UI startet, Owner-Login funktioniert, kritischer Mail->LLM->Gitea-Workflow ist vorhanden und deaktiviert/aktiv wie vor Restore |
| Home Assistant | Borg + HA-native Backups + Fachrepo | `/mnt/user/appdata/homeassistant` inkl. `.storage`, `secrets.yaml`, `trusted_proxies.yaml`, `custom_components` (HACS, `solaredge_modbus_multi`); Fach-YAML aus `/mnt/user/services/smart-home-kalli/home-assistant` | HA-native Backup-Artefakte unter `/mnt/user/appdata/homeassistant/backups`; erstes Artefakt 2026-06-13 erzeugt und tar-lesbar (`backup.json`, `homeassistant.tar.gz`); Backup nach SolarEdge-Integration: `Custom_backup_2026.6.1_2026-06-13_14.59_48645373.tar`; Backup nach Energy-Dashboard-Konfiguration: `Custom_backup_2026.6.1_2026-06-13_15.59_25670583.tar`; keine externe DB in Phase 1 | HA-Secrets in `secrets.yaml`, Integrations-Tokens in `.storage`, MQTT-Credentials, Agent-API-Tokens als Host-Secrets `ha_token_codex`/`ha_token_claude` (nur mit erhaltenem `.storage`-Auth-State nutzbar), spaeter Tibber/InfluxDB-Tokens | Traefik, `frontend_net`, `smarthome_net`, Mosquitto, Fachrepo-Clone, SolarEdge-Wechselrichter `192.168.178.111:1502` | Restore-Test am 2026-06-13 erfolgreich: HA-native Backup + Mosquitto-Appdata + Fachrepo-Clone isoliert gestartet, HA HTTP/API/check_config gruen; produktiv danach HA-MQTT-Config-Entry `smarthome-mosquitto` geladen, SolarEdge Local `solaredge_modbus_multi` loaded mit 68 Entitaeten und Energy Dashboard fuer Netz/PV/Speicher per `energy/validate` ohne Issues; Report `/mnt/user/backups/restore-reports/homeassistant-2026-06-13.md` |
| Smart-Home MQTT / Mosquitto | Borg / Share | `/mnt/user/appdata/mosquitto/config`, `/mnt/user/appdata/mosquitto/data`, `/mnt/user/appdata/mosquitto/log` | Mosquitto persistiert retained messages/subscriptions dateibasiert | `passwordfile`, `aclfile`, spaeter per-Device-User | `smarthome_net`, Home Assistant, spaeter ESPHome/Zigbee2MQTT | Restore-Test am 2026-06-13 erfolgreich: authentifizierter Publish/Subscribe-Smoke mit `homeassistant`-User und retained Topic nach Broker-Restart gruen; produktiv verbindet sich HA als User `homeassistant` |
| Smart-Home Fachrepo | Gitea + Borg-Repo-Clone | `/mnt/user/services/smart-home-kalli` | keine | keine echten Secrets im Repo; `secrets-template/` nur Beispiele | Gitea, Home Assistant Mounts | `git status` sauber, HA liest `configuration.yaml` und `packages/` aus dem Clone |
@@ -105,7 +104,6 @@ Aktuell relevante Dump-Artefakte unter `/mnt/user/backups/borg/dumps/latest`:
- `filebrowser.bolt.dump`
- `borg-ui.sqlite`
- `grafana.sqlite`
- `n8n.sqlite.dump`
- `unraid-flash-config.tar.gz` plus `unraid-flash-config.tar.gz.sha256` und Manifest
- Monitoring-Stack: keine verpflichtenden Dump-Artefakte; Prometheus/Loki/Grafana named volumes sind Diagnose-/Dashboard-Zustand, keine primaere Restore-Quelle.
- `komodo-mongo.archive.gz` (noch gesondert verifizieren)
+3 -6
View File
@@ -4,16 +4,13 @@ services:
container_name: monitoring-prometheus
restart: unless-stopped
command:
- --config.file=/etc/prometheus/config/prometheus.yml
- --config.file=/etc/prometheus/prometheus.yml
- --storage.tsdb.path=/prometheus
- --storage.tsdb.retention.time=30d
- --web.enable-lifecycle
volumes:
# Verzeichnis-Mount statt Einzeldatei: auf dem Unraid-FUSE-Share (/mnt/user)
# bricht ein Einzeldatei-Bind-Mount bei git/Komodo-Updates zu
# "Stale NFS file handle" (Inode-Wechsel) -> Reload laedt 0 Regeln, nur
# --force-recreate heilt. Directory-Inode ist stabil, Reload reicht wieder.
- ./prometheus:/etc/prometheus/config:ro
- ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
- ./prometheus/alerts.yml:/etc/prometheus/alerts.yml:ro
- prometheus_data:/prometheus
networks:
- monitoring_net
-72
View File
@@ -131,78 +131,6 @@ groups:
summary: "Latest Borg backup completed with warnings"
description: "The latest Borg UI job completed with warnings for archive {{ $labels.archive }}."
- alert: HomelabBorgScopeSourceListMissing
expr: homelab_borg_scope_expected_file_present != 1
for: 15m
labels:
severity: critical
annotations:
summary: "Borg expected source list is not visible"
description: "Borg UI cannot see the repo source list used for drift checks."
- alert: HomelabBorgScopeMissingSources
expr: homelab_borg_scope_missing_sources_total > 0
for: 15m
labels:
severity: critical
annotations:
summary: "Borg UI is missing expected backup sources"
description: "Borg UI is missing {{ $value }} source path(s) from ops/borg-ui/all-important-sources.txt."
- alert: HomelabBorgScopeExtraSources
expr: homelab_borg_scope_extra_sources_total > 0
for: 30m
labels:
severity: warning
annotations:
summary: "Borg UI has sources not tracked in the repo"
description: "Borg UI has {{ $value }} source path(s) that are not listed in ops/borg-ui/all-important-sources.txt."
- alert: HomelabBorgDumpMissing
expr: homelab_borg_dump_present == 0
for: 15m
labels:
severity: critical
annotations:
summary: "Borg pre-backup dump is missing: {{ $labels.dump }}"
description: "Expected dump artifact {{ $labels.dump }} is not present in the latest dump set. The pre-backup dump job may have failed or stopped."
- alert: HomelabBorgDumpStale
expr: homelab_borg_dump_age_seconds > 30 * 60 * 60
for: 15m
labels:
severity: critical
annotations:
summary: "Borg pre-backup dump is stale: {{ $labels.dump }}"
description: "Dump artifact {{ $labels.dump }} is older than 30 hours. pre-backup-dumps.sh may have stopped; Borg would keep archiving stale database content without a job failure."
- alert: HomelabBorgRepositoryCheckStale
expr: time() - homelab_borg_repository_last_check_timestamp_seconds > 14 * 24 * 60 * 60
for: 30m
labels:
severity: warning
annotations:
summary: "Borg repository check is stale"
description: "Borg repository {{ $labels.repository }} has not had a recorded check for more than 14 days."
- alert: HomelabBorgRetentionDisabled
expr: homelab_borg_schedule_prune_after_enabled != 1
for: 30m
labels:
severity: warning
annotations:
summary: "Borg retention pruning is disabled"
description: "Scheduled Borg job {{ $labels.schedule }} does not run prune after backup."
- alert: HomelabBorgCompactDisabled
expr: homelab_borg_schedule_compact_after_enabled != 1
for: 30m
labels:
severity: warning
annotations:
summary: "Borg compaction is disabled"
description: "Scheduled Borg job {{ $labels.schedule }} does not run compact after backup."
- alert: HomelabCriticalContainerDown
expr: homelab_critical_container_running == 0
for: 5m
+1 -1
View File
@@ -5,7 +5,7 @@ global:
site: kallilabcore
rule_files:
- /etc/prometheus/config/alerts.yml
- /etc/prometheus/alerts.yml
alerting:
alertmanagers:
+1 -14
View File
@@ -48,12 +48,11 @@ The Unraid flash configuration archive is intentional as well and must be treate
| Grafana | SQLite dump from `monitoring_grafana_data` + provisioned config in Git | `/local/borg-dumps`, `monitoring/grafana/provisioning`, `monitoring/grafana/dashboards` |
| Filebrowser | file-backed state dump + file data | `/local/borg-dumps`, `/local/appdata/filebrowser` |
| InfluxDB 3 Core | file data | `/local/appdata/influxdb3/data`, `/local/appdata/influxdb3/plugins` |
| n8n | SQLite dump + encrypted workflow/credential state | `/local/borg-dumps`, `/local/appdata/n8n/data` |
| Home Assistant | HA-native backup + file state | `/local/appdata/homeassistant`, `/local/services/smart-home-kalli` |
| Smart-Home MQTT / Mosquitto | file data | `/local/appdata/mosquitto/config`, `/local/appdata/mosquitto/data` |
| Zigbee2MQTT (planned) | file data + coordinator state | `/local/appdata/zigbee2mqtt`, `/local/services/smart-home-kalli` |
| ESPHome (planned) | Fachrepo + optional build/runtime cache | `/local/services/smart-home-kalli/esphome`, optional `/local/appdata/esphome` |
| Hermes Agent | file data + SSH key | SSH-Key via `/local/secrets`; `/local/appdata/hermes-agent/data` ist bewusst NICHT in `all-important-sources.txt`, weil der Stack geparkt ist (Review 2026-07-25). Beim Aktivieren des Stacks in die Quellliste aufnehmen. |
| Hermes Agent | file data + SSH key | `/local/appdata/hermes-agent/data`, `/local/secrets/hermes_runner_id_ed25519` |
| BentoPDF | rebuildable | no critical persistence in compose |
## Open Decisions and Coverage Gaps
@@ -72,17 +71,6 @@ Option A umgesetzt: `pre-backup-dumps.sh` writes `nextcloud.dump` from `nextclou
The live Unraid User Scripts execute repo scripts from `/mnt/user/services/homelab-infra`, while Komodo keeps stack workspaces below `/mnt/user/services/stacks`. These paths are now mounted into Borg UI as `/local/services/...` and included explicitly so host-side script hotfixes, stack workspace state, and posture-check state are recoverable.
### User-Daten-Shares ausserhalb des App-Scope
Filebrowser serviert `/mnt/user/projekte`, `/mnt/user/documents` und `/mnt/user/photos` komplett (`ops/filebrowser/docker-compose.yml`). Der Borg-Scope deckt aber bewusst nur die App-Unterordner ab (`documents/paperless*`, `documents/nextcloud-data`, `documents/scans_inbox`, `photos/immich`, `photos/family_archive`).
- **`/mnt/user/projekte`** ist aktuell in **keinem** Borg-Scope. Ad-hoc-Dateien, die direkt unter `documents/` oder `photos/` (ausserhalb der genannten App-Ordner) abgelegt werden, ebenfalls nicht.
- Entscheidung Operator offen (Eintrag in `docs/MASTER_TODO.md`): Entweder `projekte` als eigenen read-only Borg-UI-Mount + Quelllisten-Eintrag aufnehmen, oder bewusst als "nur lokal, nicht DR-relevant" bestaetigen. Bis zur Entscheidung gilt: dort liegende Originaldaten sind **nicht** wiederherstellbar.
### Komodo keys
Production still stores Komodo Core/Periphery keys in the Docker named volume `komodo_komodo_keys`. This is a known open migration item and is not fixed by the Borg source list alone. Target state: move the keys to a host path such as `/mnt/user/appdata/komodo/keys` and mount that path into both Komodo containers, then include it in Borg. Do not treat this as solved until the live Compose stack has been migrated and Periphery reconnect has been verified.
## Database Dumps Required
### Shared PostgreSQL (`postgresql17`, runtime PostgreSQL 18)
@@ -101,7 +89,6 @@ Production still stores Komodo Core/Periphery keys in the Docker named volume `k
- Komodo MongoDB
- SQLite: `gitea`, `vaultwarden`, `speedtest-tracker`, `borg-ui`, `grafana`
- SQLite: `n8n` (`n8n.sqlite.dump`, credentials require the matching `N8N_ENCRYPTION_KEY`)
- File-backed state: `filebrowser.bolt.dump`
- Unraid flash config: `unraid-flash-config.tar.gz` plus `unraid-flash-config.tar.gz.sha256`
- Home Assistant native backups: created by HA under `/mnt/user/appdata/homeassistant/backups` and captured as file state
-6
View File
@@ -18,12 +18,6 @@
/local/appdata/borg-ui/data
/local/appdata/komodo/periphery
/local/appdata/komodo/core
/local/appdata/nextcloud/html
/local/nextcloud/data
/local/appdata/n8n/data
/local/appdata/filebrowser
/local/appdata/influxdb3/data
/local/appdata/influxdb3/plugins
/local/services/homelab-infra
/local/services/smart-home-kalli
/local/services/stacks
-1
View File
@@ -325,7 +325,6 @@ main() {
# Additional host-side SQLite dumps for admin tooling with appdata files.
dump_sqlite_file "/mnt/user/appdata/borg-ui/data/borg.db" "$LATEST_DIR/borg-ui.sqlite" "borg-ui"
dump_sqlite_file "/var/lib/docker/volumes/monitoring_grafana_data/_data/grafana.db" "$LATEST_DIR/grafana.sqlite" "grafana"
dump_sqlite_file "/mnt/user/appdata/n8n/data/database.sqlite" "$LATEST_DIR/n8n.sqlite.dump" "n8n"
# MongoDB
dump_mongo_container "komodo-mongo" "$LATEST_DIR/komodo-mongo.archive.gz"
@@ -25,7 +25,6 @@ $Jobs = @(
"immich.dump",
"komodo-mongo.archive.gz",
"mealie.dump",
"n8n.sqlite.dump",
"nextcloud.dump",
"postgresql17-authelia.dump",
"postgresql17-globals.sql",
@@ -6,7 +6,6 @@ param(
)
$checks = @(
@{ Name = "postgresql17-globals.sql"; Path = Join-Path $DumpRoot "postgresql17-globals.sql" },
@{ Name = "postgresql17-paperless.dump"; Path = Join-Path $DumpRoot "postgresql17-paperless.dump" },
@{ Name = "postgresql17-mailarchiver.dump"; Path = Join-Path $DumpRoot "postgresql17-mailarchiver.dump" },
@{ Name = "mealie.dump"; Path = Join-Path $DumpRoot "mealie.dump" },
@@ -14,7 +13,6 @@ $checks = @(
@{ Name = "nextcloud.dump"; Path = Join-Path $DumpRoot "nextcloud.dump" },
@{ Name = "gitea.sqlite.dump"; Path = Join-Path $DumpRoot "gitea.sqlite.dump" },
@{ Name = "vaultwarden.sqlite.dump"; Path = Join-Path $DumpRoot "vaultwarden.sqlite.dump" },
@{ Name = "n8n.sqlite.dump"; Path = Join-Path $DumpRoot "n8n.sqlite.dump" },
@{ Name = "speedtest-tracker.sqlite.dump"; Path = Join-Path $DumpRoot "speedtest-tracker.sqlite.dump" },
@{ Name = "filebrowser.bolt.dump"; Path = Join-Path $DumpRoot "filebrowser.bolt.dump" },
@{ Name = "unraid-flash-config.tar.gz"; Path = Join-Path $DumpRoot "unraid-flash-config.tar.gz" }
@@ -89,7 +89,6 @@ check_pg_header() {
}
for dump in \
postgresql17-globals.sql \
postgresql17-paperless.dump \
postgresql17-mailarchiver.dump \
mealie.dump \
@@ -97,7 +96,6 @@ for dump in \
nextcloud.dump \
gitea.sqlite.dump \
vaultwarden.sqlite.dump \
n8n.sqlite.dump \
speedtest-tracker.sqlite.dump \
filebrowser.bolt.dump \
unraid-flash-config.tar.gz; do
@@ -4,11 +4,7 @@ set -euo pipefail
TEXTFILE_DIR="${TEXTFILE_DIR:-/mnt/user/services/posture-check/textfile}"
OUTPUT_FILE="${OUTPUT_FILE:-$TEXTFILE_DIR/homelab.prom}"
BORG_CONTAINER="${BORG_CONTAINER:-borg-ui}"
BORG_EXPECTED_SOURCES_FILE="${BORG_EXPECTED_SOURCES_FILE:-/local/services/homelab-infra/ops/borg-ui/all-important-sources.txt}"
# Host-Pfad der aktuellen Dump-Artefakte (pre-backup-dumps.sh schreibt hierhin).
# Wird host-seitig gestattet; der Exporter laeuft als Unraid User Script.
BORG_DUMP_DIR="${BORG_DUMP_DIR:-/mnt/user/backups/borg/dumps/latest}"
CRITICAL_CONTAINERS="${CRITICAL_CONTAINERS:-traefik authelia postgresql17 gitea komodo-core komodo-mongo komodo-periphery vaultwarden borg-ui ntfy adguard unbound monitoring-alertmanager monitoring-alertmanager-ntfy-bridge monitoring-blackbox-exporter monitoring-cadvisor monitoring-grafana monitoring-loki monitoring-node-exporter monitoring-promtail immich_server immich_postgres immich_redis paperless-ngx nextcloud nextcloud-postgres nextcloud-redis mealie mealie-postgres mail-archiver n8n homeassistant smarthome-mosquitto}"
CRITICAL_CONTAINERS="${CRITICAL_CONTAINERS:-traefik authelia postgresql17 gitea komodo-core komodo-mongo komodo-periphery vaultwarden borg-ui ntfy adguard unbound monitoring-alertmanager monitoring-alertmanager-ntfy-bridge monitoring-blackbox-exporter monitoring-cadvisor monitoring-grafana monitoring-loki monitoring-node-exporter monitoring-promtail immich_server immich_postgres immich_redis paperless-ngx nextcloud nextcloud-postgres nextcloud-redis mealie mealie-postgres}"
# Hinweis: Tailscale laeuft als natives Unraid-Plugin (kein Docker-Container) und
# wird daher hier bewusst NICHT als kritischer Container gefuehrt (Stand 2026-06-06).
@@ -94,32 +90,11 @@ EOF
# TYPE homelab_borg_last_success gauge
# HELP homelab_borg_last_job_warning Whether the most recent Borg backup job completed with warnings.
# TYPE homelab_borg_last_job_warning gauge
# HELP homelab_borg_repository_last_check_timestamp_seconds Unix timestamp of the latest Borg repository check known to Borg UI.
# TYPE homelab_borg_repository_last_check_timestamp_seconds gauge
# HELP homelab_borg_scope_expected_file_present Whether the expected Borg source list file is visible inside Borg UI.
# TYPE homelab_borg_scope_expected_file_present gauge
# HELP homelab_borg_scope_expected_sources_total Number of expected Borg source paths from the repo source list.
# TYPE homelab_borg_scope_expected_sources_total gauge
# HELP homelab_borg_scope_configured_sources_total Number of Borg source paths configured in Borg UI.
# TYPE homelab_borg_scope_configured_sources_total gauge
# HELP homelab_borg_scope_missing_sources_total Number of expected Borg source paths missing from Borg UI.
# TYPE homelab_borg_scope_missing_sources_total gauge
# HELP homelab_borg_scope_extra_sources_total Number of Borg UI source paths not present in the repo source list.
# TYPE homelab_borg_scope_extra_sources_total gauge
# HELP homelab_borg_scope_source_configured Whether an expected Borg source path is configured in Borg UI.
# TYPE homelab_borg_scope_source_configured gauge
# HELP homelab_borg_schedule_prune_after_enabled Whether a Borg scheduled job runs prune after backup.
# TYPE homelab_borg_schedule_prune_after_enabled gauge
# HELP homelab_borg_schedule_compact_after_enabled Whether a Borg scheduled job runs compact after backup.
# TYPE homelab_borg_schedule_compact_after_enabled gauge
EOF
if docker inspect "$BORG_CONTAINER" >/dev/null 2>&1; then
docker exec -i -e BORG_EXPECTED_SOURCES_FILE="$BORG_EXPECTED_SOURCES_FILE" "$BORG_CONTAINER" python3 - <<'PY'
docker exec -i "$BORG_CONTAINER" python3 - <<'PY'
import datetime as dt
import json
import os
from pathlib import Path
import sqlite3
conn = sqlite3.connect("/data/borg.db")
@@ -160,9 +135,6 @@ def parse_ts(value):
def escape_label(value):
return (value or "").replace("\\", "\\\\").replace('"', '\\"')
def bool_metric(value):
return 1 if value else 0
latest_status = latest["status"] if latest else "missing"
latest_success = 1 if latest_status in ("completed", "completed_with_warnings") else 0
latest_warning = 1 if latest_status == "completed_with_warnings" else 0
@@ -173,107 +145,12 @@ completed_archive = escape_label(completed["archive_name"] if completed else "")
print(f'homelab_borg_last_success{{status="{latest_status}",archive="{latest_archive}"}} {latest_success}')
print(f'homelab_borg_last_job_warning{{status="{latest_status}",archive="{latest_archive}"}} {latest_warning}')
print(f'homelab_borg_last_completed_timestamp_seconds{{archive="{completed_archive}"}} {completed_ts}')
repo = cur.execute("""
select id, name, source_directories, last_check
from repositories
order by id
limit 1
""").fetchone()
if repo:
repo_name = escape_label(repo["name"] or str(repo["id"]))
print(f'homelab_borg_repository_last_check_timestamp_seconds{{repository="{repo_name}"}} {parse_ts(repo["last_check"])}')
try:
configured_sources = json.loads(repo["source_directories"] or "[]")
except json.JSONDecodeError:
configured_sources = []
else:
configured_sources = []
expected_path = Path(os.environ.get("BORG_EXPECTED_SOURCES_FILE", ""))
expected_file_present = expected_path.is_file()
if expected_file_present:
expected_sources = [
line.strip()
for line in expected_path.read_text(encoding="utf-8").splitlines()
if line.strip() and not line.lstrip().startswith("#")
]
else:
expected_sources = []
configured_set = set(configured_sources)
expected_set = set(expected_sources)
missing_sources = [source for source in expected_sources if source not in configured_set]
extra_sources = [source for source in configured_sources if source not in expected_set]
print(f"homelab_borg_scope_expected_file_present {bool_metric(expected_file_present)}")
print(f"homelab_borg_scope_expected_sources_total {len(expected_sources)}")
print(f"homelab_borg_scope_configured_sources_total {len(configured_sources)}")
print(f"homelab_borg_scope_missing_sources_total {len(missing_sources)}")
print(f"homelab_borg_scope_extra_sources_total {len(extra_sources)}")
for source in expected_sources:
value = 1 if source in configured_set else 0
print(f'homelab_borg_scope_source_configured{{source="{escape_label(source)}"}} {value}')
for source in extra_sources:
print(f'homelab_borg_scope_source_configured{{source="{escape_label(source)}",state="extra"}} 0')
for schedule in cur.execute("""
select id, name, run_prune_after, run_compact_after
from scheduled_jobs
where enabled = 1
order by id
"""):
schedule_name = escape_label(schedule["name"] or str(schedule["id"]))
print(f'homelab_borg_schedule_prune_after_enabled{{schedule="{schedule_name}"}} {bool_metric(schedule["run_prune_after"])}')
print(f'homelab_borg_schedule_compact_after_enabled{{schedule="{schedule_name}"}} {bool_metric(schedule["run_compact_after"])}')
PY
else
printf 'homelab_borg_last_success{status="container_missing",archive=""} 0\n'
printf 'homelab_borg_last_job_warning{status="container_missing",archive=""} 0\n'
printf 'homelab_borg_last_completed_timestamp_seconds{archive=""} 0\n'
printf 'homelab_borg_repository_last_check_timestamp_seconds{repository=""} 0\n'
printf 'homelab_borg_scope_expected_file_present 0\n'
printf 'homelab_borg_scope_expected_sources_total 0\n'
printf 'homelab_borg_scope_configured_sources_total 0\n'
printf 'homelab_borg_scope_missing_sources_total 0\n'
printf 'homelab_borg_scope_extra_sources_total 0\n'
fi
# Dump-Frische host-seitig messen. Schliesst den Blindfleck, dass Borg
# weiterlaeuft und stale Dumps archiviert, ohne dass ein Job-Fehler entsteht
# (pre-backup-dumps.sh gestoppt). Laeuft ausserhalb des borg-ui-Containers,
# weil die Dumps host-seitig unter $BORG_DUMP_DIR liegen.
cat <<'EOF'
# HELP homelab_borg_dump_present Whether an expected Borg pre-backup dump artifact exists in the latest dump set.
# TYPE homelab_borg_dump_present gauge
# HELP homelab_borg_dump_age_seconds Age in seconds of an expected Borg pre-backup dump artifact.
# TYPE homelab_borg_dump_age_seconds gauge
EOF
for dump in \
postgresql17-globals.sql \
postgresql17-mailarchiver.dump \
postgresql17-paperless.dump \
mealie.dump \
immich.dump \
nextcloud.dump \
gitea.sqlite.dump \
vaultwarden.sqlite.dump \
n8n.sqlite.dump \
unraid-flash-config.tar.gz \
komodo-mongo.archive.gz; do
dump_path="$BORG_DUMP_DIR/$dump"
if [ -f "$dump_path" ]; then
dump_mtime="$(stat -c %Y "$dump_path" 2>/dev/null || echo 0)"
printf 'homelab_borg_dump_present{dump="%s"} 1\n' "$dump"
printf 'homelab_borg_dump_age_seconds{dump="%s"} %s\n' "$dump" "$(( now - dump_mtime ))"
else
printf 'homelab_borg_dump_present{dump="%s"} 0\n' "$dump"
fi
done
} > "$tmp"
# 0644 statt mktemp-default 0600, damit der node-exporter-Textfile-Collector