Close Gitea signup, dedup posture-check alerts, extend Borg scope

Operational hardening across several services after live incident analysis between 2026-05-18 and 2026-05-20:

- Gitea: disable public registration and OpenID signup/signin to stop the external POST / 5xx bursts that triggered availability alerts. A new repo-wide policy requires every production Micha/homelab-infra Komodo stack to ship with an active Gitea->Komodo webhook on the current stack ID (documented in CLAUDE.md, AI_CONTEXT.md, WORKFLOW.md).
- posture-check: extract the Disk1 fstype check into its own function so the documented Disk1 NTFS exception no longer raises ntfy warnings, skip POSIX inode checks on NTFS, and dedup ntfy alerts via a fingerprint state file with ALERT_REPEAT_SECONDS (default 24h). Repeated alerts for the same cause are now suppressed.
- docker-critical-events: parse the event JSON for container name, action, exit code and signal; drop `die exit=0` events (clean stops); ship a structured ntfy message instead of the raw event line.
- Borg UI: mount /mnt/user/services into the backup container as /local/services:ro and include homelab-infra, stacks and posture-check in all-important-sources.txt. RESTORE_MATRIX and DISASTER_RECOVERY updated accordingly.
- Unraid user scripts: document the new homelab-operations-report-daily cron job and the SMTP password file it expects on the host.
- MIGRATION_LOG: capture the five live events from this window: Gitea 5xx burst + signup closure, Komodo webhook reconciliation, posture-check host-version verification, Borg scope extension, and Traefik 5xx alert detuning.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@@ -69,6 +69,8 @@ Standard-Workflow:
 7. Check Komodo deploy/runtime
 8. Update the documentation

+New production Komodo stacks from `Micha/homelab-infra` must have an active Gitea->Komodo webhook pointing at the current stack ID. Exceptions must be documented in the same change block.
+
 If drift is suspected, do not guess. Work through the mandatory matrix in `docs/GITOPS_DRIFT_RUNBOOK.md` first.

 ## Security rules
@@ -11,6 +11,10 @@ services:
       - GITEA__server__DOMAIN=git.kaleschke.info
       - GITEA__server__ROOT_URL=https://git.kaleschke.info/
       - GITEA__database__DB_TYPE=sqlite3
+      - GITEA__service__DISABLE_REGISTRATION=true
+      - GITEA__service__REGISTER_EMAIL_CONFIRM=true
+      - GITEA__openid__ENABLE_OPENID_SIGNIN=false
+      - GITEA__openid__ENABLE_OPENID_SIGNUP=false
       - GITEA__webhook__ALLOWED_HOST_LIST=komodo-core,localhost,127.0.0.1,192.168.178.0/24
     volumes:
       - /mnt/user/services/gitea/data:/data
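The four added settings can be checked mechanically before a deploy. The following is a minimal sketch, not part of this commit: `missing_gitea_settings` is a hypothetical helper that prints every expected hardening setting that does not appear literally in a given compose file.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the commit): print each expected Gitea
# hardening setting that is missing from the given compose file.
missing_gitea_settings() {
  local compose_file="$1"
  local setting

  for setting in \
    'GITEA__service__DISABLE_REGISTRATION=true' \
    'GITEA__service__REGISTER_EMAIL_CONFIRM=true' \
    'GITEA__openid__ENABLE_OPENID_SIGNIN=false' \
    'GITEA__openid__ENABLE_OPENID_SIGNUP=false'; do
    # -F: match the setting as a fixed string, -q: no output, only exit status
    grep -qF -- "$setting" "$compose_file" || printf '%s\n' "$setting"
  done
}
```

An empty result means all four settings are present. The check is purely textual; it does not prove that the running container was recreated with the new environment.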
@@ -39,6 +39,7 @@ Traefik ist der zentrale Web-Einstieg fuer HTTP(S). Admin-/Ops-UIs liegen entwed
 - Gitea hosts the repo at `git.kaleschke.info`.
 - Komodo is the stack manager and deploy consumer.
 - Komodo Periphery needs the Docker socket and the `/mnt/user/services` mount to deploy stacks reproducibly.
+- New production Komodo stacks from `Micha/homelab-infra` must have an active Gitea->Komodo webhook pointing at the current stack ID; exceptions such as disabled/paused stacks must be documented.

 ### Identity / Security

@@ -101,6 +102,8 @@ Normalfall:

 Important: the Komodo web editor is not the place to edit. If Komodo and Git diverge, check Git and the Komodo workspace first instead of experimenting live.

+Creating a new production stack requires the Gitea->Komodo webhook. After creation, a test push or test delivery must show that Gitea reaches the current Komodo stack ID.
+
 ## Network model

 | Network | Meaning |
@@ -200,6 +200,9 @@ Besonders kritisch:

 - `/mnt/user/appdata/secrets`
 - `/mnt/user/appdata/traefik`
+- `/mnt/user/services/homelab-infra`
+- `/mnt/user/services/stacks`
+- `/mnt/user/services/posture-check`
 - `/mnt/user/services/gitea/data`
 - `/mnt/user/appdata/authelia/config`
 - `/mnt/user/appdata/komodo/core`
@@ -16,6 +16,42 @@ Dieses Dokument ist nur noch ein historischer Verlauf. Der aktuelle operative Ab

 ## Historical milestones

+### 2026-05-20 - Gitea 5xx bursts investigated and signup closed
+
+- Live finding for `HomelabTraefik5xx`: short external `POST /` bursts against `gitea@docker` from `103.153.183.69` and `103.153.183.73`, each answered with HTTP 500 in under 10 ms; regular Gitea checks and Git reads ran in parallel with HTTP 200.
+- No evidence of successful access: Gitea container without restart/OOM, only user `micha`, no new users in the last 30 days, no new repos, SSH keys or access tokens in the investigation window.
+- Live Prometheus was still running the old rule `rate(...[5m]) > 0`; the rule `increase(...[5m]) >= 5`, already prepared in the repo, was copied to the live mount and activated via a Prometheus reload.
+- Gitea registration and OpenID signup were closed: `DISABLE_REGISTRATION=true`, `REGISTER_EMAIL_CONFIRM=true`, `ENABLE_OPENID_SIGNIN=false`, `ENABLE_OPENID_SIGNUP=false`; the signup page then shows "Registration is disabled", and OpenID login returns 403.
+
+### 2026-05-18 - Komodo webhooks fully reconciled
+
+- Live finding on `Kallilabcore`: Komodo had `webhook_enabled: true` for several current stacks, but Gitea did not yet contain active webhooks for all current stack IDs.
+- Active webhooks for `monitoring` (`6a08d5297707b0930ab95c72`), `glance` (`6a09d7347707b0930ab96eae`), `grafana` (`69f31ecdf65eb72b757c497d`) and `nextcloud` (`69e519085fd5e8bc51f121f0`) were created in the Gitea database, following the existing Komodo hook pattern.
+- Stale active Gitea hooks pointing at Komodo stack IDs that no longer exist or are outdated were disabled.
+- Reconciliation afterwards: 30 active Gitea-Komodo hooks for 30 Komodo stacks with webhook enabled; `hermes` deliberately stays at `webhook_enabled: false` in Komodo.
+- The network path from the `gitea` container to `komodo-core:9120` was verified successfully; `last_status=0` for new hooks is expected until the first push.
+
+### 2026-05-19 - Posture-check host version verified
+
+- The cause of the repeated ntfy warnings was no longer the repo logic alone: the Unraid host was still executing the old script version at `/mnt/user/services/homelab-infra/services/posture-check/posture-check.sh`.
+- The host script was replaced (with a backup) and verified directly on the host with `SEND_NTFY=0`.
+- Result of the real host run: `status: ok`, `critical_count: 0`, `warning_count: 0`.
+- Resulting operating rule: for host User Scripts, always check the actually executed host path and the live output after repo changes.
+
+### 2026-05-19 - Borg scope extended for GitOps host automation
+
+- After the Gitea/Komodo webhook and posture-check changes, the backup scope was extended to include the host GitOps paths.
+- Borg UI will mount `/mnt/user/services` read-only as `/local/services` from now on.
+- `/local/services/homelab-infra`, `/local/services/stacks` and `/local/services/posture-check` were added to `all-important-sources.txt`.
+- `pre-backup-dumps.sh` was run on the host; fresh dumps for `gitea.sqlite.dump` and `komodo-mongo.archive.gz` are in `/mnt/user/backups/borg/dumps/latest`.
+- The new `/local/services` mount takes effect after a redeploy/recreate of the `borg-ui` stack.
+
+### 2026-05-19 - Traefik 5xx alert de-noised
+
+- `HomelabTraefik5xx` fired on individual 5xx responses because the rule used `rate(...[5m]) > 0`.
+- Live finding for `gitea@docker`: two short `POST /` requests with HTTP 500 from one external IP, followed by consistently successful Gitea checks; no container restart.
+- Prometheus rule changed to `increase(...[5m]) >= 5` so that individual failing external requests do not trigger an ntfy alert.
+
 ### 2026-05-17 - Glance homelab dashboard prepared

 - `ops/glance` prepared as a protected homelab dashboard at `glance.kaleschke.info`.
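The difference between the two alert rules in the log entries above can be illustrated with a simplified model. This is only a sketch: it treats the rule input as a plain count of 5xx responses seen in the 5-minute window and ignores Prometheus extrapolation and counter resets. The function names are hypothetical and the real rule expressions remain elided as in the log.

```bash
#!/usr/bin/env bash
# Simplified model, not the real Prometheus evaluation.
# $1 = number of 5xx responses observed in the 5-minute window.

# Old rule `rate(...[5m]) > 0`: the per-second rate is count/300,
# so it is positive as soon as a single 5xx lands in the window.
old_rule_fires() {
  [ "$1" -gt 0 ]
}

# New rule `increase(...[5m]) >= 5`: needs at least five 5xx in the window.
new_rule_fires() {
  [ "$1" -ge 5 ]
}
```

Under this model, the two `POST /` failures from the Traefik entry trip the old rule (`old_rule_fires 2` succeeds) but not the new one (`new_rule_fires 2` fails).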
@@ -27,7 +27,7 @@ Diese Datei ersetzt die alte Sprint-Liste vom 2026-05-16. Die damaligen Backup-,
 - Build the Grafana HA/weather dashboard in `monitoring-grafana`
 - Disk1 NTFS migration phase 2:
   - deliberately remains a separate migration window
-  - `posture-check` may report the documented NTFS warning until then
+  - until then, `posture-check` accepts the documented NTFS exception without ntfy warning spam
 - Hermes VM side:
   - finalize and consolidate runner VM, real `.env`, SSH key and dashboard/gateway
   - start the NAS stack only once the VM side is ready
@@ -33,7 +33,8 @@ Sie ist die fachliche Ergaenzung zu `docs/DISASTER_RECOVERY.md`.
 | Redis | Share / Host | `/mnt/user/appdata/redis` | none | `redis_password.txt` | `backend_net` | Redis starts, apps connect |
 | Authelia | Borg | `/mnt/user/appdata/authelia/config`, `/mnt/user/appdata/secrets/*authelia*` | Shared PostgreSQL, optional dump `postgresql17-authelia.dump` | JWT/session/storage/Postgres/SMTP secret files | PostgreSQL 17, Traefik, GMX SMTP | Login page and ForwardAuth work; SMTP notifier starts; active sessions are re-established after restart |
 | Gitea | Borg + Dump | `/mnt/user/services/gitea/data` | `gitea.sqlite.dump` | `borg_repo_passphrase.txt` for restore tests | Traefik | Web UI reachable, repo visible, SSH port responds; mini restore to `/mnt/user/backups/restore-lab/gitea` successfully validated on 2026-05-07 |
-| Komodo | Borg / Share | `/mnt/user/appdata/komodo/core`, `/mnt/user/appdata/komodo/periphery` | `komodo-mongo.archive.gz` if verified | `komodo_mongo_password.txt`, `KOMODO_*` stack ENV | Traefik, Mongo, Gitea | UI reachable, Periphery connected |
+| Komodo | Borg / Share | `/mnt/user/appdata/komodo/core`, `/mnt/user/appdata/komodo/periphery`, `/mnt/user/services/stacks` | `komodo-mongo.archive.gz` if verified | `komodo_mongo_password.txt`, `KOMODO_*` stack ENV | Traefik, Mongo, Gitea | UI reachable, Periphery connected |
+| GitOps Host Automation | Borg / Git | `/mnt/user/services/homelab-infra`, `/mnt/user/services/posture-check` | no dedicated DB | none | Gitea, Komodo, Unraid User Scripts | `posture-check` runs from the host path and reports `warning_count: 0` in the known transitional state |
 | Vaultwarden | Borg + Dump | `/mnt/user/appdata/vaultwarden` | `vaultwarden.sqlite.dump` | `vaultwarden_admin_token.txt`, `borg_repo_passphrase.txt` for restore tests | Traefik | Login page reachable, vault data visible; mini restore to `/mnt/user/backups/restore-lab/vaultwarden` successfully validated on 2026-05-07 |

 ---
@@ -87,7 +87,7 @@ Secret-Werte sind nicht enthalten. Es werden nur Secret-Namen, Env-Key-Namen und

 | Service | Purpose | Authoritative path | URL / access | Dependencies | Data paths | Backup / restore | Traefik | Notes / TODOs |
 |---|---|---|---|---|---|---|---|---|
-| `posture-check` | Host posture audit for filesystem, mover drift, NVMe SMART and fill level | `services/posture-check/posture-check.sh` | Unraid User Script / cron / Borg pre-hook | `findmnt`, `df`, `nvme`, optionally `curl` for ntfy | `/mnt/user/services/posture-check/last.json` | Repo script + latest JSON status | no | Must run on the Unraid host at boot, hourly and before Borg; `ALLOW_DISK1_NTFS=1` is the documented transitional exception until Disk1 migration phase 2; warning/critical results alert via ntfy |
+| `posture-check` | Host posture audit for filesystem, mover drift, NVMe SMART and fill level | `services/posture-check/posture-check.sh` | Unraid User Script / cron / Borg pre-hook | `findmnt`, `df`, `nvme`, optionally `curl` for ntfy | `/mnt/user/services/posture-check/last.json` | Repo script + latest JSON status | no | Must run on the Unraid host at boot, hourly and before Borg; `ALLOW_DISK1_NTFS=1` is the documented transitional exception until Disk1 migration phase 2 and raises no ntfy warning for `ntfs3`/`fuseblk`; warning/critical results alert via ntfy only for a new cause or after `ALERT_REPEAT_SECONDS` |
 | `docker-critical-events` | Live alerting for Docker `die`/`oom`/`kill` events | `services/posture-check/docker-critical-events.sh` | Unraid User Script / background process | Docker CLI, ntfy | `/mnt/user/services/posture-check/docker-critical-events-last.log` | Repo script + latest event log | no | Optionally start as an Unraid User Script `at array start`; sends to `homelab-alerts` |

 ## Backup and restore notes
@@ -115,6 +115,23 @@ Komodo ist in diesem Setup:
 - Pushes can automatically trigger a Komodo deploy
 - if Komodo and Git diverge, Git wins

+### Mandatory for new Komodo stacks
+
+Every new production Komodo stack deployed from `Micha/homelab-infra` needs an active Gitea webhook pointing at the current Komodo stack ID.
+
+Mandatory steps when creating a stack:
+
+1. Create the stack in Komodo from Gitea
+2. Enable `webhook_enabled` in Komodo
+3. Create the matching Gitea webhook for the current stack ID
+4. Check the Gitea hook against `http://komodo-core:9120/listener/github/stack/<stack-id>/deploy`
+5. Trigger a push or test delivery and check `last_status` and the Komodo deploy
+6. Document exceptions explicitly
+
+**Rule:** No new production GitOps stack without a working Gitea->Komodo webhook. Deliberate exceptions must be documented in the same change block, including the reason and the alternative deploy path.
+
+The standard case uses the global `KOMODO_WEBHOOK_SECRET` from the Komodo host `.env`, unless Komodo explicitly shows a dedicated per-stack secret for the stack.
+
 ### Exception: Komodo access model

 Komodo **deliberately** stays without a central Traefik ForwardAuth middleware.
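Step 4 of the mandatory steps compares the Gitea hook target with a URL derived from the stack ID. A minimal sketch of that derivation follows, using the listener pattern quoted in the workflow text. `komodo_webhook_url` and the `KOMODO_LISTENER_BASE` override are hypothetical names, and the 24-character hex check is an assumption based only on the four stack IDs listed in the migration log.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the commit): build the expected
# Gitea->Komodo webhook target for a given Komodo stack ID.
komodo_webhook_url() {
  local stack_id="$1"
  # Assumption: the listener is reachable as komodo-core:9120 from Gitea.
  local base="${KOMODO_LISTENER_BASE:-http://komodo-core:9120}"

  # Assumption: stack IDs are 24 lowercase hex characters.
  case "$stack_id" in
    ''|*[!0-9a-f]*) return 1 ;;
  esac
  [ "${#stack_id}" -eq 24 ] || return 1

  printf '%s/listener/github/stack/%s/deploy\n' "$base" "$stack_id"
}
```

For the `monitoring` stack from the migration log, `komodo_webhook_url 6a08d5297707b0930ab95c72` prints the URL that the active Gitea hook should point at. A placeholder such as `<stack-id>` is rejected with a non-zero exit status.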
@@ -41,6 +41,7 @@ The inclusion of `/local/secrets` is intentional: Borg is expected to cover disa
 | AdGuard | config only | `/local/appdata/adguard/conf` |
 | Borg UI | SQLite dump + self-backup | `/local/borg-dumps`, `/local/appdata/borg-ui/data` |
 | Komodo | config + Mongo dump | `/local/borg-dumps`, `/local/appdata/komodo/periphery`, `/local/appdata/komodo/core` |
+| GitOps host automation | repo clone + Komodo workspaces + host-check state | `/local/services/homelab-infra`, `/local/services/stacks`, `/local/services/posture-check` |
 | Nextcloud | DB dump + file data | `/local/borg-dumps`, `/local/appdata/nextcloud/html`, `/local/nextcloud/data` |
 | Grafana | SQLite dump + file data | `/local/borg-dumps`, `/local/appdata/grafana` |
 | Filebrowser | file-backed state dump + file data | `/local/borg-dumps`, `/local/appdata/filebrowser` |
@@ -60,6 +61,10 @@ Option A umgesetzt: `pre-backup-dumps.sh` writes `nextcloud.dump` from `nextclou

 `komodo-mongo.archive.gz` was produced and verified on 2026-05-04 (`gzip -t` ok). The dump function is in place in `pre-backup-dumps.sh`. Re-verify after any Komodo or Mongo major upgrade.

+### GitOps host automation
+
+The live Unraid User Scripts execute repo scripts from `/mnt/user/services/homelab-infra`, while Komodo keeps stack workspaces below `/mnt/user/services/stacks`. These paths are now mounted into Borg UI as `/local/services/...` and included explicitly so host-side script hotfixes, stack workspace state, and posture-check state are recoverable.
+
 ## Database Dumps Required

 ### Shared PostgreSQL (`postgresql17`)
@@ -20,3 +20,6 @@
 /local/appdata/borg-ui/data
 /local/appdata/komodo/periphery
 /local/appdata/komodo/core
+/local/services/homelab-infra
+/local/services/stacks
+/local/services/posture-check
@@ -23,6 +23,7 @@ services:
       - /mnt/user/documents/nextcloud-data:/local/nextcloud/data:ro
       - /mnt/user/photos/immich:/local/immich/upload:ro
       - /mnt/user/photos/family_archive:/local/immich/external:ro
+      - /mnt/user/services:/local/services:ro
       - /mnt/user/services/gitea/data:/local/gitea/data:ro
       - /mnt/user/appdata/borg-ui/restore:/restore
     dns:
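Because the new `/local/services` mount only takes effect after the `borg-ui` stack is recreated, the added source paths can silently be absent inside the container. The following is a minimal sketch, not part of this commit: `missing_sources` is a hypothetical helper that prints every path from a source list that does not exist where it runs. It assumes one path per line and skips blank lines and `#` comments.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the commit): list source paths that are
# named in a Borg source list but do not exist on this filesystem.
missing_sources() {
  local list_file="$1"
  local path

  # `|| [ -n "$path" ]` also handles a last line without a trailing newline.
  while IFS= read -r path || [ -n "$path" ]; do
    [ -n "$path" ] || continue            # skip blank lines
    case "$path" in \#*) continue ;; esac # skip comment lines (assumption)
    [ -e "$path" ] || printf '%s\n' "$path"
  done < "$list_file"
}
```

Run inside the backup container against `all-important-sources.txt`, an empty result would indicate that every listed path, including the three new `/local/services/...` entries, is visible to Borg.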
@@ -9,15 +9,79 @@ EVENT_FILTERS="${EVENT_FILTERS:---filter event=die --filter event=oom --filter e

 mkdir -p "$(dirname "$OUTPUT_PATH")"

+json_value() {
+  local key="$1"
+  local json="$2"
+
+  printf '%s' "$json" | sed -n "s/.*\"$key\":\"\\([^\"]*\\)\".*/\\1/p" | head -n 1
+}
+
+event_summary() {
+  local event="$1"
+  local action name image exit_code signal
+
+  action="$(json_value "Action" "$event")"
+  name="$(json_value "name" "$event")"
+  image="$(json_value "image" "$event")"
+  exit_code="$(json_value "exitCode" "$event")"
+  signal="$(json_value "signal" "$event")"
+
+  printf 'Container: %s\nAction: %s\nImage: %s\nExit-Code: %s\nSignal: %s\n\nFull event logged in: %s\n' \
+    "${name:-unknown}" \
+    "${action:-unknown}" \
+    "${image:-unknown}" \
+    "${exit_code:-n/a}" \
+    "${signal:-n/a}" \
+    "$OUTPUT_PATH"
+}
+
+event_title() {
+  local event="$1"
+  local action name exit_code
+
+  action="$(json_value "Action" "$event")"
+  name="$(json_value "name" "$event")"
+  exit_code="$(json_value "exitCode" "$event")"
+
+  if [ -n "$exit_code" ]; then
+    printf 'Docker critical: %s %s exit=%s' "${name:-unknown}" "${action:-event}" "$exit_code"
+  else
+    printf 'Docker critical: %s %s' "${name:-unknown}" "${action:-event}"
+  fi
+}
+
+should_send_event() {
+  local event="$1"
+  local action exit_code
+
+  action="$(json_value "Action" "$event")"
+  exit_code="$(json_value "exitCode" "$event")"
+
+  case "$action" in
+    die)
+      [ "${exit_code:-}" != "0" ]
+      ;;
+    oom|kill)
+      return 0
+      ;;
+    *)
+      return 1
+      ;;
+  esac
+}
+
 send_event() {
   local line="$1"
+  local title message
   local timestamp
   timestamp="$(date -Iseconds)"
+  title="$(event_title "$line")"
+  message="$(event_summary "$line")"

   printf '%s %s\n' "$timestamp" "$line" | tee -a "$OUTPUT_PATH" >/dev/null

   if [ "$SEND_NTFY" = "1" ] && [ -f "$NTFY_SCRIPT" ]; then
-    bash "$NTFY_SCRIPT" "$NTFY_TOPIC" "Docker critical event" "$line" high || true
+    bash "$NTFY_SCRIPT" "$NTFY_TOPIC" "$title" "$message" high || true
   fi
 }

@@ -29,5 +93,6 @@ fi
 # shellcheck disable=SC2086
 docker events $EVENT_FILTERS --format '{{json .}}' | while IFS= read -r event; do
   [ -n "$event" ] || continue
+  should_send_event "$event" || continue
   send_event "$event"
 done
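How the `sed`-based `json_value` extraction from this diff behaves is easiest to see on a sample event. The sketch below copies `json_value` from the diff and applies it to an abridged, made-up event line. The field layout follows what `docker events --format '{{json .}}'` is expected to emit for a `die` event, but the concrete values (`web`, `nginx:latest`, `137`) are illustrative assumptions.

```bash
#!/usr/bin/env bash
# json_value as introduced in this commit: extract the string value of a key
# from a single-line JSON document using sed instead of a JSON parser.
json_value() {
  local key="$1"
  local json="$2"

  printf '%s' "$json" | sed -n "s/.*\"$key\":\"\\([^\"]*\\)\".*/\\1/p" | head -n 1
}

# Abridged, hypothetical sample of a Docker "die" event (values are made up).
sample_event='{"status":"die","id":"abc123","from":"nginx:latest","Type":"container","Action":"die","Actor":{"ID":"abc123","Attributes":{"exitCode":"137","image":"nginx:latest","name":"web"}},"scope":"local","time":1747700000}'

json_value "Action" "$sample_event"    # prints: die
json_value "exitCode" "$sample_event"  # prints: 137
json_value "name" "$sample_event"      # prints: web
```

The approach only matches quoted string values without embedded escaped quotes. A key that is absent, such as `signal` in the sample, yields an empty string, which is why the callers fall back to `unknown` or `n/a`.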
@@ -8,6 +8,8 @@ CRITICAL_TOPIC="${CRITICAL_TOPIC:-homelab-alerts}"
 SEND_NTFY="${SEND_NTFY:-1}"
 TMP_DIR="${TMP_DIR:-/tmp/kallilab-posture-check}"
 ALLOW_DISK1_NTFS="${ALLOW_DISK1_NTFS:-1}"
+ALERT_STATE_PATH="${ALERT_STATE_PATH:-/mnt/user/services/posture-check/last-alert.state}"
+ALERT_REPEAT_SECONDS="${ALERT_REPEAT_SECONDS:-86400}"

 mkdir -p "$TMP_DIR"
 RESULTS_FILE="$TMP_DIR/results.$$"
@@ -63,6 +65,34 @@ check_fstype() {
   fi
 }

+check_disk1_fstype() {
+  local actual
+
+  if ! command -v findmnt >/dev/null 2>&1; then
+    add_result "warning" "disk1_fstype" "Cannot check /mnt/disk1 filesystem because findmnt is missing"
+    return
+  fi
+
+  if ! actual="$(findmnt -no FSTYPE "/mnt/disk1" 2>/dev/null)"; then
+    add_result "warning" "disk1_fstype" "Mount not found: /mnt/disk1"
+    return
+  fi
+
+  if [ "$ALLOW_DISK1_NTFS" = "1" ]; then
+    if [ "$actual" = "ntfs3" ] || [ "$actual" = "fuseblk" ]; then
+      add_result "ok" "disk1_fstype" "/mnt/disk1 filesystem is $actual; temporarily allowed until Disk1 phase 2 migration"
+    else
+      add_result "warning" "disk1_fstype" "/mnt/disk1 filesystem is $actual, expected ntfs3/fuseblk during temporary Disk1 migration exception"
+    fi
+  else
+    if [ "$actual" = "xfs" ]; then
+      add_result "ok" "disk1_fstype" "/mnt/disk1 filesystem is $actual"
+    else
+      add_result "critical" "disk1_fstype" "/mnt/disk1 filesystem is $actual, expected xfs"
+    fi
+  fi
+}
+
 check_no_ntfs_on_core_mounts() {
   local hits
   local pattern="^/mnt/(cache|disk1)(/|$)"
@@ -80,7 +110,7 @@ check_no_ntfs_on_core_mounts() {
   if [ -n "$hits" ]; then
     add_result "critical" "no_ntfs_core_mounts" "NTFS-like filesystem on core mount: $hits"
   elif [ "$ALLOW_DISK1_NTFS" = "1" ]; then
-    add_result "warning" "no_ntfs_core_mounts" "No NTFS on /mnt/cache; /mnt/disk1 NTFS is temporarily allowed until Disk1 phase 2 migration"
+    add_result "ok" "no_ntfs_core_mounts" "No NTFS on /mnt/cache; /mnt/disk1 NTFS is temporarily allowed until Disk1 phase 2 migration"
   else
     add_result "ok" "no_ntfs_core_mounts" "No ntfs3/fuseblk mounts below /mnt/cache or /mnt/disk1"
   fi
@@ -122,6 +152,15 @@ check_inode_usage() {
   fi
 }

+check_disk1_inode_usage() {
+  if [ "$ALLOW_DISK1_NTFS" = "1" ]; then
+    add_result "ok" "disk1_inode_usage" "/mnt/disk1 inode usage skipped; NTFS transition filesystem does not expose POSIX inode usage"
+    return
+  fi
+
+  check_inode_usage "/mnt/disk1" 80 "disk1_inode_usage"
+}
+
 check_filesystem_usage() {
   local path="$1"
   local max_percent="$2"
@@ -198,6 +237,79 @@ send_ntfy() {
   fi
 }

+alert_fingerprint() {
+  awk -F '\t' '$1 != "ok" { printf "%s|%s|%s\n", $1, $2, $3 }' "$RESULTS_FILE" | cksum | awk '{ print $1 ":" $2 }'
+}
+
+alert_summary() {
+  awk -F '\t' '$1 != "ok" { printf "%s:%s; ", $1, $2 }' "$RESULTS_FILE" | sed 's/; $//'
+}
+
+should_send_alert() {
+  local fingerprint="$1"
+  local now
+  local last_fingerprint=""
+  local last_sent="0"
+
+  now="$(date +%s)"
+
+  if ! printf '%s' "$ALERT_REPEAT_SECONDS" | grep -Eq '^[0-9]+$'; then
+    ALERT_REPEAT_SECONDS=86400
+  fi
+
+  if [ -f "$ALERT_STATE_PATH" ]; then
+    IFS="$(printf '\t')" read -r last_fingerprint last_sent < "$ALERT_STATE_PATH" || true
+  fi
+
+  if [ "$fingerprint" != "$last_fingerprint" ]; then
+    return 0
+  fi
+
+  if ! printf '%s' "$last_sent" | grep -Eq '^[0-9]+$'; then
+    return 0
+  fi
+
+  if [ $((now - last_sent)) -ge "$ALERT_REPEAT_SECONDS" ]; then
+    return 0
+  fi
+
+  return 1
+}
+
+remember_alert() {
+  local fingerprint="$1"
+  local now
+
+  now="$(date +%s)"
+  mkdir -p "$(dirname "$ALERT_STATE_PATH")"
+  printf '%s\t%s\n' "$fingerprint" "$now" > "$ALERT_STATE_PATH.tmp"
+  mv "$ALERT_STATE_PATH.tmp" "$ALERT_STATE_PATH"
+}
+
+clear_alert_state() {
+  rm -f "$ALERT_STATE_PATH" "$ALERT_STATE_PATH.tmp"
+}
+
+send_alert_once() {
+  local severity="$1"
+  local topic="$2"
+  local body="$3"
+  local fingerprint
+  local summary
+
+  fingerprint="$(alert_fingerprint)"
+  summary="$(alert_summary)"
+
+  if [ -n "$summary" ]; then
+    body="$body Checks: $summary"
+  fi
+
+  if should_send_alert "$fingerprint"; then
+    send_ntfy "$severity" "$topic" "$body"
+    remember_alert "$fingerprint"
+  fi
+}
+
 write_json() {
   local timestamp
   local critical_count
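The fingerprint used for deduplication depends only on the non-`ok` result lines. The sketch below shows that property with a variant of `alert_fingerprint` that takes the results file as a parameter instead of reading the global `RESULTS_FILE`. `fingerprint_of` is a hypothetical name, and the tab-separated `severity<TAB>check<TAB>message` layout is an assumption inferred from the `awk -F '\t'` calls in the diff. The sample messages are made up.

```bash
#!/usr/bin/env bash
# Variant of alert_fingerprint from the diff, parameterised on the file path.
# Assumption: every result line is "severity<TAB>check<TAB>message".
fingerprint_of() {
  local results_file="$1"

  awk -F '\t' '$1 != "ok" { printf "%s|%s|%s\n", $1, $2, $3 }' "$results_file" \
    | cksum | awk '{ print $1 ":" $2 }'
}

run_a="$(mktemp)"
run_b="$(mktemp)"
run_c="$(mktemp)"

# Run A: one ok result and one warning.
printf 'ok\tcache_fstype\tfilesystem is xfs\n' > "$run_a"
printf 'warning\tcache_fill_level\tcache is 75%% full\n' >> "$run_a"

# Run B: same warning, but an additional ok result.
cat "$run_a" > "$run_b"
printf 'ok\tdisk1_fstype\tfilesystem is ntfs3\n' >> "$run_b"

# Run C: the warning text changed (different cause or value).
printf 'warning\tcache_fill_level\tcache is 91%% full, above threshold\n' > "$run_c"

[ "$(fingerprint_of "$run_a")" = "$(fingerprint_of "$run_b")" ] && echo "A and B: same fingerprint"
[ "$(fingerprint_of "$run_a")" != "$(fingerprint_of "$run_c")" ] && echo "A and C: different fingerprint"

rm -f "$run_a" "$run_b" "$run_c"
```

One consequence of including the message text in the fingerprint is that a message containing a changing value, such as a fill percentage, counts as a new cause on every change and is therefore not suppressed.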
@@ -243,13 +355,15 @@ write_json() {
   cat "$OUTPUT_PATH"

   if [ "$status" = "critical" ]; then
-    send_ntfy "critical" "$CRITICAL_TOPIC" "Posture-check critical: $critical_count critical, $warning_count warning. See $OUTPUT_PATH"
+    send_alert_once "critical" "$CRITICAL_TOPIC" "Posture-check critical: $critical_count critical, $warning_count warning. See $OUTPUT_PATH"
     return 2
   fi
   if [ "$status" = "warning" ]; then
-    send_ntfy "warning" "$WARNING_TOPIC" "Posture-check warning: $warning_count warning. See $OUTPUT_PATH"
+    send_alert_once "warning" "$WARNING_TOPIC" "Posture-check warning: $warning_count warning. See $OUTPUT_PATH"
     return 1
   fi
+
+  clear_alert_state
 }

 main() {
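The send/suppress decision made by `should_send_alert` and `send_alert_once` can be restated as a pure function, which makes the three outcomes easy to reason about. This is only a sketch under stated assumptions: `alert_decision` is a hypothetical helper, the clock and the stored state are passed in as arguments instead of being read from `date` and `ALERT_STATE_PATH`, and the validation of non-numeric state values is left out.

```bash
#!/usr/bin/env bash
# Hypothetical restatement of the dedup decision (not part of the commit).
# Arguments: current fingerprint, stored fingerprint, now (epoch seconds),
# stored send time (epoch seconds), repeat interval in seconds.
alert_decision() {
  local fingerprint="$1"
  local last_fingerprint="$2"
  local now="$3"
  local last_sent="$4"
  local repeat="${5:-86400}"

  if [ "$fingerprint" != "$last_fingerprint" ]; then
    echo "send:new-cause"        # cause changed since the last alert
  elif [ $((now - last_sent)) -ge "$repeat" ]; then
    echo "send:repeat-due"       # same cause, but the repeat interval elapsed
  else
    echo "suppress"              # same cause within the repeat interval
  fi
}

alert_decision "111:20" "111:20" 1000000 996400 86400   # prints: suppress
alert_decision "111:20" "111:20" 1000000 913600 86400   # prints: send:repeat-due
alert_decision "222:31" "111:20" 1000000 996400 86400   # prints: send:new-cause
```

In the committed script, a run that ends with status `ok` additionally calls `clear_alert_state`, so a cause that disappears and later returns is reported again immediately instead of waiting for the repeat interval.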
@@ -258,15 +372,11 @@ main() {
   need_cmd awk || true

   check_fstype "/mnt/cache" "xfs" "critical" "cache_fstype"
-  if [ "$ALLOW_DISK1_NTFS" = "1" ]; then
-    check_fstype "/mnt/disk1" "ntfs3" "warning" "disk1_fstype"
-  else
-    check_fstype "/mnt/disk1" "xfs" "critical" "disk1_fstype"
-  fi
+  check_disk1_fstype
   check_no_ntfs_on_core_mounts
   check_mover_drift
   check_inode_usage "/mnt/cache" 80 "cache_inode_usage"
-  check_inode_usage "/mnt/disk1" 80 "disk1_inode_usage"
+  check_disk1_inode_usage
   check_filesystem_usage "/mnt/cache" 70 "cache_fill_level" "warning"

   for share in appdata system domains; do
@@ -42,6 +42,34 @@ Zeit: taeglich 06:20, Cron `20 6 * * *`.
bash /mnt/user/services/homelab-infra/services/posture-check/compose-runtime-drift.sh
```

## `homelab-operations-report-daily`

Schedule: daily after Borg and the morning checks, e.g. 07:30, cron `30 7 * * *`.

Prerequisite: the SMTP password is **not in the repo** but stored on the host:

```bash
mkdir -p /mnt/user/appdata/secrets
chmod 700 /mnt/user/appdata/secrets
printf '%s' 'ENTER_SMTP_PASSWORD_HERE' > /mnt/user/appdata/secrets/homelab_smtp_password.txt
chmod 600 /mnt/user/appdata/secrets/homelab_smtp_password.txt
```

User Script:

```bash
#!/bin/bash
SEND_MAIL=1 \
MAIL_MODE=always \
MAIL_FROM="michideheld@gmx.de" \
MAIL_TO="Mi.Kaleschke@gmx.de" \
SMTP_HOST="smtp.gmx.net" \
SMTP_PORT="587" \
SMTP_USER="michideheld@gmx.de" \
SMTP_PASS_FILE="/mnt/user/appdata/secrets/homelab_smtp_password.txt" \
bash /mnt/user/services/homelab-infra/services/posture-check/daily-status-report.sh
```

## `docker-critical-events-at-start`

Schedule: at array start. This job starts a background watcher and exits immediately.