New services/authelia-diff.sh compares the access_control: section of the
repo baseline against the live host configuration.yml. OIDC clients,
identity providers, and secret values stay out of scope by design.
Exit codes: 0 ok, 1 drift, 2 file missing, 3 section missing, 4 tool missing.
posture-check.sh gains check_authelia_config_drift, which calls the diff
script and reports drift as warning (not critical). SKIP_AUTHELIA_DRIFT=1
opts out; AUTHELIA_DIFF_SCRIPT overrides the path.
WORKFLOW.md gets a dedicated "Ausnahme: Authelia configuration.yml" section
analogous to the Traefik dynamic-config exception, with the mandatory
repo->host merge workflow and the env-variable contract.
Smoke-tested locally: identical files rc=0, ACL change rc=1 with proper
unified diff, non-ACL change (session.default_redirection_url) correctly
ignored.
Operator follow-up: set up a read-only repo mirror at
/mnt/user/services/homelab-infra/ so the check finds a current baseline.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Befund vom 2026-05-29: HomelabBorgLastJobCompletedWithWarnings
zuendete vier Tage in Folge mit Borg-Exit-Code 107. Ursache im
Logfile: /local/appdata/homepage wurde am 25.05. entfernt, aber
in der Borg-UI-Source-Liste blieb der Eintrag drin und Borg
warnte taeglich BackupFileNotFoundError. Backups selbst waren
nicht gefaehrdet (alle 23 anderen Quellen sauber archiviert).
Operator hat den Eintrag in der Borg-UI manuell entfernt;
Source-Liste jetzt 23 statt 24, naechster Lauf 2026-05-30 sollte
wieder completed ohne Warning sein.
Erkenntnis: bei Stack-Removal wurde die Borg-Source-Liste nicht
mit-aufgeraeumt. WORKFLOW.md um neuen Abschnitt "Service-Removal-
Checkliste" erweitert mit 9 Pflichtschritten inklusive
Borg-UI-Source-Bereinigung als Schritt 8.
Positiv: die am 2026-05-27 scharfgeschaltete Alert-Pipeline
(Cron Textfile -> node-exporter -> Prometheus -> Alertmanager
-> ntfy-Bridge) hat den Drift binnen 24 h sichtbar gemacht.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Operational hardening across several services after live incident
analysis between 2026-05-18 and 2026-05-20:
- Gitea: disable public registration and OpenID signup/signin to
stop the external POST / 5xx bursts that triggered availability
alerts. New repo-wide policy requires every productive
Micha/homelab-infra Komodo stack to ship with an active
Gitea->Komodo webhook on the current stack ID (documented in
CLAUDE.md, AI_CONTEXT.md, WORKFLOW.md).
- posture-check: extract the Disk1 fstype check into its own
function so the documented Disk1 NTFS exception no longer raises
ntfy warnings, skip POSIX inode checks on NTFS, and dedup ntfy
alerts via a fingerprint state file with ALERT_REPEAT_SECONDS
(default 24h). Repeat-spam on the same cause now suppressed.
- docker-critical-events: parse the event JSON for container name,
action, exit code and signal; drop `die exit=0` events (clean
stops); ship a structured ntfy message instead of the raw event
line.
- Borg UI: mount /mnt/user/services into the backup container as
/local/services:ro and include homelab-infra, stacks and
posture-check in all-important-sources.txt. RESTORE_MATRIX and
DISASTER_RECOVERY updated accordingly.
- Unraid user scripts: document the new
homelab-operations-report-daily cron job and the SMTP password
file it expects on the host.
- MIGRATION_LOG: capture the four live events from this window -
Gitea 5xx burst + signup closure, Komodo webhook reconciliation,
posture-check host-version verification, Borg scope extension,
and Traefik 5xx alert detuning.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>