Selbst-enthaltener Stafettenstab nach Glance-Ausschluss (130s-Stop-Test):
Polling-Rate unveraendert mit Glance down. Restkandidaten dokumentiert
(Posture-Check, Periphery, Komodo-Self-Check, LAN-Geraet) plus konkrete
Testreihenfolge und Fix-Erwartung.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Schliesst die zwei in ALERT_RULES.md identifizierten Hoch-Luecken:
- up==0 (5m) als critical in neuer Gruppe homelab-meta — Scrape-Targets
(node-exporter/cadvisor/blackbox/traefik) sind nicht laenger stille
Ausfaelle.
- Disk-Critical bei >95% (5m) als critical, zusaetzlich zum bestehenden
Warning bei >85% — fuer DB/appdata/Cache-Schreibblockaden.
ALERT_RULES.md Tabellen und Status-Abschnitt aktualisiert.
Wird wirksam nach Prometheus-Reload via Komodo-Redeploy des monitoring-Stacks.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Nachschlagetabelle aller Prometheus-Alarmregeln (Trigger/Schwelle/Severity/
Aktion) plus Bewertung der Abdeckung. Identifiziert zwei echte blinde Flecke
(kein up==0 Target-Down, kein Disk-Critical-Tier) mit fertigem PromQL als
Empfehlung. Cross-Ref aus ALERTING_MAP.md.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The original 2026-05-23 baseline was kept as a historical anchor but
the banner was too soft about how much of the concrete content is
already addressed. Reading the document standalone could mislead it
as a current TODO list.
Two changes, original text untouched:
1. Banner now explicitly says the document is mostly outdated,
not to be read as a TODO list, and that the per-finding status
lives in an appendix.
2. New "Status-Anhang 2026-05-30" at the end maps every concrete,
actionable finding to its current state (erledigt / geparkt /
entschieden nicht / offen / teilweise), grouped by the original
sections (Block 1-8) and by the Top-5 lists and Phase-1-to-4
roadmap.
Summary of what the appendix shows:
- Top 5 sofort: 5/5 erledigt
- Quick Wins: 6/7 erledigt, 1 geparkt
- Phase 1: 4/6 erledigt, 1 geparkt, 1 wartend
- Phase 2: 2/5 erledigt, 2 geparkt, 1 offen
- Phase 3: 1 entschieden-nicht, 1 teilweise, 3 offen
- Auth-Block (F-04/13/14/18): fully parked
Original "Schulnote 2-" no longer reflects reality; new note would
land at 1- to 2 but is not the point.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The three notes from 2026-05-23 had been sitting untracked in docs/
for a week. Variante A from today's review: keep them in docs/ with
explicit status banners and reference them from REPO_MAP.md, so they
stop being silent roommates and become discoverable.
- docs/STRATEGISCHE_BEWERTUNG_2026-05-23.md: historical baseline that
kicked off the 2026-05-25 audit cycle. Permanent audit anchor and
"where we stood on 2026-05-23" snapshot. Do not edit further.
- docs/CODEX_KONSOLIDIERUNG_2026-05-23.md: first Codex prompt for the
audit cycle, content worked through; kept as a Codex-prompt
template for future consolidation sweeps.
- docs/CODEX_JELLYFIN_REMOVAL_2026-05-23.md: Codex removal pattern,
task executed 2026-05-25; kept as a template for future stack
removals (Hermes review 2026-07-25, possibly BentoPDF / paperless-gpt
follow-ups).
REPO_MAP.md "Wichtige Dokumente" now lists all three with one-line
purpose plus the F-19 prep doc committed earlier today.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
ops/policy-checks/mem-limits-baseline.md captures the deliberate
"not today" decision for memory limits plus the plan for when it
becomes relevant:
- Phase 1: 7 days of hourly docker stats snapshots
- Phase 2: derive Tier-1 peak per container
- Phase 3: set limits at peak * 1.5 with documented floors
(Postgres 1G, Mongo 1G, Redis 256M, etc.)
- Phase 4: roll out smallest-risk containers first, observe 24h
between stages
- Phase 5: Tier-2 only after a concrete trigger event
Next trigger: family invitation out + 4 weeks stable use, or
first real OOM event in docker-critical-events.sh, or a sudden
Immich/Nextcloud load spike where host swap becomes visible.
Today's policy check is clean (0 Critical, 1 documented Warning
on influxdb3-core user 0, 13 documented Info findings on host
ports / privileged exceptions / latest+digest tags).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Six files had outdated status notes that the F-09 first run on
2026-05-30 made wrong:
- ops/restore-tests/komodo-bootstrap-runbook.md: "Erster echter Lauf
steht noch aus" -> first run confirmed
- ops/restore-tests/komodo-bootstrap-plan.md: "Noch offen vor dem
ersten echten Lauf" section -> "Bestaetigte Laeufe" table with
the --what-if and --keep-data runs
- ops/restore-tests/immich-runbook.md: status note still said
"Erster echter Lauf steht noch aus" although the Immich first run
was 2026-05-27; correcting in the same sweep
- docs/AUDIT_2026-05-25_TODO.md: Sprint 2 entry on Komodo bootstrap
path no longer carries the "Trockenlauf-Skript bleibt als offene
Folgeaufgabe" tail
- docs/SERVICES_RECOVERY.md: replaced the "Trockenlauf-Idee (Doku-only,
nicht ausgefuehrt)" section with the confirmed repo-script flow and
marked the two "Naechste Aufgaben" rows about the dry-run as done
- docs/RESTORE_DRILL_ROUTINE.md: Q2 2026 DR-Sanity-Check entry now
splits Komodo-Bootstrap-Pfad (done) from the two still-open items
(Gitea bundles, secrets inventory)
No behavior change, only documentation consistency.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Result on host: SUCCESS, all 5 smoke checks green.
- docker compose config valid
- Test-Mongo healthy in ~6s
- Mongo authenticated ping ok (Test-Creds)
- Komodo Core HTTP 200 on 127.0.0.1:19120
- Test-Periphery container state running
Production komodo-{mongo,core,periphery} and /mnt/user/appdata/komodo/
were not touched; test ran in isolated project restoretest-komodo with
disposable datadir under /mnt/user/backups/restore-lab/komodo/.
Report at /mnt/user/backups/restore-reports/komodo-bootstrap-2026-05-30.md.
Operator-click pattern preserved: SSH to root@kallilabcore is an action
class that requires explicit instruction per CLAUDE.md; the auto-mode
classifier correctly blocked a non-destructive SSH probe. Operator ran
the command via the Unraid web terminal.
ops/komodo/docker-compose.yml is now demonstrably viable as the recovery
anchor for the bootstrap stages in docs/SERVICES_RECOVERY.md, not just
assumed viable. Image digests (mongo:7.0.32, komodo-core:2,
komodo-periphery:2) and Mongo auth schema verified.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
New services/authelia-diff.sh compares the access_control: section of the
repo baseline against the live host configuration.yml. OIDC clients,
identity providers, and secret values stay out of scope by design.
Exit codes: 0 ok, 1 drift, 2 file missing, 3 section missing, 4 tool missing.
posture-check.sh gains check_authelia_config_drift, which calls the diff
script and reports drift as warning (not critical). SKIP_AUTHELIA_DRIFT=1
opts out; AUTHELIA_DIFF_SCRIPT overrides the path.
WORKFLOW.md gets a dedicated "Ausnahme: Authelia configuration.yml" section
analogous to the Traefik dynamic-config exception, with the mandatory
repo->host merge workflow and the env-variable contract.
Smoke-tested locally: identical files rc=0, ACL change rc=1 with proper
unified diff, non-ACL change (session.default_redirection_url) correctly
ignored.
Operator follow-up: set up a read-only repo mirror at
/mnt/user/services/homelab-infra/ so the check finds a current baseline.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Setup-Pfad final geworden, vier Reparaturen unterwegs:
1. EAI_AGAIN: Container kann git.kaleschke.info nicht aufloesen ->
--add-host (analog zur Komodo-extra_hosts)
2. Token-Sichtbarkeit in ps/inspect -> --env-file mit 0600 tempfile
3. EACCES auf State-Mount: Renovate-Image laeuft als uid 12021 ->
chmod 0777 auf /mnt/user/services/renovate/state
4. "Repository does not permit pull or push": Renovate-Source-
Code (lib/modules/platform/gitea/index.ts) prueft hardcoded
repo.permissions.push aus der Gitea-API. Mein initialer
SQL-INSERT in die collaboration-Tabelle hatte den Gitea-
In-Memory-Permission-Cache nicht aktualisiert; Operator-
UI-Klick "Entfernen + neu hinzufuegen" loeste den Cache-
Refresh.
Konfigurations-Trennung:
- renovate.json (Repo): nur Repo-Settings (extends, packageRules,
ignorePaths, manager file patterns, labels)
- ops/renovate/bot-config.js: Bot-Settings (platform, endpoint,
autodiscover=false, repositories=[Micha/homelab-infra],
Concurrent-Limits)
Bot-Felder in renovate.json fuehren zu "Repository is forbidden,
status: disabled" weil Renovate die Repo-Config nicht als Bot-
Config wertet.
Erstlauf am 2026-05-29: 5 PRs, 1 Dependency-Dashboard, 8 Branches.
Komodo-Major bleibt durch packageRule deaktiviert wie erwartet.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Gitea's /api/v1/user/repos (which Renovate calls during autodiscover)
returns repos where the user is owner or org member, but NOT
collaborator-only repos. Our renovate service account has write
collaborator access on Micha/homelab-infra but no own/org repos,
so autodiscover yielded an empty list.
Switching to explicit "repositories": ["Micha/homelab-infra"] is
the pragmatic fix for a homelab with one repo to scan; avoids
having to create an org just for one service account.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Renovate 41 migriert schedule:weekly auf einen 5am-Monday-only
Lauf - das verhindert beim manuellen Erstlauf jede PR. Wir wollen
dass Renovate bei jedem User-Script-Tick (alle 6h) tatsaechlich
scannt; die Quartals-/Wochen-Rhythmik regeln wir ueber den Cron.
Auch docker-compose.fileMatch ist in Renovate 41 deprecated;
Renovate migriert es zur Laufzeit auf managerFilePatterns mit
regex-Slash-Wrapping. Wir uebernehmen die migrierte Form direkt,
damit die WARN "Config needs migrating" verschwindet.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Drei Issues beim Erstlauf gefunden und gefixt:
1. EAI_AGAIN: Renovate-Container konnte git.kaleschke.info nicht
aufloesen. Analog zu Komodos extra_hosts mappen wir den Hostname
per --add-host auf 192.168.178.58 (LAN-IP des Unraid-Hosts).
Zusaetzlich --dns 1.1.1.1/8.8.8.8 fuer externe Image-Registries.
2. Token-Leak in ps und docker inspect: -e RENOVATE_TOKEN=... macht
den Wert in Process-Listing sichtbar. Stattdessen --env-file mit
einem 0600 tempfile unter $RENOVATE_STATE_DIR/.env, das nach dem
Lauf via shred bzw. rm geloescht wird.
3. Doppelter rc=$? Block plus return innerhalb einer {}-Subshell
waren Tot-Code; aufgeraeumt.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>