Commit Graph

195 Commits

Author SHA1 Message Date
Micha ad9267c66a Split renovate config: repo config in renovate.json, bot config in ops/
Renovate liest die repo-eigene renovate.json als REPO-Config, nicht
als BOT-Config. Bot-spezifische Felder (platform, endpoint,
repositories, autodiscover, gitAuthor, prHourlyLimit, ...) gehoeren
nicht hinein und werden als "this repo is forbidden / disabled"
fehlinterpretiert.

Saubere Trennung:
- renovate.json (Repo-Root): nur extends, packageRules,
  ignorePaths, manager file patterns, labels, rangeStrategy
- ops/renovate/bot-config.js: Plattform, Endpoint, Username,
  gitAuthor, autodiscover=false, repositories=[Micha/homelab-infra],
  Concurrent-/Hourly-Limits

bot-config.js statt config.json, weil Renovate Module-exports als
config-file akzeptiert (offizielle Variante).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-29 20:20:00 +02:00
Micha bdae014bff Harden renovate runner: env-file, add-host, explicit DNS
Drei Issues beim Erstlauf gefunden und gefixt:

1. EAI_AGAIN: Renovate-Container konnte git.kaleschke.info nicht
   aufloesen. Analog zu Komodos extra_hosts mappen wir den Hostname
   per --add-host auf 192.168.178.58 (LAN-IP des Unraid-Hosts).
   Zusaetzlich --dns 1.1.1.1/8.8.8.8 fuer externe Image-Registries.

2. Token-Leak in ps und docker inspect: -e RENOVATE_TOKEN=... macht
   den Wert in Process-Listing sichtbar. Stattdessen --env-file mit
   einem 0600 tempfile unter $RENOVATE_STATE_DIR/.env, das nach dem
   Lauf via shred bzw. rm geloescht wird.

3. Doppelter rc=$? Block plus return innerhalb einer {}-Subshell
   waren Tot-Code; aufgeraeumt.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-29 20:04:24 +02:00
Micha 30aa696e61 Prepare Renovate bot against Gitea (F-12) + doc sweep
renovate.json: gitea platform, autodiscover Micha/*, group rules
(major separate, minor+patch+digest grouped, stateful tier-1
individual, komodo-major disabled), pin range strategy, no
automerge, dependency dashboard enabled.

ops/renovate/run-renovate.sh: one-shot docker run wrapper that
reads the Gitea PAT from /mnt/user/appdata/secrets/renovate_token.txt,
runs renovate/renovate:41, logs into /mnt/user/services/renovate/logs/.

docs/RENOVATE.md: 5-step operator setup (Gitea service account,
PAT, token file, first run, six-hourly user script). Explicit
no-automerge stance with notfall-stop checklist.

Cross-doc sweep: SECRETS_MAP entry for renovate_token.txt,
REPO_MAP entry for RENOVATE.md, AUDIT_2026-05-25_TODO new
Sprint 8 with F-15, F-07, F-09 rest, F-12 status, MIGRATION_LOG
captures the four-block sprint in one entry.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-29 15:29:20 +02:00
Micha e4b0db2af6 Add Komodo bootstrap dry-run scaffold (F-09 rest)
Mirror of the Immich restore-test pattern for the Komodo bootstrap
anchor. Brings up a throwaway komodo-mongo + komodo-core +
komodo-periphery under project restoretest-komodo, isolated from
production:

- same image digests as production (mongo:7.0.32, komodo-core:2,
  komodo-periphery:2) to prove compose-level bootstrap compatibility
- restore-lab paths under /mnt/user/backups/restore-lab/komodo
- 127.0.0.1:19120 only, no LAN bind, no Traefik, no Authelia
- test periphery runs WITHOUT docker.sock mount and WITHOUT
  /mnt/user/services mount; cannot manage productive containers
- KOMODO_* secrets are throwaway placeholders hardcoded in the test
  compose; productive secrets never enter this path

Smoke test: compose config valid, mongo healthy, mongo auth-ping
with test creds, komodo-core HTTP 200/302/303/401, periphery
container running. Report under restore-reports/komodo-bootstrap-*.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-29 15:25:41 +02:00
Micha 3c71a66c55 Document monitoring alerts, bundle cron and H/ pull live status
- AUDIT_2026-05-25_TODO: Borg-Stale, Cert-Expiry, Container-Down
  Alerts auf "erledigt" (Cron */5 textfile exporter live,
  Prometheus reload mit 14 Regeln); Gitea-Bundle-Cron auf "erledigt"
  (User-Script gitea-bundle-mirror-6h aktiv, Bundles 644);
  H:/ Nearline-Pull auf "erledigt (Pull live, Scheduled Task offen)"
  mit Zaehlerstaenden 19 Borg-Dumps + 10 Bundle-Files.

- MIGRATION_LOG: neuer Eintrag fasst die drei zusammenhaengenden
  Live-Aktivierungen zusammen, inkl. Befund-Ursprung (Permission-
  Drift), Reparaturen und expliziter Ausklammerung der nicht
  angefassten Themen (Auth, Hermes, USV, FRITZ!Box, Plex).

- H_DRIVE_NEARLINE_PULL: Erstlauf-Befund mit Permission-Issues
  und nachgezogenem Stand; Erwartungs-Liste auf real geliefertes
  Set angepasst; Flash-Config explizit Out-of-Scope.

- pull-critical-backups.ps1: Live-Robocopy-Output an Out-Null,
  damit der Markdown-Report nicht von Robocopy-Strings zerlegt
  wird (PowerShell-Pipeline-Quirk im foreach).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 20:48:04 +02:00
Micha 24d0d90670 Make dump output 0644 by default, exclude flash config from H pull
pre-backup-dumps.sh: atomic_write nimmt jetzt einen optionalen
mode-Parameter (Default 0644). Damit sind alle DB-/SQLite-/BoltDB-
/Mongo-Dumps konsistent 0644 und vom Nearline-Pull lesbar. Die
sensible unraid-flash-config-Familie (.tar.gz, .sha256, .manifest)
ruft explizit mit mode 600 auf und bleibt damit Operator-only.
Loest das Permission-Problem fuer filebrowser.bolt.dump (Source
ist 0640) im naechsten regulaeren Dump-Lauf.

pull-critical-backups.ps1: Jobs koennen ExcludeFiles ueber /XF
mitliefern. borg-dumps-latest schliesst die unraid-flash-config-
Artefakte aus, weil sie bewusst 0600 bleiben sollen und sonst den
Lauf abbrechen lassen. Restore-Quelle fuer Flash-Config bleibt
das Hetzner-Borg-Repo.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 20:44:50 +02:00
Micha 0ae44bd797 Write Prometheus textfile and Gitea bundles world-readable
node-exporter runs as nobody:65534 inside its container and was
hitting node_textfile_scrape_error 1 on homelab.prom, because the
file was 0600 root:root (mktemp default). Set it to 0644 right
before the atomic mv. Bundle inhaltsidentisch zum Git-Repo, ohne
Secrets (.gitignore-abgedeckt) und nicht sensibler als die
uebrigen /mnt/user/backups/borg/dumps/latest/*.dump-Files, die
ebenfalls 0644 sind. So funktioniert auch der Nearline-Pull-Workflow
ueber SMB (docs/H_DRIVE_NEARLINE_PULL.md).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 20:41:07 +02:00
Micha c4fd4154db Document quarterly restore drill routine
New docs/RESTORE_DRILL_ROUTINE.md introduces a three-stage model:
weekly freshness check, monthly/bimonthly mini-restores, quarterly
DR sanity check. Tracks confirmed mini-restores (Vaultwarden, Gitea,
Paperless 2026-05-07; Immich 2026-05-27) and rotates services by
quarter Q1-Q4. Includes ten-point DR sanity check and abort rules
that point at the drift runbook. No host schedule is created; the
existing ops/restore-tests/schedule.md now references this routine
as the source for quarterly assignment.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 20:15:43 +02:00
Micha 52414c47be Record Immich restore test success 2026-05-27 18:38:14 +02:00
Micha a8c440d4da Read Immich v2 restore counts 2026-05-27 18:33:29 +02:00
Micha 12cf8fb728 Prepare Immich restore upload markers 2026-05-27 18:29:53 +02:00
Micha 5b0782a8fa Harden Immich restore smoke checks 2026-05-27 18:25:30 +02:00
Micha a805f03481 Retry Immich restore during Postgres startup 2026-05-27 18:18:55 +02:00
Micha 4feecf4a8e Make Immich restore database creation idempotent 2026-05-27 18:16:25 +02:00
Micha 2e84700326 Make Immich restore test create database 2026-05-27 18:14:40 +02:00
Micha 8a19c45485 Use Borg known_hosts in restore tests 2026-05-27 18:12:48 +02:00
Micha 38c3d87722 Prepare H drive nearline pull 2026-05-27 06:25:47 +02:00
Micha c5d231a0db Prepare Immich restore smoke test 2026-05-26 21:33:01 +02:00
Micha 5c5ca2fcec Fix Gitea bundle mirror host run 2026-05-26 20:16:19 +02:00
Micha 5936a4d9c1 Add Gitea bundle recovery script 2026-05-26 19:50:50 +02:00
Micha eea2697ca1 Triage policy check warnings 2026-05-26 19:42:01 +02:00
Micha 45bae13aa0 Remove legacy monitoring stacks 2026-05-26 15:27:37 +02:00
Micha 5cb401797d Bind AdGuard admin to Tailscale 2026-05-26 14:55:49 +02:00
Micha 9353a9fc44 Fix Borg preflight freshness dump path 2026-05-25 19:44:22 +02:00
Micha d50b11784d Add Unraid flash config to Borg preflight 2026-05-25 19:36:16 +02:00
Micha b6bbca43ad Replace Uptime Kuma with monitoring checks 2026-05-25 16:37:46 +02:00
Micha a7797fd02e Consolidate dashboard on Glance 2026-05-25 14:44:46 +02:00
Micha add8b71ea9 Remove Jellyfin from homelab target state 2026-05-25 11:57:00 +02:00
Micha 8e400fb3c3 Finalize homelab audit end state 2026-05-23 11:29:08 +02:00
Micha cd650b19ac Close Gitea signup, dedup posture-check alerts, extend Borg scope
Operational hardening across several services after live incident
analysis between 2026-05-18 and 2026-05-20:

- Gitea: disable public registration and OpenID signup/signin to
  stop the external POST / 5xx bursts that triggered availability
  alerts. New repo-wide policy requires every productive
  Micha/homelab-infra Komodo stack to ship with an active
  Gitea->Komodo webhook on the current stack ID (documented in
  CLAUDE.md, AI_CONTEXT.md, WORKFLOW.md).
- posture-check: extract the Disk1 fstype check into its own
  function so the documented Disk1 NTFS exception no longer raises
  ntfy warnings, skip POSIX inode checks on NTFS, and dedup ntfy
  alerts via a fingerprint state file with ALERT_REPEAT_SECONDS
  (default 24h). Repeat-spam on the same cause now suppressed.
- docker-critical-events: parse the event JSON for container name,
  action, exit code and signal; drop `die exit=0` events (clean
  stops); ship a structured ntfy message instead of the raw event
  line.
- Borg UI: mount /mnt/user/services into the backup container as
  /local/services:ro and include homelab-infra, stacks and
  posture-check in all-important-sources.txt. RESTORE_MATRIX and
  DISASTER_RECOVERY updated accordingly.
- Unraid user scripts: document the new
  homelab-operations-report-daily cron job and the SMTP password
  file it expects on the host.
- MIGRATION_LOG: capture the four live events from this window -
  Gitea 5xx burst + signup closure, Komodo webhook reconciliation,
  posture-check host-version verification, Borg scope extension,
  and Traefik 5xx alert detuning.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-23 11:05:35 +02:00
Micha 71ac18b21c Fix Jellyfin native auth routing 2026-05-18 13:43:41 +02:00
Micha 59bec9ac77 Fix Glance live widget data sources 2026-05-18 09:35:53 +02:00
Micha 9f86da708a Add Glance live network widgets 2026-05-18 08:31:57 +02:00
Micha d6170211c4 Refine Glance network widgets 2026-05-18 08:13:13 +02:00
Micha fb681086f3 Restyle Glance dashboard layout 2026-05-18 08:03:59 +02:00
Micha 5b101f3b3d Keep only verified Glance community widget 2026-05-17 18:20:56 +02:00
Micha 669efbd57e Fix Glance Speedtest subrequest headers 2026-05-17 18:18:14 +02:00
Micha 2dd5590a2a Polish Glance community widgets 2026-05-17 18:16:05 +02:00
Micha 175cd6951f Add Glance community homelab widgets 2026-05-17 18:07:57 +02:00
Micha aeb7573b03 Remove noisy Glance dashboard widgets 2026-05-17 17:08:10 +02:00
Micha 215f44b962 Fix Glance monitor health checks 2026-05-17 17:05:23 +02:00
Micha 6ce625f77a Fix Glance socket proxy image tag 2026-05-17 16:59:29 +02:00
Micha c3c8060ddf Add Glance homelab dashboard stack 2026-05-17 16:51:43 +02:00
Micha 29eaf8001f Normalize ntfy alert routing 2026-05-17 14:57:45 +02:00
Micha 53216e50c1 Fix monitoring InfluxDB volume permissions 2026-05-17 10:45:32 +02:00
Micha b7dfdad621 Consolidate monitoring target stack 2026-05-17 10:41:29 +02:00
Micha 6ca829ec45 Document Unraid automation schedules 2026-05-16 20:11:19 +02:00
Micha 0adddb6533 Add Unraid automation script templates 2026-05-16 14:34:35 +02:00
Micha bf30240217 Remove Loki image-internal healthcheck 2026-05-16 13:32:02 +02:00
Micha 5f7940aa01 Tune Loki host bootstrap settings 2026-05-16 13:31:08 +02:00