homelab-infra

Author	SHA1	Message	Date
renovate	4613da82e2	chore(deps): update redis:8.8.0-alpine docker digest to 9d31717	2026-06-23 18:45:58 +00:00
Micha	7ded74aeef	Rebase stale Renovate branches	2026-06-23 20:45:12 +02:00
Micha	7d4d5f901a	Add Renovate GitHub token support	2026-06-23 20:42:19 +02:00
Micha	ad8010767d	Healthchecks: Gitea->Komodo webhook active, mark webhook step done Webhook authenticated and triggered a successful DeployStack (komodo-core log 18:39:00). Only remaining step is wiring internal jobs as checks. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-23 20:40:42 +02:00
Micha	02389ed292	Healthchecks: mark live after API deploy; document host secret files Stack deployed to Komodo (id 6a3acf2ca7867a4fbab9bfc1), both containers healthy, Traefik route + LE cert OK, DNS resolves, superuser created and auth-verified. Flip status to live in ARCHITECTURE 7.6, SERVICE_CATALOG, MASTER_TODO and the stack README. Document the new host secret files (secret_key, superuser_password = login password, webhook_secret) in SECRETS_MAP. Remaining operator step: the Gitea->Komodo webhook. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-23 20:31:10 +02:00
Micha	cbfbb8ca4f	Add self-hosted Healthchecks stack for internal job monitoring (hybrid) Self-hosted Healthchecks (ops/healthchecks/) as the hub for internal cron/job heartbeats. The three host-down/backup watchdogs (Borg pre-hook, baerchen nearline pull, monitoring watchdog #8) deliberately stay on healthchecks.io cloud, since an on-host watcher cannot report a host outage. - frontend_net + dedicated PostgreSQL 18 in healthchecks_internal - native Healthchecks auth; ping/API exempt from Authelia (n8n/Komodo pattern) - registered as middleware_exempt in ops/policy-checks/exceptions.json - docs: DECISIONS, ARCHITECTURE (3.1/4.2/7.6/10), SERVICE_CATALOG, SECRETS_MAP, MASTER_TODO, README index docker compose config validated (exit 0). Not yet deployed: host secret file, appdata dir, Komodo stack + ENV and Gitea webhook remain operator steps. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-23 20:09:56 +02:00
Micha	ee0d450a27	docs: close audit remediation status	2026-06-23 19:50:48 +02:00
Micha	79657d526c	Record alert-chain end-to-end verification in MASTER_TODO Codex drills proved all six alert paths to the phone (send-ntfy, restore wrapper, freshness negative, Alertmanager->bridge->ntfy, docker-critical-watcher smoke, Borg pre-hook failure). Add a Kurzlog entry, note the send-ntfy exec-bit beleg-bug (`6870ae5`), and the one optional gap (Prometheus->AM leg via temp rule). Trim oldest entry to stay at max 5. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-23 19:12:57 +02:00
Micha	6870ae53da	Mark send-ntfy.sh executable so restore-failure alerts fire run-restore-job-with-ntfy.sh execs send-ntfy.sh directly; without the exec bit the failure-alert path errored with Permission denied (found during Codex alert drill 2026-06-23). Set the exec bit in the repo to match the live fix. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-23 17:25:24 +02:00
Micha	46d6010c66	Record backend_net internal:true after live flip; close audit remediation backend_net was recreated with --internal (Codex live): egress from postgresql17 blocked, all 12 members reattached, frontends and DB connections verified. Move the parked #17 item to the MASTER_TODO Kurzlog and confirm the live state in NETWORK_INVENTORY. No dawarich_egress needed (sidekiq makes no external connections). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-23 15:48:59 +02:00
Micha	5a0a4c9d56	Park backend_net internal:true hardening with egress prereq in MASTER_TODO Capture the audit egress analysis durably so the deferred maintenance window keeps the prep. backend_net -> internal:true is the only remaining P3 item; the single risk is dawarich_sidekiq (the only backend_net-only worker), all DB/cache and dual-homed containers are safe. If sidekiq needs egress, use a dedicated dawarich_egress net (immich_egress precedent). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-23 15:22:33 +02:00
Micha	c4ba67b55c	Activate Hetzner snapshot restore test after live validation Codex first live run passed (SUCCESS, 7 snapshots, single-file restore from .zfs/snapshot; report hetzner-snapshot-2026-06-23.md) with no ENV overrides. Set runbook status to active, document the run, and add the monthly cadence (15th, cron 0 6 15 * *) to schedule.md and the restore-tests README. Remaining host step: create the Unraid User Script restore-hetzner-snapshot-monthly. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-23 13:15:58 +02:00
Micha	275558b2db	Record audit-2026-06-23 remediation status in MASTER_TODO Add an Aktiv row for the remaining Codex/operator follow-ups (#19 snapshot-test validation, #12 default-bridge recreate, #14 drift redeploy, #15 immich path) and a Kurzlog entry summarizing the closed P1/P2 core (Vault /admin + Komodo de-publicized, snapshots proven, auth-matrix). Trim oldest Kurzlog entry to stay at max 5. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-23 12:23:59 +02:00
Micha	3e9c12eb75	Add Hetzner Storage Box snapshot restore test Make the off-site snapshot protection a repeatable, monitored proof (DECISIONS 2026-06-11/-23): a read-only restore-test that lists .zfs/snapshot on the Storage Box, checks retention and newest-snapshot age, and SFTP-fetches one small file from the newest snapshot (size + SHA256). Connection is derived from the borg-ui repo URL and runs via docker exec borg-ui; no secret in the script, no write access. Wired into the run-restore-checks.sh dispatcher; runbook documents the pending one-time live validation. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-23 12:18:44 +02:00
Micha	813d3bd303	Mirror Komodo IP-allowlist labels and document de-publicization Codex applied the ipallowlist middleware (Tailnet 100.64.0.0/10 + LAN 192.168.178.0/24) to the Komodo router live in the inline-managed self-stack; public now returns 403. Mirror the labels in ops/komodo/docker-compose.yml for parity (not auto-deployed), record the decision in docs/DECISIONS.md, and update docs/AUTH_MATRIX.md. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-23 12:11:45 +02:00
Micha	ad47979000	Add consolidated Auth-Matrix doc Consolidate effective access policy per public domain (Authelia bypass/two_factor, native exceptions, Tailscale-only, IP-allowlist) into a single reviewable matrix, surfacing the Authelia bypass list that previously lived only in the live config. Indexed in docs/README.md. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-23 11:03:26 +02:00
Micha	23a6975a67	Restrict Vaultwarden /admin to trusted networks (Tailscale + LAN) Audit 2026-06-23 (P1): /admin was publicly reachable (200). Add a higher-priority Traefik router scoped to PathPrefix(/admin) with an ipallowlist middleware (Tailnet 100.64.0.0/10 + LAN 192.168.178.0/24); the main router stays native for browser and mobile clients. Documented in docs/DECISIONS.md. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-23 11:03:26 +02:00
Micha	81151d8af4	Fix Dawarich Grafana datasource database config	2026-06-22 20:45:09 +02:00
Micha	45ff8286cf	Use table-format Dawarich Grafana panels	2026-06-22 20:20:07 +02:00
Micha	f318d80477	Simplify Dawarich Grafana dashboard query model	2026-06-22 20:12:14 +02:00
Micha	b8d9bba5d3	Replace Dawarich Grafana dashboard	2026-06-22 19:55:47 +02:00
Micha	3bebc03a8f	chore(deps): move dawarich redis to v8 track dawarich_redis was the last redis instance still on 7-alpine; the closed PR #10 kept it as an "Ignored or Blocked" entry in the Renovate Dependency Dashboard (issue #6). Bump to the already-running redis:8.8.0-alpine digest and add apps/dawarich to the renovate redis 8.x allowedVersions pin. Data path /mnt/user/appdata/dawarich/redis unchanged; redis 8 loads the existing RDB snapshots. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-22 19:51:36 +02:00
Micha	0f1e78e0ca	Fix Dawarich Grafana readonly user init	2026-06-22 19:01:39 +02:00
Micha	658750bc19	Allow Dawarich mobile track point route	2026-06-22 16:55:18 +02:00
Micha	5afba298e9	Allow Dawarich mobile API routes	2026-06-22 16:45:26 +02:00
Micha	bd0deea90d	Use internal Dawarich metrics scrape	2026-06-21 23:21:21 +02:00
Micha	5a6ab2cc37	Bypass Authelia for Dawarich healthcheck	2026-06-21 23:17:38 +02:00
Micha	30c3435ddf	Bypass Authelia for Dawarich metrics	2026-06-21 23:04:58 +02:00
Micha	b236eaeeaa	Restore Dawarich metrics basic auth config	2026-06-21 23:02:26 +02:00
Micha	4cf9e3226e	Use credentials file for Dawarich metrics scrape	2026-06-21 23:00:00 +02:00
Micha	699b1f118e	Scrape Dawarich metrics over HTTPS	2026-06-21 22:52:57 +02:00
Micha	db886c9eb2	Send forwarded proto for Dawarich metrics	2026-06-21 22:47:51 +02:00
Micha	2a342614db	Allow internal Dawarich scrape host	2026-06-21 22:44:48 +02:00
Micha	2bb6eaa267	Run Prometheus as root for file secrets	2026-06-21 22:43:28 +02:00
Micha	cb80e2d2c0	Fix Dawarich Prometheus secret permissions	2026-06-21 22:41:46 +02:00
Micha	201b201657	Fix Dawarich healthcheck behind HTTPS	2026-06-21 22:39:15 +02:00
Micha	725e3b0125	Add Dawarich stack	2026-06-21 22:32:41 +02:00
Micha	1de6ffc5ac	chore(deps): update nextcloud to v34	2026-06-21 21:29:37 +02:00
Micha	5559aa3f24	chore(deps): update gitea and cadvisor images	2026-06-21 21:20:11 +02:00
Micha	ed61fda0ec	docs(renovate): record routine merge round 2026-06-21 in the operational status Six Renovate PRs merged (minor-patch group, unbound/traefik/postgres digest refreshes, n8n 2.27, nextcloud-33 digest); the Nextcloud 34 major update was deliberately held back. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-21 19:31:09 +02:00
Micha	85d8270898	Merge Renovate: renovate/docker.n8n.io-n8nio-n8n-2.x	2026-06-21 19:28:53 +02:00
Micha	a904955d11	Merge Renovate: renovate/nextcloud-33.0.5-apache	2026-06-21 19:28:53 +02:00
Micha	882ea5ad01	Merge Renovate: renovate/postgres-18.4	2026-06-21 19:28:53 +02:00
Micha	e3ae97bbaf	Merge Renovate: renovate/traefik-v3.7	2026-06-21 19:28:53 +02:00
Micha	64976e0c0e	Merge Renovate: renovate/shaanmajid-unbound-1.25.1	2026-06-21 19:28:53 +02:00
Micha	424772dcfa	Merge Renovate: renovate/minor-patch-updates	2026-06-21 19:28:53 +02:00
Micha	c8380b5755	docs: document H: nearline as a restore source for the DR case Until now the H: nearline copy was described only as a backup target (CAPACITY_AND_LIFECYCLE), not as a restore source. In an emergency, there was no note that baerchen holds a fresh local copy of all dumps + bundles. - ops/h-drive-nearline/README.md: new section "Restore aus H:/ (DR-Fall)" covering contents, the restore path (-> RESTORE_MATRIX / SERVICES_RECOVERY) and a mandatory freshness check (covers the S4U stale case). - docs/DISASTER_RECOVERY.md: the baerchen section now points to the H: fallback restore source and the runbook. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-21 19:16:09 +02:00
Micha	83d7988c72	nearline: S4U fix applied and verified (LogonType Interactive) Task "KalliLab H Drive Nearline Pull" switched on 2026-06-21 from S4U to LogonType Interactive ("Run only when user is logged on"). No stored password is needed, since michi is the permanently logged-in console user. Triggered via the scheduler, result 0x0 verified (SMB access available, mirror fresh). Docs corrected: the README now describes Interactive as the applied solution (Password was only the unused alternative). MASTER_TODO: root cause fixed, only the optional Healthchecks URL remains open. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-21 19:06:55 +02:00
Micha	1a4593110a	nearline: S4U root cause documented + exit-code leak fixed Diagnosis 2026-06-21: the scheduled task "KalliLab H Drive Nearline Pull" ran as LogonType S4U (without a stored password) and therefore had no network credentials for the SMB share \192.168.178.58\backups. Every scheduled 05:30 run aborted silently with exit 1 and no report; the nearline mirror was stale from 2026-06-19 to 2026-06-21. Caught up manually, mirror fresh again. pull-critical-backups.ps1: explicit `exit 0` on the success path, so that the last robocopy exit code (1 = "files copied") does not leak as the process exit code and the scheduled task reports a truthful result. README: mandatory note that the task must run with a stored password (not S4U). MASTER_TODO: root cause + remaining operator step. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-21 18:02:58 +02:00
Micha	f296338530	monitoring + backup: stale-handle hardening and dead-man's switch Completes the local code state for two open MASTER_TODO items. monitoring: the remaining single-file bind mounts (alertmanager, blackbox, loki, promtail, alertmanager-ntfy-bridge) converted to directory mounts, analogous to the Prometheus fix of 2026-06-19. Avoids "Stale NFS file handle" on the /mnt/user FUSE share during git/Komodo updates. grafana-provisioning was already a directory mount. `docker compose config` passes. --force-recreate is required on deploy because the mount target paths change. backup: endpoint-agnostic dead-man's switch (Healthchecks-compatible, cloud or self-hosted) in pull-critical-backups.ps1 and pre-borg.sh. Pings /start, success and /fail; no-op without a configured URL, so it never breaks a run. Ping URLs are capability URLs and stay outside the repo as secrets. Docs: SECRETS_MAP, nearline README and MASTER_TODO updated. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-21 17:54:53 +02:00
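
The dead-man's switch described in commit f296338530 (ping /start, then success or /fail; no-op without a configured URL) might look roughly like the Python sketch below. The repository implements it in pull-critical-backups.ps1 and pre-borg.sh; the function names, the injectable `opener` parameter and the example URL here are assumptions for illustration, not the repository's code.

```python
import urllib.request
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def ping_url(base_url: Optional[str], event: str = "") -> Optional[str]:
    """Build a Healthchecks-style ping URL; return None if no URL is configured."""
    if not base_url:
        return None
    base = base_url.rstrip("/")
    return f"{base}/{event}" if event else base


def ping(base_url: Optional[str], event: str = "",
         opener: Callable = urllib.request.urlopen, timeout: int = 10) -> bool:
    """Send one ping. Never raises: a failed or unconfigured ping must not break the job."""
    url = ping_url(base_url, event)
    if url is None:
        return False  # no-op without a configured URL
    try:
        opener(url, timeout=timeout)
        return True
    except Exception:
        return False


def run_with_heartbeat(job: Callable[[], T], base_url: Optional[str],
                       opener: Callable = urllib.request.urlopen) -> T:
    """Ping /start, run the job, then ping success or /fail."""
    ping(base_url, "start", opener)
    try:
        result = job()
    except Exception:
        ping(base_url, "fail", opener)
        raise
    ping(base_url, "", opener)  # plain ping URL signals success
    return result
```

Because the ping URL is a capability URL, a caller would read it from a secret file or environment variable rather than hard-coding it, for example `run_with_heartbeat(do_backup, os.environ.get("HC_PING_URL"))`, where `do_backup` and `HC_PING_URL` are hypothetical names.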

853 Commits