Remove legacy monitoring stacks
@@ -1,90 +0,0 @@
# Grafana + InfluxDB 3 Core

Status: superseded legacy stack. The central target state is `monitoring/` with `monitoring-grafana`, `monitoring-influxdb3-core`, Prometheus, Loki, and Promtail.

Monitoring stack for Grafana + InfluxDB 3 Core. InfluxDB has no public route; internal writers such as Home Assistant can write through a deliberately bound LAN port.

After a successful `monitoring/` deploy, do not keep running this stack in parallel. For now it stays in the repo as a rollback and migration reference.

## Sources / Decisions

- Grafana uses the official OSS image `grafana/grafana:12.4.3`.
- InfluxDB uses `influxdb:3.9.1-core` rather than `latest`, because the `latest` tag for InfluxDB is actively being switched over to InfluxDB 3.
- Grafana is published via Traefik with `authelia@file,secure-headers@file` at `grafana.kaleschke.info`.
- InfluxDB has no Traefik route. The HTTP port `8181` can be bound to a LAN address via `INFLUXDB_BIND_IP` for internal writers such as Home Assistant; the default is `127.0.0.1`.
- InfluxDB is attached to two Compose networks: `grafana_influx_internal` for Grafana and `grafana_influx_lan` for Docker host-port publishing. In the running Komodo stack, the Compose project prefix turns these into `grafana_grafana_influx_internal` and `grafana_grafana_influx_lan`. InfluxDB is deliberately not attached to `frontend_net`.
- Grafana provisioning creates an SQL data source for InfluxDB 3 Core with the database `homelab` and a Loki data source for container logs.
- The Grafana data source token is stored as a secret file on the host and is loaded at container start, inside the container only, into the environment variable that Grafana provisioning needs.
- Home Assistant writes to InfluxDB 3 using the InfluxDB v2 API compatibility; details: `docs/HOME_ASSISTANT_INFLUXDB_ECOWITT.md`.
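
The bind address decision above relies on the `${INFLUXDB_BIND_IP:-127.0.0.1}` default expansion used in the removed compose file. A minimal sketch of how that expression resolves, assuming Compose interpolation behaves like the shell here (fallback when the variable is unset or empty); `port_mapping` is a hypothetical helper for illustration only:

```bash
#!/usr/bin/env bash
# Sketch: how "${INFLUXDB_BIND_IP:-127.0.0.1}:8181:8181" resolves.
# port_mapping is a hypothetical helper, not part of the repository.
port_mapping() {
  # ${VAR:-default} falls back to the default when VAR is unset or empty.
  printf '%s:8181:8181\n' "${INFLUXDB_BIND_IP:-127.0.0.1}"
}

# Without an override the port is published on loopback only.
unset INFLUXDB_BIND_IP
port_mapping

# With the LAN IP from the smoke test, LAN writers can reach the port.
INFLUXDB_BIND_IP=192.168.178.58
port_mapping
```

Keeping loopback as the default means a missing `.env` entry fails closed: the port is simply not reachable from the LAN.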

## Initial setup

1. Create the secret for Grafana:

```bash
install -m 600 /dev/null /mnt/user/appdata/secrets/grafana_admin_password.txt
```

2. Create the offline admin token for InfluxDB 3 as JSON:

```json
{
  "token": "apiv3_REPLACE_WITH_STRONG_RANDOM_TOKEN",
  "name": "admin",
  "description": "Admin token for KalliLab InfluxDB 3 Core"
}
```

Path: `/mnt/user/appdata/secrets/influxdb3_admin_token.json`, permissions `600`.

3. Create the Grafana data source token. For InfluxDB 3 Core, currently use a dedicated named admin token so that Grafana access can be rotated separately from the initial operator/admin token:

```bash
install -m 600 /dev/null /mnt/user/appdata/secrets/grafana_influxdb_token.txt
```

4. Copy the provisioning files from the Git checkout to the host appdata path:

```bash
mkdir -p /mnt/user/appdata/grafana/provisioning/datasources
mkdir -p /mnt/user/appdata/grafana/provisioning/dashboards
cp /mnt/user/appdata/komodo/core/repos/homelab-infra/ops/grafana-influxdb/provisioning/datasources/influxdb.yml /mnt/user/appdata/grafana/provisioning/datasources/influxdb.yml
cp /mnt/user/appdata/komodo/core/repos/homelab-infra/ops/grafana-influxdb/provisioning/dashboards/* /mnt/user/appdata/grafana/provisioning/dashboards/
chmod 644 /mnt/user/appdata/grafana/provisioning/datasources/influxdb.yml
chmod 644 /mnt/user/appdata/grafana/provisioning/dashboards/*
```

5. After the first start, create the database:

```bash
docker exec influxdb3-core influxdb3 create database homelab --token "$INFLUXDB3_AUTH_TOKEN"
```
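
The `install -m 600 /dev/null ...` commands in steps 1 and 3 only create empty files with owner-only permissions; the secret value still has to be written. The following is a rough sketch for the Grafana admin password only, assuming a random base64 string is acceptable as a password. `gen_secret` and `write_secret` are hypothetical helpers, and the sketch does not cover the InfluxDB token formats.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print a random secret without a trailing newline.
gen_secret() {
  # 32 random bytes, base64-encoded (44 characters including padding).
  head -c 32 /dev/urandom | base64 | tr -d '\n'
}

# Hypothetical helper: write a fresh secret to a file readable by the owner only.
write_secret() {
  local path="$1"
  # The restrictive umask applies only inside the subshell.
  ( umask 077; gen_secret > "$path" )
  chmod 600 "$path"
}

# Example call (path taken from step 1), commented out so sourcing has no side effects:
# write_secret /mnt/user/appdata/secrets/grafana_admin_password.txt
```

Writing the value without a trailing newline is a precaution in case the consumer of the secret file does not trim whitespace.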

## Smoke test after deploy

- `https://grafana.kaleschke.info` opens the Grafana login page after Authelia.
- Grafana `Connections -> Data sources -> InfluxDB 3 Core -> Save & test` succeeds.
- Grafana `Connections -> Data sources -> Loki -> Save & test` succeeds once the Loki/Alloy stack is running.
- The provisioned dashboards `Logs - Last 60m`, `Container Restart Events`, and `Container Error Rate` are visible.
- InfluxDB has no public route. If `INFLUXDB_BIND_IP` is set to the LAN IP, port `8181` is reachable only on the internal network for writers such as Home Assistant.
- `docker ps` shows `192.168.178.58:8181->8181/tcp` for `influxdb3-core`, or the host set via `INFLUXDB_BIND_IP`.
- `ss -ltnp | grep 8181` shows a listener on the bound host IP.
- `curl -i http://192.168.178.58:8181/` returns `401 Unauthorized` without a token, as expected.
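
The unauthenticated `curl` check from the list above could be scripted roughly as follows. This is only a sketch: `classify_status` and `probe` are hypothetical helpers, the `INFLUX_URL` default is the example address from the list, and treating any 2xx response as a failure is an assumption about what "no access without a token" should mean here.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Assumption: example address from the smoke-test list; override via environment.
INFLUX_URL="${INFLUX_URL:-http://192.168.178.58:8181/}"

# Hypothetical helper: interpret the HTTP status of a request sent without a token.
classify_status() {
  case "$1" in
    401) echo "ok: token required" ;;
    000) echo "fail: not reachable" ;;
    2??) echo "fail: reachable without token" ;;
    *)   echo "warn: unexpected status $1" ;;
  esac
}

# Hypothetical helper: print only the status code; curl reports 000 if the connection fails.
probe() {
  curl -s -o /dev/null -m 5 -w '%{http_code}' "$1" || true
}

# Usage: classify_status "$(probe "$INFLUX_URL")"
```

Splitting the classification from the network call keeps the decision logic checkable without a running InfluxDB instance.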

## Drift check

If Komodo, Gitea, and the runtime do not match, use `docs/GITOPS_DRIFT_RUNBOOK.md` first. Particularly important:

```bash
cd /mnt/user/services/stacks/grafana
git rev-parse --short HEAD
grep -nE "ports:|grafana_influx_lan|grafana_influx_internal" -A4 -B2 ops/grafana-influxdb/docker-compose.yml
docker inspect influxdb3-core --format '{{json .NetworkSettings.Ports}}'
docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}" | grep influx
ss -ltnp | grep 8181
```

## Rollback

- Stop the stack in Komodo, or revert Git to the last state without `ops/grafana-influxdb`.
- Persistent data lives under `/mnt/user/appdata/grafana` and `/mnt/user/appdata/influxdb3`; do not delete it automatically.
@@ -1,91 +0,0 @@
services:
  grafana:
    image: grafana/grafana:12.4.3@sha256:2e986801428cd689c2358605289c90ab37d2b39e24808874971f54c99bcdc412
    container_name: grafana
    restart: unless-stopped
    user: "0"
    environment:
      GF_SERVER_ROOT_URL: https://grafana.kaleschke.info/
      GF_SECURITY_ADMIN_PASSWORD__FILE: /run/secrets/grafana_admin_password
      GF_USERS_ALLOW_SIGN_UP: "false"
      GF_AUTH_ANONYMOUS_ENABLED: "false"
    dns:
      - 1.1.1.1
      - 8.8.8.8
    entrypoint:
      - /bin/sh
      - -c
      - |
        export GRAFANA_INFLUXDB_TOKEN="$$(cat /run/secrets/grafana_influxdb_token)"
        exec /run.sh
    volumes:
      - /mnt/user/appdata/grafana:/var/lib/grafana
      - /mnt/user/appdata/grafana/provisioning:/etc/grafana/provisioning:ro
    secrets:
      - grafana_admin_password
      - grafana_influxdb_token
    networks:
      - frontend_net
      - backend_net
      - grafana_influx_internal
    security_opt:
      - no-new-privileges:true
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:3000/api/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    labels:
      - traefik.enable=true
      - traefik.docker.network=frontend_net
      - traefik.http.routers.grafana.rule=Host(`grafana.kaleschke.info`)
      - traefik.http.routers.grafana.entrypoints=websecure
      - traefik.http.routers.grafana.tls=true
      - traefik.http.routers.grafana.tls.certresolver=le
      - traefik.http.routers.grafana.middlewares=authelia@file,secure-headers@file
      - traefik.http.services.grafana.loadbalancer.server.port=3000

  influxdb3-core:
    image: influxdb:3.9.1-core@sha256:1d58c8b9ac90153ae3a020ede2810c8284933dda50ac71e7573389ab6f012128
    container_name: influxdb3-core
    restart: unless-stopped
    user: "0"
    ports:
      - "${INFLUXDB_BIND_IP:-127.0.0.1}:8181:8181"
    command:
      - influxdb3
      - serve
      - --node-id=kallilabcore
      - --object-store=file
      - --data-dir=/var/lib/influxdb3/data
      - --plugin-dir=/var/lib/influxdb3/plugins
      - --admin-token-file=/run/secrets/influxdb3_admin_token
    volumes:
      - /mnt/user/appdata/influxdb3/data:/var/lib/influxdb3/data
      - /mnt/user/appdata/influxdb3/plugins:/var/lib/influxdb3/plugins
    secrets:
      - influxdb3_admin_token
    networks:
      - grafana_influx_lan
      - grafana_influx_internal
    security_opt:
      - no-new-privileges:true

secrets:
  grafana_admin_password:
    file: /mnt/user/appdata/secrets/grafana_admin_password.txt
  influxdb3_admin_token:
    file: /mnt/user/appdata/secrets/influxdb3_admin_token.json
  grafana_influxdb_token:
    file: /mnt/user/appdata/secrets/grafana_influxdb_token.txt

networks:
  frontend_net:
    external: true
  backend_net:
    external: true
  grafana_influx_lan:
    driver: bridge
  grafana_influx_internal:
    internal: true
@@ -1,23 +0,0 @@
{
  "uid": "kallilab-container-error-rate",
  "title": "Container Error Rate",
  "schemaVersion": 39,
  "version": 1,
  "refresh": "5m",
  "time": { "from": "now-24h", "to": "now" },
  "panels": [
    {
      "id": 1,
      "type": "table",
      "title": "Container Errors Last 24h",
      "datasource": { "type": "loki", "uid": "loki" },
      "targets": [
        {
          "refId": "A",
          "expr": "sum by (container_name) (count_over_time({platform=\"docker\"} |~ \"(?i)(level=error|error|fatal|panic)\" [24h]))"
        }
      ],
      "gridPos": { "h": 16, "w": 24, "x": 0, "y": 0 }
    }
  ]
}
@@ -1,43 +0,0 @@
{
  "uid": "kallilab-logs-last-60m",
  "title": "Last 60 min before now",
  "schemaVersion": 39,
  "version": 1,
  "refresh": "30s",
  "time": { "from": "now-60m", "to": "now" },
  "templating": {
    "list": [
      {
        "name": "container",
        "type": "query",
        "datasource": { "type": "loki", "uid": "loki" },
        "query": "label_values(container_name)",
        "includeAll": true,
        "allValue": ".+",
        "refresh": 1
      }
    ]
  },
  "panels": [
    {
      "id": 1,
      "type": "logs",
      "title": "Docker Log Stream",
      "datasource": { "type": "loki", "uid": "loki" },
      "targets": [
        {
          "refId": "A",
          "expr": "{platform=\"docker\", container_name=~\"$container\"}"
        }
      ],
      "gridPos": { "h": 20, "w": 24, "x": 0, "y": 0 },
      "options": {
        "showTime": true,
        "showLabels": true,
        "wrapLogMessage": false,
        "enableLogDetails": true,
        "sortOrder": "Descending"
      }
    }
  ]
}
@@ -1,12 +0,0 @@
apiVersion: 1

providers:
  - name: KalliLab Observability
    orgId: 1
    folder: KalliLab Observability
    type: file
    disableDeletion: false
    updateIntervalSeconds: 60
    allowUiUpdates: false
    options:
      path: /etc/grafana/provisioning/dashboards
@@ -1,23 +0,0 @@
{
  "uid": "kallilab-restart-events",
  "title": "Restart Events",
  "schemaVersion": 39,
  "version": 1,
  "refresh": "5m",
  "time": { "from": "now-24h", "to": "now" },
  "panels": [
    {
      "id": 1,
      "type": "heatmap",
      "title": "Restart-like Log Events",
      "datasource": { "type": "loki", "uid": "loki" },
      "targets": [
        {
          "refId": "A",
          "expr": "sum by (container_name) (count_over_time({platform=\"docker\"} |~ \"(?i)(restart|restarting|started|exited|oom)\" [5m]))"
        }
      ],
      "gridPos": { "h": 16, "w": 24, "x": 0, "y": 0 }
    }
  ]
}
@@ -1,26 +0,0 @@
apiVersion: 1

prune: true

datasources:
  - name: InfluxDB 3 Core
    uid: influxdb3-core
    type: influxdb
    access: proxy
    url: http://influxdb3-core:8181
    isDefault: true
    jsonData:
      version: SQL
      dbName: homelab
      httpMode: POST
      insecureGrpc: true
    secureJsonData:
      token: $GRAFANA_INFLUXDB_TOKEN
  - name: Loki
    uid: loki
    type: loki
    access: proxy
    url: http://loki:3100
    isDefault: false
    jsonData:
      maxLines: 1000
@@ -1,3 +0,0 @@
# Safe default: local host only.
# Set this to the Unraid LAN IP, for example 192.168.178.58, when a VM such as Home Assistant must write to InfluxDB.
INFLUXDB_BIND_IP=127.0.0.1
@@ -315,48 +315,41 @@
       "first_check": "HTTPS erreichbar? NTFY_BEHIND_PROXY=true gesetzt? Traefik healthy?",
       "notes": "KRITISCH: Ausfall bedeutet keine anderen Alerts ankommen"
     },
-    "homepage": {
-      "description": "Start-Dashboard",
+    "glance": {
+      "description": "Homelab-Dashboard",
       "tier": 3,
       "category": "ops",
-      "container_name": "homepage",
+      "container_name": "glance",
       "dependencies": ["traefik"],
-      "url": "https://home.kaleschke.info",
+      "url": "https://glance.kaleschke.info",
       "dump_file": null,
-      "data_paths": ["/mnt/user/appdata/homepage"],
-      "first_check": "Traefik erreichbar? Docker-Socket read-only lesbar? API-Tokens gueltig?",
-      "notes": "Docker socket read-only; viele API Tokens in Config"
+      "data_paths": [],
+      "first_check": "Traefik erreichbar? Docker-Socket-Proxy intern erreichbar? API-Tokens fuer Widgets gueltig?",
+      "notes": "aktives Homelab-Dashboard; Homepage wurde entfernt"
     },
-    "uptime-kuma": {
-      "description": "Monitoring / Uptime Checks",
+    "monitoring-grafana": {
+      "description": "Zentrale Observability-UI",
       "tier": 3,
       "category": "ops",
-      "container_name": "UptimeKuma",
-      "dependencies": ["traefik"],
-      "url": "https://uptime.kaleschke.info",
+      "container_name": "monitoring-grafana",
+      "dependencies": [
+        "monitoring-prometheus",
+        "monitoring-loki",
+        "monitoring-influxdb3-core",
+        "traefik"
+      ],
+      "url": "https://monitoring.kaleschke.info",
       "dump_file": null,
-      "data_paths": ["/mnt/user/appdata/uptime-kuma"],
-      "first_check": "Datenbank-Volume intakt? Traefik erreichbar?",
-      "notes": "Monitore nach Restore manuell pruefen"
+      "data_paths": ["grafana_data"],
+      "first_check": "Authelia-Redirect? Datasources Prometheus, Loki und InfluxDB 3 Core gruen?",
+      "notes": "ersetzt alten Grafana-Altstand und Uptime-Kuma-Views"
     },
-    "grafana": {
-      "description": "Metrik-Dashboard",
+    "monitoring-influxdb3-core": {
+      "description": "Zeitreihen- / Metrikdaten fuer Monitoring und Home Assistant",
       "tier": 3,
       "category": "ops",
-      "container_name": "grafana",
-      "dependencies": ["influxdb3-core", "traefik"],
-      "url": "https://grafana.kaleschke.info",
-      "dump_file": null,
-      "data_paths": ["/mnt/user/appdata/grafana"],
-      "first_check": "influxdb3-core healthy? Datasource-Token gesetzt? Provisioning-Konfig vorhanden?",
-      "notes": "laeuft als user 0; Datasource wird provisioniert"
-    },
-    "influxdb3-core": {
-      "description": "Zeitreihen- / Metrikdaten fuer Grafana und Home Assistant",
-      "tier": 3,
-      "category": "ops",
-      "container_name": "influxdb3-core",
-      "dependencies": [],
+      "container_name": "monitoring-influxdb3-core",
+      "dependencies": ["monitoring-grafana"],
       "url": null,
       "dump_file": null,
       "data_paths": [
@@ -395,55 +395,43 @@ services:
   # TIER 3 — Ops / Tools (Ausfall schmerzt, blockiert nichts Kritisches)
   # ---------------------------------------------------------------------------

-  homepage:
-    description: Start-Dashboard
+  glance:
+    description: Homelab-Dashboard
     tier: 3
     category: ops
-    container_name: homepage
+    container_name: glance
     dependencies:
       - traefik
-    url: https://home.kaleschke.info
+    url: https://glance.kaleschke.info
     dump_file: null
-    data_paths:
-      - /mnt/user/appdata/homepage
-    first_check: "Traefik erreichbar? Docker-Socket read-only lesbar? API-Tokens gueltig?"
-    notes: "Docker socket read-only; viele API Tokens in Config"
+    data_paths: []
+    first_check: "Traefik erreichbar? Docker-Socket-Proxy intern erreichbar? API-Tokens fuer Widgets gueltig?"
+    notes: "aktives Homelab-Dashboard; Homepage wurde entfernt"

-  uptime-kuma:
-    description: Monitoring / Uptime Checks
+  monitoring-grafana:
+    description: Zentrale Observability-UI
     tier: 3
     category: ops
-    container_name: UptimeKuma
+    container_name: monitoring-grafana
     dependencies:
+      - monitoring-prometheus
+      - monitoring-loki
+      - monitoring-influxdb3-core
       - traefik
-    url: https://uptime.kaleschke.info
+    url: https://monitoring.kaleschke.info
     dump_file: null
     data_paths:
-      - /mnt/user/appdata/uptime-kuma
-    first_check: "Datenbank-Volume intakt? Traefik erreichbar?"
-    notes: "Monitore nach Restore manuell pruefen"
+      - grafana_data
+    first_check: "Authelia-Redirect? Datasources Prometheus, Loki und InfluxDB 3 Core gruen?"
+    notes: "ersetzt alten Grafana-Altstand und Uptime-Kuma-Views"

-  grafana:
-    description: Metrik-Dashboard
+  monitoring-influxdb3-core:
+    description: Zeitreihen- / Metrikdaten fuer Monitoring und Home Assistant
     tier: 3
     category: ops
-    container_name: grafana
+    container_name: monitoring-influxdb3-core
     dependencies:
-      - influxdb3-core
-      - traefik
-    url: https://grafana.kaleschke.info
-    dump_file: null
-    data_paths:
-      - /mnt/user/appdata/grafana
-    first_check: "influxdb3-core healthy? Datasource-Token in Secret gesetzt? Provisioning-Konfig vorhanden?"
-    notes: "laeuft als user 0 wegen Host-Appdata-Permissions (dokumentiert); Datasource wird provisioniert"
-
-  influxdb3-core:
-    description: Zeitreihen- / Metrikdaten fuer Grafana und Home Assistant
-    tier: 3
-    category: ops
-    container_name: influxdb3-core
-    dependencies: []
+      - monitoring-grafana
     url: null
     dump_file: null
     data_paths:
@@ -1,31 +0,0 @@
# Loki / Alloy

Status: superseded legacy stack. The central target state is `monitoring/` with `monitoring-loki` and `monitoring-promtail`.

Internal logging stack for KalliLab CORE.

## Services

- `loki`: internal log store on `backend_net`, no Traefik route, `auth_enabled: false` because access is limited to internal Docker networking.
- `alloy`: Docker log collector. It mounts `/var/run/docker.sock:ro` as a documented observability exception and forwards Docker container logs to Loki.

## Migration note

Do not run this stack in parallel with `monitoring/` unless you are deliberately comparing collectors during migration. After `monitoring-loki` and `monitoring-promtail` are live and Grafana can query logs, stop this Komodo stack and keep the files only as rollback reference.

## Host sync

Before first deploy, sync the checked-in config files to appdata:

```bash
mkdir -p /mnt/user/appdata/loki/config /mnt/user/appdata/loki/data
mkdir -p /mnt/user/appdata/alloy/config /mnt/user/appdata/alloy/data
cp /mnt/user/services/homelab-infra/ops/loki/config/loki-config.yml /mnt/user/appdata/loki/config/loki-config.yml
cp /mnt/user/services/homelab-infra/ops/loki/config/config.alloy /mnt/user/appdata/alloy/config/config.alloy
chown -R 10001:10001 /mnt/user/appdata/loki/data
chmod 644 /mnt/user/appdata/loki/config/loki-config.yml /mnt/user/appdata/alloy/config/config.alloy
```

## Restore posture

Loki data is transient operational telemetry. Docker raw logs remain the first fallback, Loki chunks on disk are a convenience cache, and ntfy critical events provide the external first-crash marker.
@@ -1,43 +0,0 @@
discovery.docker "containers" {
  host = "unix:///var/run/docker.sock"
}

discovery.relabel "docker_logs" {
  targets = []

  rule {
    source_labels = ["__meta_docker_container_name"]
    regex = "/(.*)"
    target_label = "container_name"
  }

  rule {
    source_labels = ["__meta_docker_container_label_com_docker_compose_project"]
    target_label = "compose_project"
  }

  rule {
    source_labels = ["__meta_docker_container_label_com_docker_compose_service"]
    target_label = "compose_service"
  }
}

loki.source.docker "containers" {
  host = "unix:///var/run/docker.sock"
  targets = discovery.docker.containers.targets
  labels = { platform = "docker", host = "kallilabcore" }
  relabel_rules = discovery.relabel.docker_logs.rules
  forward_to = [loki.process.docker.receiver]
}

loki.process "docker" {
  forward_to = [loki.write.local.receiver]

  stage.docker {}
}

loki.write "local" {
  endpoint {
    url = "http://loki:3100/loki/api/v1/push"
  }
}
@@ -1,47 +0,0 @@
auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096

common:
  instance_addr: 127.0.0.1
  path_prefix: /loki
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

query_range:
  results_cache:
    cache:
      embedded_cache:
        enabled: true
        max_size_mb: 100

schema_config:
  configs:
    - from: 2026-05-16
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

limits_config:
  retention_period: 720h
  allow_structured_metadata: true
  ingestion_rate_mb: 16
  ingestion_burst_size_mb: 32

compactor:
  working_directory: /loki/compactor
  compaction_interval: 10m
  retention_enabled: true
  retention_delete_delay: 2h
  delete_request_store: filesystem
@@ -1,36 +0,0 @@
services:
  loki:
    image: grafana/loki:3.7.2@sha256:191d4fdfb7264f16989f0a57f320872620a5a7c2ceeec6229212c4190ec49b86
    container_name: loki
    restart: unless-stopped
    command:
      - -config.file=/etc/loki/loki-config.yml
    volumes:
      - /mnt/user/appdata/loki/config:/etc/loki:ro
      - /mnt/user/appdata/loki/data:/loki
    networks:
      - backend_net
    security_opt:
      - no-new-privileges:true
  alloy:
    image: grafana/alloy:v1.16.1@sha256:51aeb9d829239345070619dad3edd6873186f913c84f45b365b74574fcb38ec0
    container_name: alloy
    restart: unless-stopped
    command:
      - run
      - /etc/alloy/config.alloy
      - --storage.path=/var/lib/alloy/data
    volumes:
      - /mnt/user/appdata/alloy/config:/etc/alloy:ro
      - /mnt/user/appdata/alloy/data:/var/lib/alloy/data
      - /var/run/docker.sock:/var/run/docker.sock:ro
    networks:
      - backend_net
    security_opt:
      - no-new-privileges:true
    depends_on:
      - loki

networks:
  backend_net:
    external: true
@@ -21,9 +21,6 @@
     "gitea": [
       "222:22"
     ],
-    "influxdb3-core": [
-      "${INFLUXDB_BIND_IP:-127.0.0.1}:8181:8181"
-    ],
     "monitoring-influxdb3-core": [
       "${INFLUXDB_BIND_IP:-127.0.0.1}:8181:8181"
     ],
@@ -33,8 +30,6 @@
     ]
   },
   "allowed_root_identities": [
-    "grafana",
-    "influxdb3-core",
     "monitoring-influxdb3-core"
   ],
   "allowed_privileged_identities": [
@@ -1,10 +1,10 @@
 # Policy Check Report

 ## Summary
-- Compose files checked: 31
+- Compose files checked: 29
 - Critical findings: 0
-- Warnings: 7
-- Info findings: 10
+- Warnings: 5
+- Info findings: 9

 ## Critical
 - none
@@ -14,8 +14,6 @@
 - [IMAGE001] infra\ddns-updater\docker-compose.yml :: ddns-updater: Image uses a latest tag. Prefer a concrete version tag, even when a digest is present.
 - [USER001] monitoring\docker-compose.yml :: influxdb3-core: Runs as user 0. Documented exception, keep visible for hardening.
 - [IMAGE001] ops\glances\docker-compose.yml :: glances: Image uses a latest tag. Prefer a concrete version tag, even when a digest is present.
-- [USER001] ops\grafana-influxdb\docker-compose.yml :: grafana: Runs as user 0. Documented exception, keep visible for hardening.
-- [USER001] ops\grafana-influxdb\docker-compose.yml :: influxdb3-core: Runs as user 0. Documented exception, keep visible for hardening.
 - [IMAGE001] ops\scrutiny\docker-compose.yml :: scrutiny: Image uses a latest tag. Prefer a concrete version tag, even when a digest is present.

 ## Info
@@ -25,7 +23,6 @@
 - [PORT001] host-services\Adguard\docker-compose.yml :: adguard: Allowed host port mapping: 100.80.98.33:8082:80
 - [HOSTNET001] host-services\tailscale\docker-compose.yml :: tailscale: network_mode: host is a documented exception.
 - [PORT001] monitoring\docker-compose.yml :: influxdb3-core: Allowed host port mapping: ${INFLUXDB_BIND_IP:-127.0.0.1}:8181:8181
-- [PORT001] ops\grafana-influxdb\docker-compose.yml :: influxdb3-core: Allowed host port mapping: ${INFLUXDB_BIND_IP:-127.0.0.1}:8181:8181
 - [PRIV001] ops\scrutiny\docker-compose.yml :: scrutiny: Privileged mode is a documented exception.
 - [PORT001] traefik\docker-compose.yml :: traefik: Allowed host port mapping: 80:80
 - [PORT001] traefik\docker-compose.yml :: traefik: Allowed host port mapping: 443:443