feat(telemetry): add per-app metrics and log telemetry to hub reports (v0.28.0)

- New internal/metrics/telemetry.go: MetricsStore.GetContainerTelemetry()
  aggregates container memory/CPU from SQLite over the last 15 min
- New internal/metrics/logscanner.go: ScanContainerLogs() scans docker logs
  for errors/warnings, deduplicates via fingerprinting (strips timestamps,
  replaces 6+ digit numbers, hex strings, UUIDs)
- New internal/report/telemetry.go: buildAppTelemetrySection() assembles
  per-stack AppTelemetry by aggregating container metrics and log summaries
- internal/report/types.go: added AppTelemetry field to Report struct plus
  AppTelemetry type with memory/CPU/log fields and LogIssue references
- internal/report/builder.go: calls buildAppTelemetrySection() in BuildReport()
- Backward-compatible: old Hub versions silently ignore app_telemetry field

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-23 10:46:27 +01:00
parent 981c473d57
commit 05ecd65412
7 changed files with 449 additions and 4 deletions
+19 -1
@@ -4,7 +4,7 @@
A single, lightweight Go container that replaces Portainer + scattered systemd scripts with a unified, Hungarian-language web dashboard for managing Docker Compose stacks, backups, storage, monitoring, and notifications on customer hardware.
**Current version: v0.27.2**
**Current version: v0.28.0**
---
@@ -846,9 +846,27 @@ Periodic JSON push (default every 15 min) to the central felhom-hub service:
- Health: current status, issues, warnings
- Stacks: deployed apps with versions and states
- Config hash: SHA256 of `controller.yaml` for Hub-side config comparison
- **App telemetry** (v0.28.0+): Per-stack memory (current/avg/peak) and CPU averages from the last 15 minutes of metrics data, plus log scan results (error/warning counts with deduplicated issues). Only non-protected, deployed stacks are included. Backward-compatible: old Hub versions silently ignore this field.
Bearer token authentication, 3-attempt retry with 5-second backoff. Push status tracked via `PushStatus` struct (LastAttempt, LastSuccess, LastError, consecutive failures) — used by the monitoring page and alert system to show Hub connection health.
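The retry behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the controller's actual code: the names `newReportRequest`, `retry`, and `errTemporary`, the placeholder URL, and the reading of "5-second backoff" as a fixed delay between attempts are all assumptions.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"net/http"
	"time"
)

// errTemporary is a hypothetical error used only to simulate a failed push.
var errTemporary = errors.New("temporary failure")

// newReportRequest builds a JSON POST request carrying a bearer token.
// The function name and parameters are illustrative assumptions.
func newReportRequest(url, token string, body []byte) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

// retry calls send up to `attempts` times and sleeps `backoff` between
// failed attempts (assumed to be a fixed delay, not exponential).
func retry(attempts int, backoff time.Duration, send func() error) error {
	var lastErr error
	for i := 1; i <= attempts; i++ {
		if lastErr = send(); lastErr == nil {
			return nil
		}
		if i < attempts {
			time.Sleep(backoff)
		}
	}
	return fmt.Errorf("push failed after %d attempts: %w", attempts, lastErr)
}

func main() {
	// Placeholder URL and token; no request is actually sent here.
	req, err := newReportRequest("https://hub.example.invalid/report", "secret", []byte(`{}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Header.Get("Authorization"))

	// Simulate two failures followed by a success. A short delay stands in
	// for the 5-second backoff so the example finishes quickly.
	calls := 0
	err = retry(3, 10*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errTemporary
		}
		return nil
	})
	fmt.Println(calls, err)
}
```

In a real push, the `send` callback would execute the request with an `http.Client` and record the outcome in `PushStatus`.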
#### App Telemetry (`internal/metrics/telemetry.go`, `internal/metrics/logscanner.go`, `internal/report/telemetry.go`)
Each report push now includes per-app telemetry data:
**Metrics collection** (`telemetry.go`):
- `MetricsStore.GetContainerTelemetry(since)` aggregates container-level memory (avg, peak, current) and CPU averages from the `container_metrics` SQLite table for the last 15 minutes.
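The aggregation itself can be illustrated with a small in-memory sketch. The `sample` and `telemetry` types and the `aggregate` function are hypothetical; the real code reads from the `container_metrics` table, whose schema is not shown here. The sketch assumes samples arrive oldest first, so the last sample per container counts as "current".

```go
package main

import "fmt"

// sample is a hypothetical shape for one metrics row.
type sample struct {
	Container  string
	MemBytes   uint64
	CPUPercent float64
}

// telemetry holds the per-container aggregates named in the text.
type telemetry struct {
	MemCurrent uint64
	MemAvg     uint64
	MemPeak    uint64
	CPUAvg     float64
}

// aggregate computes current/avg/peak memory and average CPU per container.
// Samples are assumed to be ordered oldest first.
func aggregate(samples []sample) map[string]telemetry {
	type acc struct {
		memSum, memPeak, memLast uint64
		cpuSum                   float64
		n                        uint64
	}
	accs := make(map[string]*acc)
	for _, s := range samples {
		a, ok := accs[s.Container]
		if !ok {
			a = &acc{}
			accs[s.Container] = a
		}
		a.memSum += s.MemBytes
		a.cpuSum += s.CPUPercent
		a.memLast = s.MemBytes // the latest sample seen becomes "current"
		if s.MemBytes > a.memPeak {
			a.memPeak = s.MemBytes
		}
		a.n++
	}
	out := make(map[string]telemetry, len(accs))
	for name, a := range accs {
		out[name] = telemetry{
			MemCurrent: a.memLast,
			MemAvg:     a.memSum / a.n, // integer average, rounded down
			MemPeak:    a.memPeak,
			CPUAvg:     a.cpuSum / float64(a.n),
		}
	}
	return out
}

func main() {
	result := aggregate([]sample{
		{"web", 100, 10},
		{"web", 300, 20},
		{"web", 200, 30},
	})
	fmt.Printf("%+v\n", result["web"])
}
```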
**Log scanning** (`logscanner.go`):
- `ScanContainerLogs(containerNames, since, logger)` runs `docker logs --since=15m --tail=1000` sequentially on all non-protected deployed containers.
- Classifies lines by keyword match (errors: `error`, `fatal`, `panic`, `crit`, `oom`, `killed`, `exception`, `traceback`; warnings: `warn`, `warning`) on the first 5 words (case-insensitive).
- Deduplicates via fingerprinting: strips timestamps, replaces 6+ digit numbers with `<N>`, 8+ char hex with `<HEX>`, UUIDs with `<UUID>`. Groups identical fingerprints, keeps top 10 per container.
- Returns `[]ContainerLogSummary` with `ErrorCount`, `WarnCount`, `RecentIssues []LogIssue`.
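The classification and fingerprinting steps above might look roughly like the sketch below. It is not the code in `logscanner.go`. The regular expressions, the `issue` type, and the helper names are assumptions, as is the choice of substring matching for keywords (the text does not say whether matching is whole-word) and the ISO-8601-style timestamp prefix.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

var (
	errorKeywords = []string{"error", "fatal", "panic", "crit", "oom", "killed", "exception", "traceback"}
	warnKeywords  = []string{"warn", "warning"}

	// Assumed timestamp shape: a leading ISO-8601 / RFC 3339 style prefix.
	tsRe   = regexp.MustCompile(`^\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:[.,]\d+)?(?:Z|[+-]\d{2}:?\d{2})?\s*`)
	uuidRe = regexp.MustCompile(`[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}`)
	numRe  = regexp.MustCompile(`\b\d{6,}\b`)
	hexRe  = regexp.MustCompile(`\b[0-9a-fA-F]{8,}\b`)
)

// classify returns "error", "warning", or "" based on a case-insensitive
// substring match against the first five whitespace-separated words.
func classify(line string) string {
	words := strings.Fields(strings.ToLower(line))
	if len(words) > 5 {
		words = words[:5]
	}
	head := strings.Join(words, " ")
	for _, k := range errorKeywords {
		if strings.Contains(head, k) {
			return "error"
		}
	}
	for _, k := range warnKeywords {
		if strings.Contains(head, k) {
			return "warning"
		}
	}
	return ""
}

// fingerprint normalises the variable parts of a line. UUIDs are replaced
// first so their digit and hex groups are not caught by the later patterns.
func fingerprint(line string) string {
	s := tsRe.ReplaceAllString(line, "")
	s = uuidRe.ReplaceAllString(s, "<UUID>")
	s = numRe.ReplaceAllString(s, "<N>")
	s = hexRe.ReplaceAllString(s, "<HEX>")
	return strings.TrimSpace(s)
}

// issue is a hypothetical stand-in for the LogIssue type.
type issue struct {
	Fingerprint string
	Count       int
}

// topIssues groups lines by fingerprint and keeps the `limit` most frequent.
func topIssues(lines []string, limit int) []issue {
	counts := make(map[string]int)
	for _, l := range lines {
		counts[fingerprint(l)]++
	}
	issues := make([]issue, 0, len(counts))
	for fp, n := range counts {
		issues = append(issues, issue{fp, n})
	}
	sort.Slice(issues, func(i, j int) bool {
		if issues[i].Count != issues[j].Count {
			return issues[i].Count > issues[j].Count
		}
		return issues[i].Fingerprint < issues[j].Fingerprint
	})
	if len(issues) > limit {
		issues = issues[:limit]
	}
	return issues
}

func main() {
	lines := []string{
		"2026-02-23T10:00:00Z ERROR job 1000001 failed",
		"2026-02-23T10:00:05Z ERROR job 1000002 failed",
		"2026-02-23T10:00:09Z WARN disk low",
	}
	for _, l := range lines {
		fmt.Println(classify(l), "|", fingerprint(l))
	}
	fmt.Println(topIssues(lines, 10))
}
```

Substring matching is the more permissive reading: it catches `[ERROR]` and `level=warn`, but it can also flag unrelated words that happen to contain a keyword.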
**Report integration** (`report/telemetry.go`):
- `buildAppTelemetrySection()` calls `GetContainerTelemetry()` and `ScanContainerLogs()`, then `buildAppTelemetry()` aggregates the results by stack: it sums container metrics, merges log issues, and caps them at 10 per app.
- Results stored as `[]AppTelemetry` in the `Report` struct field `app_telemetry`.
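A rough sketch of the per-stack aggregation follows. The types, the `stackOf` container-to-stack map, and the merge rule (summing counts for identical fingerprints, then keeping the most frequent) are all assumptions made for illustration; the actual `AppTelemetry` fields are not listed here.

```go
package main

import (
	"fmt"
	"sort"
)

// issue and containerStats are hypothetical stand-ins for the real types.
type issue struct {
	Fingerprint string
	Count       int
}

type containerStats struct {
	MemCurrent, MemPeak uint64
	CPUAvg              float64
	Issues              []issue
}

type appTelemetry struct {
	Stack               string
	MemCurrent, MemPeak uint64
	CPUAvg              float64
	Issues              []issue
}

// aggregateByStack sums container metrics per stack and merges issues,
// keeping at most maxIssues per stack. stackOf maps container name to stack.
func aggregateByStack(stackOf map[string]string, stats map[string]containerStats, maxIssues int) []appTelemetry {
	apps := make(map[string]*appTelemetry)
	counts := make(map[string]map[string]int) // stack -> fingerprint -> count
	for name, cs := range stats {
		stack, ok := stackOf[name]
		if !ok {
			continue // container does not belong to a known stack
		}
		app, seen := apps[stack]
		if !seen {
			app = &appTelemetry{Stack: stack}
			apps[stack] = app
			counts[stack] = make(map[string]int)
		}
		app.MemCurrent += cs.MemCurrent
		app.MemPeak += cs.MemPeak
		app.CPUAvg += cs.CPUAvg
		for _, is := range cs.Issues {
			counts[stack][is.Fingerprint] += is.Count
		}
	}
	out := make([]appTelemetry, 0, len(apps))
	for stack, app := range apps {
		for fp, n := range counts[stack] {
			app.Issues = append(app.Issues, issue{fp, n})
		}
		// Most frequent first; ties broken by fingerprint for stable output.
		sort.Slice(app.Issues, func(i, j int) bool {
			if app.Issues[i].Count != app.Issues[j].Count {
				return app.Issues[i].Count > app.Issues[j].Count
			}
			return app.Issues[i].Fingerprint < app.Issues[j].Fingerprint
		})
		if len(app.Issues) > maxIssues {
			app.Issues = app.Issues[:maxIssues]
		}
		out = append(out, *app)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Stack < out[j].Stack })
	return out
}

func main() {
	stackOf := map[string]string{"blog-web": "blog", "blog-db": "blog"}
	stats := map[string]containerStats{
		"blog-web": {MemCurrent: 100, MemPeak: 150, CPUAvg: 1.5, Issues: []issue{{"db timeout", 2}}},
		"blog-db":  {MemCurrent: 200, MemPeak: 250, CPUAvg: 2.5, Issues: []issue{{"db timeout", 1}, {"slow query", 5}}},
	}
	fmt.Printf("%+v\n", aggregateByStack(stackOf, stats, 10))
}
```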
#### Infrastructure Backup to Hub (`internal/report/infra_backup.go`)
After each backup cycle (including manual Tier 2 triggers via `OnCrossDriveComplete` callback), the controller pushes a full infrastructure snapshot to the Hub for disaster recovery. This snapshot includes: