diff --git a/TASK.md b/TASK.md index 4d3d08c..92fc76b 100644 --- a/TASK.md +++ b/TASK.md @@ -1,918 +1,234 @@ -# TASK.md — v0.5.0: Backup Bugfixes + Monitoring Page with Metrics Store +# TASK.md — v0.5.1: Monitoring Page Bugfixes -> Version bump: **v0.5.0** -> Scope: 2 backup fixes + new metrics subsystem + new monitoring page +> Version bump: **v0.5.1** +> Scope: 4 bugs in the monitoring page --- -## Overview - -1. **Bugfix**: "Helyi mentés" shows "–" after controller restart (in-memory `LastBackup` lost) -2. **Performance**: Backup page caching — `GetFullStatus()` calls restic/docker on every page load (3-4s) -3. **Feature**: New **Rendszermonitor** (System Monitor) page with historical metrics stored in SQLite, rendered with Chart.js - ---- - -## Task 1: Fix "Helyi mentés" status after restart +## Bug 1: Hostname shows container ID instead of host hostname ### Problem -After controller restart, `LastBackup` is nil (in-memory only). The template checks `{{if .Backup.LastBackup}}` and falls through to the "–" state. However, `snapshotHistory` IS loaded on startup from `LoadSnapshotHistory()` — so we know backups exist. +"Gépnév" displays `75f2f2a113f3` — the Docker container ID. `os.Hostname()` inside a container returns the container's hostname, not the host's. + +### Root cause + +`sysinfo.go` line: `info.Hostname, _ = os.Hostname()` + +Inside a Docker container, `os.Hostname()` returns the container ID unless `hostname:` is set in docker-compose.yml. 
+ +### Fix — Two options (Option A recommended) + +**Option A: Mount host's /etc/hostname** (preferred — works for all cases): + +In `controller/docker-compose.yml`, add: +```yaml +volumes: + - /etc/hostname:/host/etc/hostname:ro +``` + +In `sysinfo.go`, read host hostname first: +```go +// Hostname — try host mount first, fall back to os.Hostname() +if data, err := os.ReadFile("/host/etc/hostname"); err == nil { + info.Hostname = strings.TrimSpace(string(data)) +} else { + info.Hostname, _ = os.Hostname() +} +``` + +**Option B: Set hostname in docker-compose.yml** (simpler but requires per-customer config): + +```yaml +hostname: ${HOSTNAME:-felhom} +``` + +But this only shows the real name when the env var is set; otherwise every node reports the fallback `felhom`. Option A is better — it reads the actual host hostname dynamically. + +**Use Option A.** It's consistent with the `/etc/os-release` mount pattern already in place. + +--- + +## Bug 2: Tooltip timestamps show "1970. 01. 01. 01:00" + +### Problem + +Hovering over chart data points shows `1970. 01. 01. 01:00` instead of the actual timestamp. + +### Root cause + +In the tooltip callback: +```javascript +callbacks: { + title: function(items) { + if (!items.length) return ''; + return formatTimestamp(items[0].parsed.x || items[0].label); + } +} +``` + +The chart uses a **category** x-axis (default), not a time axis. `items[0].parsed.x` returns the **category index** (0, 1, 2, 3...), not the timestamp. When the index is > 0, `parsed.x || label` evaluates to the index (truthy). Then `formatTimestamp(5)` does `new Date(5 * 1000)` → `1970-01-01 01:00:05` (local time, UTC+1). + +When the index is 0, `0 || label` falls through to `label`, which works correctly. That's why the first data point shows the right time. ### Fix -In `GetFullStatus()`, after copying snapshot history, synthesize a `LastBackup` if it's nil but snapshots exist: +Always use `items[0].label` instead of `parsed.x`: -```go -// After m.mu.Unlock() and snapshot reversal... 
- -// Synthesize LastBackup from snapshot history if not in memory (e.g., after restart) -if status.LastBackup == nil && len(status.SnapshotHistory) > 0 { - latest := status.SnapshotHistory[0] // already reversed, newest first - status.LastBackup = &BackupStatus{ - LastRun: latest.Time, - Success: latest.Success, - Snapshot: &SnapshotResult{ - SnapshotID: latest.SnapshotID, - }, +```javascript +callbacks: { + title: function(items) { + if (!items.length) return ''; + return formatTimestamp(items[0].label); } } ``` -Also do the same for `LastDBDump` — synthesize from `DumpFiles` on disk: +`items[0].label` is the raw label value from the labels array, which is the Unix timestamp in seconds that `formatTimestamp` expects (it multiplies by 1000 before building the `Date`). -```go -if status.LastDBDump == nil && len(status.DumpFiles) > 0 { - var results []DumpResult - var latestTime time.Time - for _, f := range status.DumpFiles { - results = append(results, DumpResult{ - DB: DiscoveredDB{StackName: f.StackName, DBType: f.DBType, ContainerName: f.StackName}, - FilePath: f.FileName, - Size: f.Size, - }) - if f.ModTime.After(latestTime) { - latestTime = f.ModTime - } - } - status.LastDBDump = &DBDumpStatus{ - LastRun: latestTime, - Results: results, - Success: true, - } -} -``` +### Files -### Verification - -Restart the controller (`docker compose restart`), then load the backup page. "Helyi mentés" should show green checkmark "aktív" (not "–"). +`internal/web/templates/monitoring.html` — tooltip callback in `chartOpts` function. --- -## Task 2: Backup page caching +## Bug 3: Range selector appears non-functional / 24h shows empty ### Problem -`GetFullStatus()` runs `restic stats --json`, `restic snapshots --json`, `docker ps`, and `docker inspect` synchronously on **every page load** of `/backups`. Takes 3-4 seconds. +Default range is `24h` but the system has only ~20 minutes of data. On page load, charts appear empty (Y-axis 0-1.0, no visible lines). Clicking "1 óra" shows data. 
User perceives buttons as "not doing anything" because the initial state is already broken. -### Fix: Background cache with periodic refresh - +### Root cause (likely) -Add cached status to Manager: +Two contributing factors: -```go -type Manager struct { - // ... existing fields ... - cachedStatus *FullBackupStatus - cacheTime time.Time -} +1. **Default range too wide**: `systemRange = '24h'` — for a newly deployed system with minutes of data, this either shows nothing or shows a barely visible sliver at the right edge. + +2. **Downsampling compression**: 24h range with resolution=200 → `bucketSeconds = 432`. Twenty data points spanning 20 minutes (~1200s) get grouped into ~3 buckets. Three data points CAN render as a line chart, but if Chart.js's auto-scaling or the bucket timestamps are at the very edge, the chart might not render visibly. + +### Fix + +**A. Change default range to `1h`:** + +```javascript +let systemRange = '1h'; ``` -New method: - -```go -// RefreshCache updates the cached backup status in the background. -// Called by scheduler every 5 minutes and after each backup run. -func (m *Manager) RefreshCache(nextDBDump, nextBackup time.Time) { - // Execute all the expensive calls (restic stats, docker ps, etc.) - // Same logic currently in GetFullStatus()... - status := &FullBackupStatus{ ... 
} - - m.mu.Lock() - m.cachedStatus = status - m.cacheTime = time.Now() - m.mu.Unlock() -} -``` - -Modified `GetFullStatus()` — reads from cache, updates only cheap dynamic fields: - -```go -func (m *Manager) GetFullStatus(nextDBDump, nextBackup time.Time) *FullBackupStatus { - m.mu.Lock() - defer m.mu.Unlock() - - if m.cachedStatus != nil { - // Update dynamic fields without subprocess calls - m.cachedStatus.Running = m.running - m.cachedStatus.NextDBDump = nextDBDump - m.cachedStatus.NextBackup = nextBackup - m.cachedStatus.LastDBDump = m.lastDBDump - m.cachedStatus.LastBackup = m.lastBackup - m.cachedStatus.SnapshotHistory = make([]SnapshotRecord, len(m.snapshotHistory)) - copy(m.cachedStatus.SnapshotHistory, m.snapshotHistory) - // Apply snapshot reversal + LastBackup synthesis (from Task 1) - return m.cachedStatus - } - - // No cache yet — return minimal status (before first refresh completes) - return &FullBackupStatus{ - Enabled: m.cfg.Backup.Enabled, - Running: m.running, - LastDBDump: m.lastDBDump, - LastBackup: m.lastBackup, - DBDumpSchedule: m.cfg.Backup.DBDumpSchedule, - ResticSchedule: m.cfg.Backup.ResticSchedule, - PruneSchedule: m.cfg.Backup.PruneSchedule, - NextDBDump: nextDBDump, - NextBackup: nextBackup, - Retention: m.cfg.Backup.Retention, - RepoPath: m.cfg.Backup.ResticRepo, - BackupPaths: []string{m.cfg.Paths.StacksDir, m.cfg.Paths.DBDumpDir, "/opt/docker/felhom-controller/controller.yaml"}, - } -} -``` - -### Scheduler + lifecycle integration - -In `main.go`: - -```go -// Register cache refresh job (every 5 min) -sched.Every("backup-cache", 5*time.Minute, func(ctx context.Context) error { - nextDBDump := scheduler.NextDailyRun(cfg.Backup.DBDumpSchedule) - nextBackup := scheduler.NextDailyRun(cfg.Backup.ResticSchedule) - backupMgr.RefreshCache(nextDBDump, nextBackup) - return nil -}) - -// Initial cache population (non-blocking) -go func() { - backupMgr.RefreshCache( - scheduler.NextDailyRun(cfg.Backup.DBDumpSchedule), - 
scheduler.NextDailyRun(cfg.Backup.ResticSchedule), - ) -}() -``` - -At the end of `RunFullBackup()` and `RunBackup()`, call `RefreshCache()` so the page shows updated data immediately. - -**Import note**: `RefreshCache` takes `nextDBDump, nextBackup time.Time` as params to avoid circular import with scheduler package. - -### Add `CacheTime` to FullBackupStatus - -```go -type FullBackupStatus struct { - // ... existing ... - CacheTime time.Time // when cache was last refreshed -} -``` - -Template can optionally show staleness at the bottom of the page in muted text. - ---- - -## Task 3: Monitoring Page — Metrics Store + Charts - -### Architecture - -``` - ┌──────────────────────────┐ - │ felhom-controller │ - │ │ - /proc/stat ────────────►│ MetricsCollector │ - /proc/meminfo ─────────►│ (every 60s) │ - /sys/thermal ──────────►│ ↓ │ - docker stats ──────────►│ SQLite DB │ - │ (/data/metrics.db) │ - │ ↓ │ - │ REST API │ - │ /api/metrics/* │ - │ ↓ │ - │ Chart.js in browser │ - └──────────────────────────┘ -``` - -No new containers. Everything runs inside the existing controller. 
- -### 3A: SQLite Metrics Store (`internal/metrics/store.go`) - -**Database**: `/opt/docker/felhom-controller/data/metrics.db` (inside the controller-data volume — persists across restarts) - -**Tables**: - -```sql --- System-wide metrics (1 row per sample) -CREATE TABLE IF NOT EXISTS system_metrics ( - ts INTEGER NOT NULL, -- Unix timestamp - cpu_percent REAL NOT NULL, - mem_used_mb INTEGER NOT NULL, - mem_total_mb INTEGER NOT NULL, - temp_celsius REAL, - load_avg_1 REAL, - load_avg_5 REAL, - load_avg_15 REAL, - disk_used_gb REAL, - disk_total_gb REAL, - hdd_used_gb REAL, - hdd_total_gb REAL -); -CREATE INDEX IF NOT EXISTS idx_system_ts ON system_metrics(ts); - --- Per-container metrics (1 row per container per sample) -CREATE TABLE IF NOT EXISTS container_metrics ( - ts INTEGER NOT NULL, -- Unix timestamp - container_name TEXT NOT NULL, - cpu_percent REAL NOT NULL, - mem_usage_mb REAL NOT NULL, - mem_limit_mb REAL, - net_rx_bytes INTEGER, - net_tx_bytes INTEGER, - block_read_bytes INTEGER, - block_write_bytes INTEGER -); -CREATE INDEX IF NOT EXISTS idx_container_ts ON container_metrics(ts); -CREATE INDEX IF NOT EXISTS idx_container_name ON container_metrics(container_name, ts); -``` - -**Go struct**: - -```go -type MetricsStore struct { - db *sql.DB - logger *log.Logger -} - -func NewMetricsStore(dbPath string, logger *log.Logger) (*MetricsStore, error) -func (s *MetricsStore) Close() error -func (s *MetricsStore) InsertSystemMetrics(m SystemSample) error -func (s *MetricsStore) InsertContainerMetrics(samples []ContainerSample) error -func (s *MetricsStore) QuerySystemMetrics(from, to time.Time, resolution int) ([]SystemSample, error) -func (s *MetricsStore) QueryContainerMetrics(name string, from, to time.Time, resolution int) ([]ContainerSample, error) -func (s *MetricsStore) QueryContainerSummary() ([]ContainerCurrentStats, error) -func (s *MetricsStore) Prune(olderThan time.Duration) (int64, error) -``` - -**Resolution/downsampling**: The `resolution` 
parameter controls how many data points to return. For example, resolution=100 means "return ~100 points, averaging intermediate samples". Implementation: compute bucket size as `(to-from)/resolution`, group by `ts/bucketSeconds`, average the values. - -**Auto-prune**: Delete rows older than 30 days. Run daily via scheduler. - -**SQLite pragmas on open**: -```sql -PRAGMA journal_mode=WAL; -PRAGMA synchronous=NORMAL; -PRAGMA busy_timeout=5000; -``` - -WAL mode is critical — allows concurrent reads (page loads) while writer (collector) inserts. Without WAL, page loads would block during writes. - -**Dependencies**: Use `modernc.org/sqlite` (pure Go, no CGO — works in the existing Alpine-based Docker image without extra libs). Add to `go.mod`: -``` -go get modernc.org/sqlite -``` - -If `modernc.org/sqlite` is problematic (large binary size increase), alternative: use `github.com/mattn/go-sqlite3` but this requires CGO and the Dockerfile would need `gcc` + `musl-dev` in the build stage. Prefer `modernc.org/sqlite` for simplicity. 
- -### 3B: Metrics Collector (`internal/metrics/collector.go`) - -A single goroutine that samples both system and container metrics every 60 seconds: - -```go -type MetricsCollector struct { - store *MetricsStore - cpuCollector *system.CPUCollector - hddPath string - logger *log.Logger - cancel context.CancelFunc -} - -func NewMetricsCollector(store *MetricsStore, cpuCollector *system.CPUCollector, hddPath string, logger *log.Logger) *MetricsCollector -func (c *MetricsCollector) Start(ctx context.Context) -func (c *MetricsCollector) Stop() -``` - -**System sampling** (reuse existing functions): -```go -func (c *MetricsCollector) sampleSystem() SystemSample { - info := system.GetInfo(c.hddPath, c.cpuCollector) - return SystemSample{ - Timestamp: time.Now().Unix(), - CPUPercent: info.CPUPercent, - MemUsedMB: int(info.UsedMemMB), - MemTotalMB: int(info.TotalMemMB), - TempCelsius: info.TemperatureCelsius, - LoadAvg1: info.LoadAvg1, - LoadAvg5: info.LoadAvg5, - LoadAvg15: info.LoadAvg15, - DiskUsedGB: info.DiskUsedGB, - DiskTotalGB: info.DiskTotalGB, - HDDUsedGB: info.HDDUsedGB, - HDDTotalGB: info.HDDTotalGB, - } -} -``` - -**Container sampling** via `docker stats --no-stream`: - -```go -func (c *MetricsCollector) sampleContainers() []ContainerSample { - ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second) - defer cancel() - - cmd := exec.CommandContext(ctx, "docker", "stats", "--no-stream", - "--format", "{{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.MemPerc}}\t{{.NetIO}}\t{{.BlockIO}}") - out, err := cmd.Output() - // Parse each line... -} -``` - -Parse the output fields. `docker stats` returns values like: -- CPU%: "2.50%" → parse as float 2.50 -- MemUsage: "150.5MiB / 512MiB" → parse numerator as MB -- NetIO: "1.5MB / 2.3MB" → parse rx/tx bytes -- BlockIO: "50MB / 10MB" → parse read/write bytes - -**Filter out**: Skip infrastructure containers if desired (felhom-controller itself — avoid self-referential metrics noise). 
Actually, include all containers including infra — useful for debugging. - -**Collector loop**: -```go -func (c *MetricsCollector) loop(ctx context.Context) { - ticker := time.NewTicker(60 * time.Second) - defer ticker.Stop() - - // Sample immediately on start - c.sample() - - for { - select { - case <-ctx.Done(): - return - case <-ticker.C: - c.sample() - } - } -} - -func (c *MetricsCollector) sample() { - sys := c.sampleSystem() - if err := c.store.InsertSystemMetrics(sys); err != nil { - c.logger.Printf("[WARN] Failed to store system metrics: %v", err) - } - - containers := c.sampleContainers() - if err := c.store.InsertContainerMetrics(containers); err != nil { - c.logger.Printf("[WARN] Failed to store container metrics: %v", err) - } -} -``` - -### 3C: System Info Provider (`internal/metrics/sysinfo.go`) - -Static system information (doesn't change between samples): - -```go -type StaticSystemInfo struct { - Hostname string - OS string // e.g., "Debian GNU/Linux 12 (bookworm)" - Kernel string // e.g., "6.1.0-18-amd64" - Architecture string // e.g., "x86_64" - CPUModel string // e.g., "Intel N100" - CPUCores int // e.g., 4 - Uptime time.Duration - UptimeSince time.Time -} - -func GetStaticInfo() StaticSystemInfo -``` - -Read from: -- Hostname: `os.Hostname()` or `/etc/hostname` -- OS: `/etc/os-release` → `PRETTY_NAME` -- Kernel: `/proc/version` or `uname -r` (read `/proc/sys/kernel/osrelease`) -- Architecture: `runtime.GOARCH` (but for display, read `/proc/cpuinfo` or use `uname -m`) -- CPU model: `/proc/cpuinfo` → `model name` field -- CPU cores: `/proc/cpuinfo` → count `processor` lines, or `runtime.NumCPU()` -- Uptime: `/proc/uptime` → first field (seconds since boot) - -**IMPORTANT**: These are read from *inside the container*. `/proc/uptime`, `/proc/cpuinfo`, and `/proc/version` reflect the **host** (not the container) because they're from the host kernel. `/etc/os-release` inside the container shows the container's OS (Debian), not the host's. 
For displaying host OS: -- Mount `/etc/os-release` read-only from host: add to docker-compose.yml -- Or: Read from `/host/etc/os-release` with a `/etc:/host/etc:ro` mount - -**Decision**: Add `/etc/os-release:/host/etc/os-release:ro` to docker-compose.yml. Read host OS from `/host/etc/os-release`, fall back to container's `/etc/os-release`. - -### 3D: REST API Endpoints (`internal/api/router.go`) - -New endpoints: - -``` -GET /api/metrics/system?range=24h&resolution=200 -GET /api/metrics/system?from=2026-02-15T00:00:00Z&to=2026-02-16T00:00:00Z&resolution=200 -GET /api/metrics/containers/summary -GET /api/metrics/containers/{name}?range=7d&resolution=200 -GET /api/metrics/sysinfo -``` - -**Range presets**: `1h`, `6h`, `24h`, `7d`, `30d` — parsed in the handler and converted to `from`/`to` timestamps. - -**Response format** for system metrics: -```json -{ - "ok": true, - "data": { - "labels": [1708041600, 1708041660, ...], - "cpu": [5.2, 4.8, 6.1, ...], - "memory": [3200, 3250, 3180, ...], - "temp": [42, 41, 43, ...], - "load1": [0.3, 0.2, 0.4, ...] - } -} -``` - -Flat arrays are more efficient for Chart.js than arrays of objects. 
- -**Response format** for container summary: -```json -{ - "ok": true, - "data": [ - {"name": "immich-server", "cpu_percent": 2.5, "mem_usage_mb": 350, "mem_limit_mb": 2048}, - {"name": "paperless-webserver", "cpu_percent": 0.1, "mem_usage_mb": 280, "mem_limit_mb": 1024} - ] -} -``` - -### 3E: Monitoring Page Template (`internal/web/templates/monitoring.html`) - -Four sections: - -#### Section 1: Rendszer áttekintés (System Overview) - -Static system info card: - -``` -╔══════════════════════════════════════════════════════╗ -║ Rendszer áttekintés ║ -╠══════════════════════════════════════════════════════╣ -║ ║ -║ Gépnév: demo-felhom ║ -║ Operációs rendszer: Debian GNU/Linux 12 (bookworm) ║ -║ Kernel: 6.1.0-18-amd64 ║ -║ Processzor: Intel N100 (4 mag) ║ -║ Üzemidő: 15 nap, 3 óra ║ -║ Indítás: 2026-02-01 05:12 ║ -║ ║ -╚══════════════════════════════════════════════════════╝ -``` - -Hungarian labels: -- "Rendszer áttekintés" = System overview -- "Gépnév" = Hostname -- "Operációs rendszer" = Operating system -- "Kernel" = Kernel -- "Processzor" = Processor -- "mag" = cores -- "Üzemidő" = Uptime -- "nap" = days, "óra" = hours -- "Indítás" = Started at - -#### Section 2: Rendszer metrikák (System Metrics) — Charts - -Time range selector (pill buttons like filter bar): - -``` -[ 1 óra ] [ 6 óra ] [ 24 óra ] [ 7 nap ] [ 30 nap ] -``` - -Four charts in a 2×2 grid: - -``` -┌─────────────────────────┐ ┌─────────────────────────┐ -│ CPU használat (%) │ │ Memória használat (GB) │ -│ ▁▂▃▂▁▂▄▃▂▁▃▂▁▂▃ │ │ ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ │ -│ │ │ │ -└─────────────────────────┘ └─────────────────────────┘ -┌─────────────────────────┐ ┌─────────────────────────┐ -│ Hőmérséklet (°C) │ │ Terhelés (Load Average) │ -│ ▁▁▁▁▁▂▂▁▁▁▁▁▁▁▁ │ │ ▁▂▃▂▁▂▁▂▃▂▁▂▁▂▁ │ -│ │ │ │ -└─────────────────────────┘ └─────────────────────────┘ -``` - -Chart.js line charts with: -- Dark theme styling (match site theme — dark background, light grid, colored lines) -- Tooltips showing exact value + timestamp -- Y-axis: CPU 
0-100%, Memory in GB (auto-scale), Temp in °C, Load auto-scale -- X-axis: time labels, format varies by range (HH:MM for 1h-24h, MM-DD for 7d-30d) -- Fill under the line with low-opacity color -- Responsive (resize with container) - -#### Section 3: Alkalmazás erőforrások (Application Resources) — Current snapshot - -Bar chart showing per-container resource usage RIGHT NOW: - -``` -╔══════════════════════════════════════════════════════════════╗ -║ Alkalmazás erőforrások ║ -╠══════════════════════════════════════════════════════════════╣ -║ ║ -║ CPU használat Memória használat ║ -║ immich-server ████████░░ 8.2% ████████░ 350 MB ║ -║ paperless-web ██░░░░░░░░ 1.5% ██████░░░ 280 MB ║ -║ immich-ml █░░░░░░░░░ 0.8% ████░░░░░ 180 MB ║ -║ romm █░░░░░░░░░ 0.3% ██░░░░░░░ 90 MB ║ -║ filebrowser ░░░░░░░░░░ 0.1% █░░░░░░░░ 25 MB ║ -║ ║ -╚══════════════════════════════════════════════════════════════╝ -``` - -Two horizontal bar charts side by side (Chart.js horizontal bar). Container names on the Y-axis. Sorted by CPU usage descending. - -Each container name is a clickable link that opens the detail view (Section 4 below). - -#### Section 4: Per-container detail (expandable / click-to-show) - -When clicking a container name, show a historical chart panel below the bar chart (or in a modal/expanded section): - -``` -╔══════════════════════════════════════════════════════════════╗ -║ immich-server — Erőforrás előzmények ║ -║ [1 óra] [6 óra] [24 óra] [7 nap] ║ -║ ║ -║ ┌─────────────────────┐ ┌─────────────────────┐ ║ -║ │ CPU % (line chart) │ │ Memória MB (line) │ ║ -║ └─────────────────────┘ └─────────────────────┘ ║ -║ ║ -╚══════════════════════════════════════════════════════════════╝ -``` - -Implementation: When a container name is clicked, JS fetches `GET /api/metrics/containers/{name}?range=24h&resolution=150` and renders two Chart.js line charts in an expandable panel below the bar charts. 
- -#### Section 5: Tárhely (Storage overview) - -Disk usage bars for all mounted filesystems: - -``` -╔══════════════════════════════════════════════════════════════╗ -║ Tárhely ║ -╠══════════════════════════════════════════════════════════════╣ -║ ║ -║ SSD (/) ████░░░░░░░░░░░ 17.5 GB / 460 GB (4%) ║ -║ Külső HDD (/mnt) ████████░░░░░░░ 500 GB / 1000 GB (50%) ║ -║ ║ -╚══════════════════════════════════════════════════════════════╝ -``` - -Reuse the progress bar styling from the dashboard's system info card. - -### 3F: Chart.js Integration - -**CDN import**: Include Chart.js from CDN in the monitoring page template (not site-wide — only needed on this page): - +And move the `active` class to the `1h` button: ```html - + + + ``` -Wait — the controller runs on customer hardware that may not have internet access (Cloudflare Tunnel handles external requests, but internal pages are loaded locally). The Chart.js library must be **embedded** or served locally. - -**Options**: -1. Download `chart.umd.min.js` (~200KB) and embed it via `//go:embed` in the binary, serve at `/static/chart.min.js` -2. Include it in the Docker image at build time - -**Decision**: Download chart.umd.min.js, place it at `internal/web/static/chart.min.js`, embed it alongside the templates. Serve it at `/static/chart.min.js`. - -Actually, the existing static serving only handles `style.css` and `felhom-logo.svg` via hardcoded handlers. Either: -- Add another hardcoded handler for `/static/chart.min.js` -- Or switch to a proper embedded static FS - -For simplicity, add another handler: -```go -case path == "/static/chart.min.js": - s.serveChartJS(w, r) +Same for container detail range: +```javascript +let detailRange = '1h'; ``` -### 3G: Docker Compose changes +**B. Smart default**: After the system has been running for 24+ hours, `24h` makes more sense as a default. But for v0.5.1, just use `1h` — it's always reasonable. -Add host OS release mount: -```yaml -volumes: - # ... existing ... 
- # Host OS info — for monitoring page system info - - /etc/os-release:/host/etc/os-release:ro +**C. Add diagnostic logging**: To understand if 24h truly returns empty, add a temporary console.log in the JS: + +```javascript +async function loadSystemMetrics() { + try { + const resp = await fetch('/api/metrics/system?range=' + systemRange + '&resolution=200'); + const json = await resp.json(); + console.log('[metrics] system range=' + systemRange + ', data points=' + (json.data?.labels?.length || 0)); + // ... rest of handler ``` -### 3H: Config additions +This helps debug if the issue is no data returned vs. data not rendering. -No new config needed. The metrics collection interval (60s) and retention (30 days) can be hardcoded for v0.5.0. Make them configurable later if needed. +### Troubleshooting commands (run on demo node) + +Before implementing the fix, verify the data is in SQLite: + +```bash +# Sanity check: confirm the metrics DB exists and has content (prints its first readable strings; this does not count rows) +docker exec felhom-controller sh -c "cat /app/data/metrics.db" | strings | head -5 +# Or directly via the API from the browser: +# https://felhom.demo-felhom.eu/api/metrics/system?range=1h&resolution=200 +# https://felhom.demo-felhom.eu/api/metrics/system?range=24h&resolution=200 +``` + +Compare the JSON responses. If 24h returns labels but cpu/memory arrays are zeros, it's a rendering issue. If labels are empty, it's a query issue. --- -## Navigation +## Bug 4: Charts empty on initial page load -### Sidebar (layout.html) +### Problem -Add fourth nav item: +When navigating to the monitoring page, all four system charts show empty (no data) until the user clicks a range button. -```html - +### Root cause + +Same as Bug 3 — the initial `loadSystemMetrics()` call uses the `24h` default range, which returns no visible data for a new system. Fixing Bug 3 (changing default to `1h`) should also fix this. + +### Additional fix — race condition protection + +Ensure the init sequence is robust. 
Currently: +```javascript +initSystemCharts(); +initContainerCharts(); +initDetailCharts(); +loadSysInfo(); +loadSystemMetrics(); +loadContainerSummary(); ``` -### Web route (server.go ServeHTTP) +This looks correct — charts are initialized before data is loaded. No race condition here. -```go -case path == "/monitoring": - s.monitoringHandler(w, r) -``` +### Edge case: very first load (0 data points) -### Handler (handlers.go) - -```go -func (s *Server) monitoringHandler(w http.ResponseWriter, _ *http.Request) { - data := s.baseData("monitoring", "Rendszermonitor") - data["SysInfo"] = metrics.GetStaticInfo() - data["SystemInfo"] = system.GetInfo(s.cfg.Paths.HDDPath, s.cpuCollector) - s.render(w, "monitoring", data) -} -``` - -The page itself is mostly JS-driven — static info is server-rendered, charts fetch data via API calls after page load. +If the monitoring page is loaded before the collector has stored even 1 sample (within the first 60 seconds of controller start), the "Még nincsenek adatok" message should appear. Verify this works correctly. --- -## Scheduler Integration +## Implementation order -New jobs in `main.go`: +### Step 1: Fix hostname +1. Add `/etc/hostname:/host/etc/hostname:ro` to `controller/docker-compose.yml` +2. Update `sysinfo.go` — read from `/host/etc/hostname` first -```go -// Metrics collection — every 60s -sched.Every("metrics-collect", 60*time.Second, func(ctx context.Context) error { - metricsCollector.Sample() - return nil -}) +### Step 2: Fix tooltip timestamps +1. Change `items[0].parsed.x || items[0].label` to `items[0].label` in `monitoring.html` -// Metrics pruning — daily at 04:00 -sched.Daily("metrics-prune", "04:00", func(ctx context.Context) error { - deleted, err := metricsStore.Prune(30 * 24 * time.Hour) - if err != nil { - return err - } - logger.Printf("[INFO] Pruned %d old metric rows", deleted) - return nil -}) -``` +### Step 3: Fix default range + empty charts +1. 
Change `systemRange = '1h'` and `detailRange = '1h'` +2. Move `active` class to "1 óra" button in both range bars +3. Add console.log diagnostic for data loading -Note: The collector could run its own internal ticker, but using the scheduler is cleaner and consistent with the rest of the codebase. However, 60s is right at the scheduler's "quiet mode" boundary (30s). Make sure it doesn't spam logs — set quiet mode threshold to 60s or make the metrics job quiet explicitly. - -Actually, better approach: have the collector run its own internal ticker (like CPUCollector does), not via the scheduler. The scheduler is designed for heavier tasks. The collector is a lightweight background loop. Register prune-only via scheduler. - -```go -metricsCollector.Start(ctx) // starts internal 60s ticker -defer metricsCollector.Stop() - -sched.Daily("metrics-prune", "04:00", func(ctx context.Context) error { ... }) -``` +### Step 4: Build, deploy, verify +1. Build v0.5.1 +2. Deploy to demo node (sync docker-compose.yml for new volume mount) +3. Verify hostname shows "demo-felhom" +4. Verify tooltip shows correct timestamp +5. Verify charts show data on page load +6. Test all range buttons (1h → 6h → 24h → 7d → 30d) --- -## Implementation Order - -### Step 1: SQLite metrics store -1. Add `modernc.org/sqlite` to `go.mod` -2. Create `internal/metrics/store.go` — schema, CRUD, prune, query with downsampling -3. Write basic tests if convenient - -### Step 2: Static system info -1. Create `internal/metrics/sysinfo.go` — read hostname, OS, kernel, CPU, uptime -2. Handle Docker mount of `/etc/os-release` - -### Step 3: Metrics collector -1. Create `internal/metrics/collector.go` — system + container sampling -2. Parse `docker stats --no-stream` output -3. Start collector in `main.go`, register prune job - -### Step 4: REST API endpoints -1. Add `/api/metrics/*` routes to `router.go` -2. Implement handlers: system metrics, container summary, container history, sysinfo -3. 
Wire `MetricsStore` into API router - -### Step 5: Chart.js embedding -1. Download `chart.umd.min.js` (v4.4.x) -2. Place in `internal/web/static/` (or alongside templates) -3. Add serving handler in `server.go` - -### Step 6: Monitoring page template + CSS -1. Create `internal/web/templates/monitoring.html` — all 5 sections -2. Add JavaScript for Chart.js rendering, time range switching, container detail expand -3. Add CSS styles to `style.css` -4. Update `layout.html` — add sidebar nav item - -### Step 7: Backup page fixes (Tasks 1 + 2) -1. Fix "Helyi mentés" in `GetFullStatus()` — synthesize from snapshot history -2. Add caching: `RefreshCache()`, modify `GetFullStatus()`, register scheduler job -3. Add initial cache goroutine in `main.go` - -### Step 8: Docker Compose changes -1. Add `/etc/os-release:/host/etc/os-release:ro` mount -2. Update `controller.yaml.example` if needed - -### Step 9: Build, deploy, verify -1. Build v0.5.0 -2. Deploy to demo node (sync full docker-compose.yml) -3. Verify backup page loads instantly -4. Verify "Helyi mentés" shows green after restart -5. Verify monitoring page renders with system info -6. Wait 5 minutes for metrics to accumulate -7. Verify charts render with data -8. Verify container bar charts show current usage -9. Click a container → verify historical chart loads -10. Test time range switching (1h/6h/24h/7d/30d) - -### Step 10: Documentation -1. Update CONTEXT.md, README -2. Bump version - ---- - -## Files to create - -``` -internal/metrics/store.go — SQLite metrics store -internal/metrics/collector.go — System + container metrics collector -internal/metrics/sysinfo.go — Static system info (OS, kernel, CPU, uptime) -internal/metrics/types.go — Shared types (SystemSample, ContainerSample, etc.) 
-internal/web/templates/monitoring.html — Monitoring page template -internal/web/static/chart.min.js — Chart.js library (embedded) -``` - -Note: `chart.min.js` can be placed alongside templates if using a shared `//go:embed` directive, or in a separate `static/` embed. Check existing embed patterns in `embed.go`. - ## Files to modify ``` -internal/backup/backup.go — Fix Helyi mentés synthesis + add caching -internal/api/router.go — Add /api/metrics/* endpoints -internal/web/server.go — Add /monitoring route, /static/chart.min.js handler, accept metricsStore -internal/web/handlers.go — Add monitoringHandler() -internal/web/templates/layout.html — Add sidebar nav item -internal/web/templates/style.css — Monitoring page styles (charts, tables, bars) -internal/web/embed.go — Include chart.min.js in embedded FS (if needed) -cmd/controller/main.go — Wire MetricsStore + collector, backup cache job, initial cache goroutine -controller/docker-compose.yml — Add /etc/os-release mount -go.mod — Add modernc.org/sqlite dependency -Dockerfile — May need adjustments for SQLite (verify modernc.org/sqlite works with current build) +controller/docker-compose.yml — add /etc/hostname mount +internal/metrics/sysinfo.go — read hostname from /host/etc/hostname +internal/web/templates/monitoring.html — fix tooltip callback + default range ``` --- -## Data Types Reference +## Verification checklist -```go -// internal/metrics/types.go - -type SystemSample struct { - Timestamp int64 `json:"ts"` - CPUPercent float64 `json:"cpu"` - MemUsedMB int `json:"mem_used"` - MemTotalMB int `json:"mem_total"` - TempCelsius float64 `json:"temp"` - LoadAvg1 float64 `json:"load1"` - LoadAvg5 float64 `json:"load5"` - LoadAvg15 float64 `json:"load15"` - DiskUsedGB float64 `json:"disk_used"` - DiskTotalGB float64 `json:"disk_total"` - HDDUsedGB float64 `json:"hdd_used"` - HDDTotalGB float64 `json:"hdd_total"` -} - -type ContainerSample struct { - Timestamp int64 `json:"ts"` - ContainerName string `json:"name"` 
-    CPUPercent      float64 `json:"cpu"`
-    MemUsageMB      float64 `json:"mem_usage"`
-    MemLimitMB      float64 `json:"mem_limit"`
-    NetRxBytes      int64   `json:"net_rx"`
-    NetTxBytes      int64   `json:"net_tx"`
-    BlockReadBytes  int64   `json:"blk_read"`
-    BlockWriteBytes int64   `json:"blk_write"`
-}
-
-type ContainerCurrentStats struct {
-    ContainerName string  `json:"name"`
-    CPUPercent    float64 `json:"cpu_percent"`
-    MemUsageMB    float64 `json:"mem_usage_mb"`
-    MemLimitMB    float64 `json:"mem_limit_mb"`
-}
-
-type StaticSystemInfo struct {
-    Hostname      string    `json:"hostname"`
-    OS            string    `json:"os"`
-    Kernel        string    `json:"kernel"`
-    Architecture  string    `json:"architecture"`
-    CPUModel      string    `json:"cpu_model"`
-    CPUCores      int       `json:"cpu_cores"`
-    UptimeSeconds int64     `json:"uptime_seconds"`
-    BootTime      time.Time `json:"boot_time"`
-}
-```
-
----
-
-## Chart.js Dark Theme Configuration
-
-All charts should use a consistent dark theme:
-
-```javascript
-const chartDefaults = {
-    backgroundColor: 'rgba(0, 136, 204, 0.1)', // accent-blue with low opacity
-    borderColor: '#0088cc', // accent-blue
-    borderWidth: 2,
-    pointRadius: 0, // hide dots (too many points)
-    pointHitRadius: 10, // but clickable for tooltips
-    tension: 0.3, // smooth curves
-    fill: true
-};
-
-const chartOptions = {
-    responsive: true,
-    maintainAspectRatio: false,
-    plugins: {
-        legend: { display: false },
-        tooltip: {
-            backgroundColor: '#1c2128',
-            titleColor: '#e6edf3',
-            bodyColor: '#8b949e',
-            borderColor: '#30363d',
-            borderWidth: 1
-        }
-    },
-    scales: {
-        x: {
-            grid: { color: 'rgba(48,54,61,0.5)' },
-            ticks: { color: '#8b949e', maxTicksLimit: 8 }
-        },
-        y: {
-            grid: { color: 'rgba(48,54,61,0.5)' },
-            ticks: { color: '#8b949e' },
-            beginAtZero: true
-        }
-    }
-};
-```
-
-Different chart colors per metric:
-- CPU: `#0088cc` (blue)
-- Memory: `#238636` (green)
-- Temperature: `#d29922` (yellow)
-- Load: `#db6d28` (orange)
-- Container CPU bars: `#0088cc` (blue)
-- Container Memory bars: `#238636` (green)
-
----
-
-## Edge Cases
-
-- **No metrics yet**: First load after deploy — charts show "Még nincsenek adatok" (No data yet) message
-- **Container appears/disappears**: Containers may start/stop between samples. The summary shows only currently running containers. Historical charts handle gaps gracefully (Chart.js `spanGaps: true`)
-- **Pi 3B+ with 1G RAM**: SQLite + metrics should be lightweight. 60s interval = 1440 system rows/day + ~10k container rows/day (10 containers). With 30-day retention ≈ 350k rows max. SQLite handles this easily.
-- **Docker socket permissions**: `docker stats` already works because the socket is mounted. No new permissions needed.
-- **Timezone**: All timestamps stored as UTC Unix epoch. Chart.js tooltip formatter converts to Europe/Budapest for display.
-- **Chart.js file size**: ~200KB minified. Acceptable for a single page load. Only loaded on the monitoring page.
-
----
-
-## Verification Checklist
-
-### Backup fixes
-- [ ] "Helyi mentés" shows green ✓ after controller restart (not "–")
-- [ ] Backup page loads in <500ms (cached, no subprocess calls)
-- [ ] Cache refreshes after manual backup ("Mentés most")
-- [ ] Backup page still shows correct data after cache refresh
-
-### Monitoring page
-- [ ] Sidebar shows 4 nav items, "Rendszermonitor" is highlighted when active
-- [ ] System overview card shows hostname, OS, kernel, CPU model, cores, uptime
-- [ ] System metrics charts render (CPU, Memory, Temperature, Load)
-- [ ] Time range buttons work (1h/6h/24h/7d/30d) — charts update
-- [ ] Container resource bar charts show current per-container CPU + memory
-- [ ] Clicking a container name shows historical charts
-- [ ] Storage section shows SSD + HDD usage bars
-- [ ] Charts use dark theme matching site design
-- [ ] Page works on mobile (responsive charts)
-- [ ] Metrics accumulate over time (check after 10+ minutes)
-- [ ] SQLite DB created in data volume (survives restart)
-- [ ] Metrics prune job runs daily (check scheduler logs)
-- [ ] No regressions on dashboard, apps, backup pages
\ No newline at end of file
+- [ ] Hostname shows "demo-felhom" (not container ID)
+- [ ] Tooltip shows correct timestamp (e.g., "2026. 02. 16. 10:21")
+- [ ] Charts show data on initial page load (1h default)
+- [ ] "1 óra" button is active/highlighted by default
+- [ ] Clicking each range button updates charts
+- [ ] "24 óra" shows data if there are 1+ hours of collected metrics
+- [ ] Container bar charts still render correctly
+- [ ] Container detail panel still works
+- [ ] No console errors in browser devtools
\ No newline at end of file
diff --git a/controller/controller b/controller/controller
new file mode 100644
index 0000000..69475ba
Binary files /dev/null and b/controller/controller differ