# TASK.md — v0.5.0: Backup Bugfixes + Monitoring Page with Metrics Store

> Version bump: **v0.5.0**
> Scope: 2 backup fixes + new metrics subsystem + new monitoring page

---

## Overview

1. **Bugfix**: "Helyi mentés" ("Local backup") shows "–" after controller restart (in-memory `LastBackup` lost)
2. **Performance**: Backup page caching, because `GetFullStatus()` calls restic/docker on every page load (3-4s)
3. **Feature**: New **Rendszermonitor** (System Monitor) page with historical metrics stored in SQLite, rendered with Chart.js

---

## Task 1: Fix "Helyi mentés" status after restart

### Problem

After controller restart, `LastBackup` is nil (in-memory only). The template checks `{{if .Backup.LastBackup}}` and falls through to the "–" state. However, `snapshotHistory` is loaded on startup from `LoadSnapshotHistory()`, so we know backups exist.

### Fix

In `GetFullStatus()`, after copying snapshot history, synthesize a `LastBackup` if it's nil but snapshots exist:

```go
// After m.mu.Unlock() and snapshot reversal...

// Synthesize LastBackup from snapshot history if not in memory (e.g., after restart)
if status.LastBackup == nil && len(status.SnapshotHistory) > 0 {
    latest := status.SnapshotHistory[0] // already reversed, newest first
    status.LastBackup = &BackupStatus{
        LastRun: latest.Time,
        Success: latest.Success,
        Snapshot: &SnapshotResult{
            SnapshotID: latest.SnapshotID,
        },
    }
}
```

Do the same for `LastDBDump`, synthesizing it from the `DumpFiles` found on disk:

```go
if status.LastDBDump == nil && len(status.DumpFiles) > 0 {
    var results []DumpResult
    var latestTime time.Time
    for _, f := range status.DumpFiles {
        results = append(results, DumpResult{
            DB:       DiscoveredDB{StackName: f.StackName, DBType: f.DBType, ContainerName: f.StackName},
            FilePath: f.FileName,
            Size:     f.Size,
        })
        // Track the newest dump file; its mtime stands in for the last run time
        if f.ModTime.After(latestTime) {
            latestTime = f.ModTime
        }
    }
    status.LastDBDump = &DBDumpStatus{
        LastRun: latestTime,
        Results: results,
        Success: true, // assumes a dump file on disk implies a successful dump
    }
}
```

### Verification

Restart the controller (`docker compose restart`), then load the backup page. "Helyi mentés" should show a green checkmark with "aktív" ("active"), not "–".

---

## Task 2: Backup page caching

### Problem

`GetFullStatus()` runs `restic stats --json`, `restic snapshots --json`, `docker ps`, and `docker inspect` synchronously on **every page load** of `/backups`. This takes 3-4 seconds.

### Fix: Background cache with periodic refresh

> **Note**: This should already be implemented; verify only.

Add cached status to `Manager`:

```go
type Manager struct {
    // ... existing fields ...
    cachedStatus *FullBackupStatus
    cacheTime    time.Time
}
```

New method:

```go
// RefreshCache updates the cached backup status in the background.
// Called by scheduler every 5 minutes and after each backup run.
func (m *Manager) RefreshCache(nextDBDump, nextBackup time.Time) {
    // Execute all the expensive calls (restic stats, docker ps, etc.)
    // Same logic currently in GetFullStatus()...
    status := &FullBackupStatus{ ... }

    m.mu.Lock()
    m.cachedStatus = status
    m.cacheTime = time.Now()
    m.mu.Unlock()
}
```

Modified `GetFullStatus()`: it reads from the cache and updates only cheap dynamic fields. It returns a copy, so callers never read the shared cached struct after the lock is released:

```go
func (m *Manager) GetFullStatus(nextDBDump, nextBackup time.Time) *FullBackupStatus {
    m.mu.Lock()
    defer m.mu.Unlock()

    if m.cachedStatus != nil {
        // Work on a shallow copy so the caller cannot race with RefreshCache
        status := *m.cachedStatus
        // Update dynamic fields without subprocess calls
        status.Running = m.running
        status.NextDBDump = nextDBDump
        status.NextBackup = nextBackup
        status.LastDBDump = m.lastDBDump
        status.LastBackup = m.lastBackup
        status.CacheTime = m.cacheTime
        status.SnapshotHistory = make([]SnapshotRecord, len(m.snapshotHistory))
        copy(status.SnapshotHistory, m.snapshotHistory)
        // Apply snapshot reversal + LastBackup/LastDBDump synthesis (from Task 1)
        return &status
    }

    // No cache yet: return minimal status (before first refresh completes)
    return &FullBackupStatus{
        Enabled:        m.cfg.Backup.Enabled,
        Running:        m.running,
        LastDBDump:     m.lastDBDump,
        LastBackup:     m.lastBackup,
        DBDumpSchedule: m.cfg.Backup.DBDumpSchedule,
        ResticSchedule: m.cfg.Backup.ResticSchedule,
        PruneSchedule:  m.cfg.Backup.PruneSchedule,
        NextDBDump:     nextDBDump,
        NextBackup:     nextBackup,
        Retention:      m.cfg.Backup.Retention,
        RepoPath:       m.cfg.Backup.ResticRepo,
        BackupPaths:    []string{m.cfg.Paths.StacksDir, m.cfg.Paths.DBDumpDir, "/opt/docker/felhom-controller/controller.yaml"},
    }
}
```

### Scheduler + lifecycle integration

In `main.go`:

```go
// Register cache refresh job (every 5 min)
sched.Every("backup-cache", 5*time.Minute, func(ctx context.Context) error {
    nextDBDump := scheduler.NextDailyRun(cfg.Backup.DBDumpSchedule)
    nextBackup := scheduler.NextDailyRun(cfg.Backup.ResticSchedule)
    backupMgr.RefreshCache(nextDBDump, nextBackup)
    return nil
})

// Initial cache population (non-blocking)
go func() {
    backupMgr.RefreshCache(
        scheduler.NextDailyRun(cfg.Backup.DBDumpSchedule),
        scheduler.NextDailyRun(cfg.Backup.ResticSchedule),
    )
}()
```

At the end of `RunFullBackup()` and `RunBackup()`, call `RefreshCache()` so the page shows updated data immediately.

**Import note**: `RefreshCache` takes `nextDBDump, nextBackup time.Time` as params to avoid circular import with scheduler package.

### Add `CacheTime` to FullBackupStatus

```go
type FullBackupStatus struct {
    // ... existing ...
    CacheTime time.Time // when cache was last refreshed
}
```

Template can optionally show staleness at the bottom of the page in muted text.

---

## Task 3: Monitoring Page — Metrics Store + Charts

### Architecture

```
                        ┌──────────────────────────┐
                        │   felhom-controller      │
                        │                          │
/proc/stat ────────────►│   MetricsCollector       │
/proc/meminfo ─────────►│   (every 60s)            │
/sys/thermal ──────────►│         ↓                │
docker stats ──────────►│   SQLite DB              │
                        │   (/data/metrics.db)     │
                        │         ↓                │
                        │   REST API               │
                        │   /api/metrics/*         │
                        │         ↓                │
                        │   Chart.js in browser    │
                        └──────────────────────────┘
```

No new containers. Everything runs inside the existing controller.

### 3A: SQLite Metrics Store (`internal/metrics/store.go`)

**Database**: `/opt/docker/felhom-controller/data/metrics.db` (inside the controller-data volume, so it persists across restarts)

**Tables**:

```sql
-- System-wide metrics (1 row per sample)
CREATE TABLE IF NOT EXISTS system_metrics (
    ts            INTEGER NOT NULL,  -- Unix timestamp
    cpu_percent   REAL NOT NULL,
    mem_used_mb   INTEGER NOT NULL,
    mem_total_mb  INTEGER NOT NULL,
    temp_celsius  REAL,
    load_avg_1    REAL,
    load_avg_5    REAL,
    load_avg_15   REAL,
    disk_used_gb  REAL,
    disk_total_gb REAL,
    hdd_used_gb   REAL,
    hdd_total_gb  REAL
);
CREATE INDEX IF NOT EXISTS idx_system_ts ON system_metrics(ts);

-- Per-container metrics (1 row per container per sample)
CREATE TABLE IF NOT EXISTS container_metrics (
    ts                INTEGER NOT NULL,  -- Unix timestamp
    container_name    TEXT NOT NULL,
    cpu_percent       REAL NOT NULL,
    mem_usage_mb      REAL NOT NULL,
    mem_limit_mb      REAL,
    net_rx_bytes      INTEGER,
    net_tx_bytes      INTEGER,
    block_read_bytes  INTEGER,
    block_write_bytes INTEGER
);
CREATE INDEX IF NOT EXISTS idx_container_ts ON container_metrics(ts);
CREATE INDEX IF NOT EXISTS idx_container_name ON container_metrics(container_name, ts);
```

**Go struct**:

```go
type MetricsStore struct {
    db     *sql.DB
    logger *log.Logger
}

func NewMetricsStore(dbPath string, logger *log.Logger) (*MetricsStore, error)
func (s *MetricsStore) Close() error
func (s *MetricsStore) InsertSystemMetrics(m SystemSample) error
func (s *MetricsStore) InsertContainerMetrics(samples []ContainerSample) error
func (s *MetricsStore) QuerySystemMetrics(from, to time.Time, resolution int) ([]SystemSample, error)
func (s *MetricsStore) QueryContainerMetrics(name string, from, to time.Time, resolution int) ([]ContainerSample, error)
func (s *MetricsStore) QueryContainerSummary() ([]ContainerCurrentStats, error)
func (s *MetricsStore) Prune(olderThan time.Duration) (int64, error)
```

**Resolution/downsampling**: The `resolution` parameter controls how many data points to return. For example, `resolution=100` means "return about 100 points, averaging the intermediate samples". Implementation: compute the bucket size in seconds as `(to-from)/resolution` (at least 1), group rows by the integer quotient `ts/bucketSeconds`, and average the values within each group.

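The bucketing rule can be sketched in plain Go. This is a minimal in-memory illustration, not the store's implementation: the `point` type and the `bucketSeconds` and `downsample` helpers are hypothetical names introduced here, and the caller is assumed to pass the span as `int64(to.Sub(from).Seconds())`.

```go
package main

import (
    "fmt"
    "sort"
)

// point is a hypothetical (timestamp, value) pair used only in this sketch.
type point struct {
    TS    int64 // Unix seconds
    Value float64
}

// bucketSeconds returns the bucket width in seconds for a time span and a
// target number of points. It never returns less than 1, so the integer
// division in downsample is always safe.
func bucketSeconds(spanSeconds int64, resolution int) int64 {
    if resolution < 1 {
        resolution = 1
    }
    b := spanSeconds / int64(resolution)
    if b < 1 {
        b = 1
    }
    return b
}

// downsample averages all points whose ts/bucket quotient is equal. This is
// the in-memory equivalent of GROUP BY ts/bucket with AVG() in SQL.
func downsample(points []point, bucket int64) []point {
    sums := map[int64]float64{}
    counts := map[int64]int{}
    for _, p := range points {
        k := p.TS / bucket
        sums[k] += p.Value
        counts[k]++
    }
    keys := make([]int64, 0, len(sums))
    for k := range sums {
        keys = append(keys, k)
    }
    sort.Slice(keys, func(i, j int) bool { return keys[i] < keys[j] })

    out := make([]point, 0, len(keys))
    for _, k := range keys {
        out = append(out, point{TS: k * bucket, Value: sums[k] / float64(counts[k])})
    }
    return out
}

func main() {
    bucket := bucketSeconds(24*60*60, 100) // 86400s / 100 points = 864s
    samples := []point{{TS: 0, Value: 10}, {TS: 60, Value: 20}, {TS: 900, Value: 30}}
    for _, p := range downsample(samples, bucket) {
        fmt.Printf("ts=%d avg=%.1f\n", p.TS, p.Value)
    }
}
```

In the store itself the same grouping would more likely be pushed into SQLite, for example with a query of the form `SELECT (ts / ?) * ? AS bucket_ts, AVG(cpu_percent) FROM system_metrics WHERE ts BETWEEN ? AND ? GROUP BY ts / ? ORDER BY bucket_ts`. That query is an assumption about how the spec could be met, not something the spec prescribes.
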
**Auto-prune**: Delete rows older than 30 days. Run daily via scheduler.

**SQLite pragmas on open**:
```sql
PRAGMA journal_mode=WAL;
PRAGMA synchronous=NORMAL;
PRAGMA busy_timeout=5000;
```

WAL mode is critical: it allows concurrent reads (page loads) while the writer (collector) inserts. Without WAL, page loads could block during writes.

**Dependencies**: Use `modernc.org/sqlite` (pure Go, no CGO, so it works in the existing Alpine-based Docker image without extra libs). Add to `go.mod`:
```
go get modernc.org/sqlite
```

If `modernc.org/sqlite` turns out to be problematic (large binary size increase), the alternative is `github.com/mattn/go-sqlite3`, but it requires CGO and the Dockerfile would need `gcc` + `musl-dev` in the build stage. Prefer `modernc.org/sqlite` for simplicity.

### 3B: Metrics Collector (`internal/metrics/collector.go`)

A single goroutine that samples both system and container metrics every 60 seconds:

```go
type MetricsCollector struct {
    store        *MetricsStore
    cpuCollector *system.CPUCollector
    hddPath      string
    logger       *log.Logger
    cancel       context.CancelFunc
}

func NewMetricsCollector(store *MetricsStore, cpuCollector *system.CPUCollector, hddPath string, logger *log.Logger) *MetricsCollector
func (c *MetricsCollector) Start(ctx context.Context)
func (c *MetricsCollector) Stop()
```

**System sampling** (reuse existing functions):
```go
func (c *MetricsCollector) sampleSystem() SystemSample {
    info := system.GetInfo(c.hddPath, c.cpuCollector)
    return SystemSample{
        Timestamp:   time.Now().Unix(),
        CPUPercent:  info.CPUPercent,
        MemUsedMB:   int(info.UsedMemMB),
        MemTotalMB:  int(info.TotalMemMB),
        TempCelsius: info.TemperatureCelsius,
        LoadAvg1:    info.LoadAvg1,
        LoadAvg5:    info.LoadAvg5,
        LoadAvg15:   info.LoadAvg15,
        DiskUsedGB:  info.DiskUsedGB,
        DiskTotalGB: info.DiskTotalGB,
        HDDUsedGB:   info.HDDUsedGB,
        HDDTotalGB:  info.HDDTotalGB,
    }
}
```

**Container sampling** via `docker stats --no-stream`:

```go
func (c *MetricsCollector) sampleContainers() []ContainerSample {
    ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
    defer cancel()

    cmd := exec.CommandContext(ctx, "docker", "stats", "--no-stream",
        "--format", "{{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.MemPerc}}\t{{.NetIO}}\t{{.BlockIO}}")
    out, err := cmd.Output()
    if err != nil {
        c.logger.Printf("[WARN] docker stats failed: %v", err)
        return nil
    }
    // parseDockerStats is a hypothetical helper: it splits out into
    // tab-separated lines and parses each field as described below.
    return parseDockerStats(out)
}
```

Parse the output fields. `docker stats` returns values like:
- CPU%: "2.50%" → parse as float 2.50
- MemUsage: "150.5MiB / 512MiB" → parse the numerator as usage in MB (the denominator is the limit)
- NetIO: "1.5MB / 2.3MB" → parse rx/tx bytes
- BlockIO: "50MB / 10MB" → parse read/write bytes

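The field formats listed above can be parsed with a few small helpers. This is a sketch under assumptions: `parsePercent`, `parseSize`, and `parsePair` are hypothetical names, and the unit table covers the binary suffixes (`KiB`, `MiB`, `GiB`) that `docker stats` prints for memory and the decimal suffixes (`kB`, `MB`, `GB`) it prints for network and block I/O.

```go
package main

import (
    "fmt"
    "strconv"
    "strings"
)

// sizeUnits maps unit suffixes to byte multipliers. Longer suffixes come
// first so that "MiB" is not mistaken for plain "B".
var sizeUnits = []struct {
    suffix string
    mult   float64
}{
    {"KiB", 1 << 10}, {"MiB", 1 << 20}, {"GiB", 1 << 30}, {"TiB", 1 << 40},
    {"kB", 1e3}, {"KB", 1e3}, {"MB", 1e6}, {"GB", 1e9}, {"TB", 1e12},
    {"B", 1},
}

// parsePercent converts "2.50%" to 2.5.
func parsePercent(s string) (float64, error) {
    return strconv.ParseFloat(strings.TrimSuffix(strings.TrimSpace(s), "%"), 64)
}

// parseSize converts a value such as "150.5MiB" or "1.5MB" to bytes.
func parseSize(s string) (float64, error) {
    s = strings.TrimSpace(s)
    for _, u := range sizeUnits {
        if strings.HasSuffix(s, u.suffix) {
            n, err := strconv.ParseFloat(strings.TrimSpace(strings.TrimSuffix(s, u.suffix)), 64)
            if err != nil {
                return 0, err
            }
            return n * u.mult, nil
        }
    }
    return 0, fmt.Errorf("unknown size unit in %q", s)
}

// parsePair splits "a / b" and returns both sides in bytes.
func parsePair(s string) (first, second float64, err error) {
    parts := strings.SplitN(s, "/", 2)
    if len(parts) != 2 {
        return 0, 0, fmt.Errorf("expected \"a / b\", got %q", s)
    }
    if first, err = parseSize(parts[0]); err != nil {
        return 0, 0, err
    }
    if second, err = parseSize(parts[1]); err != nil {
        return 0, 0, err
    }
    return first, second, nil
}

func main() {
    cpu, _ := parsePercent("2.50%")
    used, limit, _ := parsePair("150.5MiB / 512MiB")
    rx, tx, _ := parsePair("1.5MB / 2.3MB")
    fmt.Printf("cpu=%.2f mem=%.1f/%.0f MiB rx=%.0f tx=%.0f bytes\n",
        cpu, used/(1<<20), limit/(1<<20), rx, tx)
}
```

Returning bytes everywhere and converting to MB only when filling `mem_usage_mb` and `mem_limit_mb` keeps the unit handling in one place. Rounding to `int64` for the byte columns is left to the caller.
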
**Filtering**: Do not filter. Skipping infrastructure containers (such as felhom-controller itself, to avoid self-referential metrics noise) was considered, but including all containers, infra included, is useful for debugging.

**Collector loop**:
```go
func (c *MetricsCollector) loop(ctx context.Context) {
    ticker := time.NewTicker(60 * time.Second)
    defer ticker.Stop()

    // Sample immediately on start
    c.sample()

    for {
        select {
        case <-ctx.Done():
            return
        case <-ticker.C:
            c.sample()
        }
    }
}

func (c *MetricsCollector) sample() {
    sys := c.sampleSystem()
    if err := c.store.InsertSystemMetrics(sys); err != nil {
        c.logger.Printf("[WARN] Failed to store system metrics: %v", err)
    }

    containers := c.sampleContainers()
    if err := c.store.InsertContainerMetrics(containers); err != nil {
        c.logger.Printf("[WARN] Failed to store container metrics: %v", err)
    }
}
```

### 3C: System Info Provider (`internal/metrics/sysinfo.go`)

Static system information (apart from uptime, it doesn't change between samples). The fields match `StaticSystemInfo` in the Data Types Reference:

```go
type StaticSystemInfo struct {
    Hostname      string
    OS            string    // e.g., "Debian GNU/Linux 12 (bookworm)"
    Kernel        string    // e.g., "6.1.0-18-amd64"
    Architecture  string    // e.g., "x86_64"
    CPUModel      string    // e.g., "Intel N100"
    CPUCores      int       // e.g., 4
    UptimeSeconds int64     // seconds since boot
    BootTime      time.Time // now minus uptime
}

func GetStaticInfo() StaticSystemInfo
```

Read from:
- Hostname: `os.Hostname()` or `/etc/hostname`
- OS: `/etc/os-release` → `PRETTY_NAME`
- Kernel: `/proc/version` or `uname -r` (read `/proc/sys/kernel/osrelease`)
- Architecture: `runtime.GOARCH` (but for display, read `/proc/cpuinfo` or use `uname -m`)
- CPU model: `/proc/cpuinfo` → `model name` field
- CPU cores: `/proc/cpuinfo` → count `processor` lines, or `runtime.NumCPU()`
- Uptime: `/proc/uptime` → first field (seconds since boot)

**IMPORTANT**: These are read from *inside the container*. `/proc/uptime`, `/proc/cpuinfo`, and `/proc/version` reflect the **host** (not the container) because they come from the host kernel. `/etc/os-release` inside the container shows the container image's OS (Alpine-based, per 3A), not the host's. Likewise, `os.Hostname()` and `/etc/hostname` return the container's hostname by default, so verify that the value shown is the intended one. For displaying the host OS:
- Mount `/etc/os-release` read-only from host: add to docker-compose.yml
- Or: Read from `/host/etc/os-release` with a `/etc:/host/etc:ro` mount

**Decision**: Add `/etc/os-release:/host/etc/os-release:ro` to docker-compose.yml. Read host OS from `/host/etc/os-release`, fall back to container's `/etc/os-release`.

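The host-first lookup with fallback, plus the `/proc/uptime` parsing, might look like the following. It is a minimal sketch: `prettyName`, `uptimeSeconds`, and `hostOS` are hypothetical helper names, and the two parsers take file content as strings so they can be exercised without touching the filesystem.

```go
package main

import (
    "bufio"
    "fmt"
    "os"
    "strconv"
    "strings"
)

// prettyName extracts PRETTY_NAME from os-release content, without quotes.
// It returns "" when the key is absent.
func prettyName(osRelease string) string {
    sc := bufio.NewScanner(strings.NewReader(osRelease))
    for sc.Scan() {
        line := strings.TrimSpace(sc.Text())
        if strings.HasPrefix(line, "PRETTY_NAME=") {
            return strings.Trim(strings.TrimPrefix(line, "PRETTY_NAME="), `"'`)
        }
    }
    return ""
}

// uptimeSeconds parses the first field of /proc/uptime (seconds since boot).
func uptimeSeconds(procUptime string) (int64, error) {
    fields := strings.Fields(procUptime)
    if len(fields) == 0 {
        return 0, fmt.Errorf("empty /proc/uptime content")
    }
    f, err := strconv.ParseFloat(fields[0], 64)
    if err != nil {
        return 0, err
    }
    return int64(f), nil
}

// hostOS prefers the bind-mounted host file and falls back to the
// container's own os-release.
func hostOS() string {
    for _, path := range []string{"/host/etc/os-release", "/etc/os-release"} {
        if data, err := os.ReadFile(path); err == nil {
            if name := prettyName(string(data)); name != "" {
                return name
            }
        }
    }
    return "unknown"
}

func main() {
    fmt.Println("OS:", hostOS())
    if data, err := os.ReadFile("/proc/uptime"); err == nil {
        if secs, err := uptimeSeconds(string(data)); err == nil {
            fmt.Println("Uptime seconds:", secs)
        }
    }
}
```

The boot time then follows as the current time minus the uptime, which is how the `BootTime` field in the Data Types Reference could be filled.
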
### 3D: REST API Endpoints (`internal/api/router.go`)

New endpoints:

```
GET /api/metrics/system?range=24h&resolution=200
GET /api/metrics/system?from=2026-02-15T00:00:00Z&to=2026-02-16T00:00:00Z&resolution=200
GET /api/metrics/containers/summary
GET /api/metrics/containers/{name}?range=7d&resolution=200
GET /api/metrics/sysinfo
```

**Range presets**: `1h`, `6h`, `24h`, `7d`, `30d`. They are parsed in the handler and converted to `from`/`to` timestamps.

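Because `time.ParseDuration` has no day unit, the presets are easiest to handle with a lookup table. The sketch below assumes hypothetical helpers `parseRange` and `window`; the real handler may be structured differently.

```go
package main

import (
    "fmt"
    "time"
)

// rangePresets lists the accepted values of the range query parameter.
var rangePresets = map[string]time.Duration{
    "1h":  time.Hour,
    "6h":  6 * time.Hour,
    "24h": 24 * time.Hour,
    "7d":  7 * 24 * time.Hour,
    "30d": 30 * 24 * time.Hour,
}

// parseRange converts a preset such as "7d" into a duration and rejects
// anything that is not one of the documented presets.
func parseRange(s string) (time.Duration, error) {
    d, ok := rangePresets[s]
    if !ok {
        return 0, fmt.Errorf("unsupported range %q", s)
    }
    return d, nil
}

// window returns the from/to pair for a preset, ending at now.
func window(preset string, now time.Time) (from, to time.Time, err error) {
    d, err := parseRange(preset)
    if err != nil {
        return time.Time{}, time.Time{}, err
    }
    return now.Add(-d), now, nil
}

func main() {
    from, to, err := window("7d", time.Now().UTC())
    if err != nil {
        fmt.Println("error:", err)
        return
    }
    fmt.Println(from.Format(time.RFC3339), "to", to.Format(time.RFC3339))
}
```

Rejecting unknown values keeps a typo in the query string from silently becoming an unbounded query.
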
**Response format** for system metrics:
```json
{
  "ok": true,
  "data": {
    "labels": [1708041600, 1708041660, ...],
    "cpu": [5.2, 4.8, 6.1, ...],
    "memory": [3200, 3250, 3180, ...],
    "temp": [42, 41, 43, ...],
    "load1": [0.3, 0.2, 0.4, ...]
  }
}
```

Flat parallel arrays map directly onto Chart.js `labels` and dataset `data` arrays, and they produce a smaller payload than arrays of objects.

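Turning query rows into that column-oriented payload is a simple pivot. The sketch below uses a trimmed, hypothetical `sample` type in place of `SystemSample` and a hypothetical `series` type for the response body; the JSON keys follow the example above.

```go
package main

import (
    "encoding/json"
    "fmt"
)

// sample is a trimmed, hypothetical stand-in for SystemSample.
type sample struct {
    TS        int64
    CPU       float64
    MemUsedMB int
    Temp      float64
    Load1     float64
}

// series is the flat, column-oriented payload.
type series struct {
    Labels []int64   `json:"labels"`
    CPU    []float64 `json:"cpu"`
    Memory []int     `json:"memory"`
    Temp   []float64 `json:"temp"`
    Load1  []float64 `json:"load1"`
}

// toSeries pivots row-oriented samples into parallel arrays. Slices are
// allocated even for empty input so they encode as [] rather than null.
func toSeries(rows []sample) series {
    s := series{
        Labels: make([]int64, 0, len(rows)),
        CPU:    make([]float64, 0, len(rows)),
        Memory: make([]int, 0, len(rows)),
        Temp:   make([]float64, 0, len(rows)),
        Load1:  make([]float64, 0, len(rows)),
    }
    for _, r := range rows {
        s.Labels = append(s.Labels, r.TS)
        s.CPU = append(s.CPU, r.CPU)
        s.Memory = append(s.Memory, r.MemUsedMB)
        s.Temp = append(s.Temp, r.Temp)
        s.Load1 = append(s.Load1, r.Load1)
    }
    return s
}

func main() {
    rows := []sample{
        {TS: 1708041600, CPU: 5.2, MemUsedMB: 3200, Temp: 42, Load1: 0.3},
        {TS: 1708041660, CPU: 4.8, MemUsedMB: 3250, Temp: 41, Load1: 0.2},
    }
    body, err := json.Marshal(map[string]interface{}{"ok": true, "data": toSeries(rows)})
    if err != nil {
        fmt.Println("error:", err)
        return
    }
    fmt.Println(string(body))
}
```

Emitting empty arrays instead of `null` lets the page's "no data yet" check be a plain length test.
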
**Response format** for container summary:
```json
{
  "ok": true,
  "data": [
    {"name": "immich-server", "cpu_percent": 2.5, "mem_usage_mb": 350, "mem_limit_mb": 2048},
    {"name": "paperless-webserver", "cpu_percent": 0.1, "mem_usage_mb": 280, "mem_limit_mb": 1024}
  ]
}
```

### 3E: Monitoring Page Template (`internal/web/templates/monitoring.html`)

Five sections:

#### Section 1: Rendszer áttekintés (System Overview)

Static system info card:

```
╔══════════════════════════════════════════════════════╗
║  Rendszer áttekintés                                 ║
╠══════════════════════════════════════════════════════╣
║                                                      ║
║  Gépnév:              demo-felhom                    ║
║  Operációs rendszer:  Debian GNU/Linux 12 (bookworm) ║
║  Kernel:              6.1.0-18-amd64                 ║
║  Processzor:          Intel N100 (4 mag)             ║
║  Üzemidő:             15 nap, 3 óra                  ║
║  Indítás:             2026-02-01 05:12               ║
║                                                      ║
╚══════════════════════════════════════════════════════╝
```

Hungarian labels:
- "Rendszer áttekintés" = System overview
- "Gépnév" = Hostname
- "Operációs rendszer" = Operating system
- "Kernel" = Kernel
- "Processzor" = Processor
- "mag" = cores
- "Üzemidő" = Uptime
- "nap" = days, "óra" = hours
- "Indítás" = Started at

#### Section 2: Rendszer metrikák (System Metrics) — Charts

Time range selector (pill buttons like filter bar):

```
[ 1 óra ] [ 6 óra ] [ 24 óra ] [ 7 nap ] [ 30 nap ]
```

Four charts in a 2×2 grid:

```
┌─────────────────────────┐  ┌─────────────────────────┐
│ CPU használat (%)       │  │ Memória használat (GB)  │
│ ▁▂▃▂▁▂▄▃▂▁▃▂▁▂▃         │  │ ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁       │
│                         │  │                         │
└─────────────────────────┘  └─────────────────────────┘
┌─────────────────────────┐  ┌─────────────────────────┐
│ Hőmérséklet (°C)        │  │ Terhelés (Load Average) │
│ ▁▁▁▁▁▂▂▁▁▁▁▁▁▁▁         │  │ ▁▂▃▂▁▂▁▂▃▂▁▂▁▂▁         │
│                         │  │                         │
└─────────────────────────┘  └─────────────────────────┘
```

Chart titles: "CPU használat" = CPU usage, "Memória használat" = Memory usage, "Hőmérséklet" = Temperature, "Terhelés" = Load.

Chart.js line charts with:
- Dark theme styling (match site theme: dark background, light grid, colored lines)
- Tooltips showing exact value + timestamp
- Y-axis: CPU 0-100%, Memory in GB (auto-scale), Temp in °C, Load auto-scale
- X-axis: time labels, format varies by range (HH:MM for 1h-24h, MM-DD for 7d-30d)
- Fill under the line with low-opacity color
- Responsive (resize with container)

#### Section 3: Alkalmazás erőforrások (Application Resources) — Current snapshot

Bar chart showing per-container resource usage RIGHT NOW:

```
╔══════════════════════════════════════════════════════════════╗
║  Alkalmazás erőforrások                                      ║
╠══════════════════════════════════════════════════════════════╣
║                                                              ║
║                   CPU használat         Memória használat    ║
║  immich-server    ████████░░ 8.2%       ████████░ 350 MB     ║
║  paperless-web    ██░░░░░░░░ 1.5%       ██████░░░ 280 MB     ║
║  immich-ml        █░░░░░░░░░ 0.8%       ████░░░░░ 180 MB     ║
║  romm             █░░░░░░░░░ 0.3%       ██░░░░░░░  90 MB     ║
║  filebrowser      ░░░░░░░░░░ 0.1%       █░░░░░░░░  25 MB     ║
║                                                              ║
╚══════════════════════════════════════════════════════════════╝
```

Two horizontal bar charts side by side (Chart.js horizontal bar). Container names on the Y-axis. Sorted by CPU usage descending.

Each container name is a clickable link that opens the detail view (Section 4 below).

#### Section 4: Per-container detail (expandable / click-to-show)

When clicking a container name, show a historical chart panel below the bar chart (or in a modal/expanded section). The panel title "Erőforrás előzmények" means "Resource history":

```
╔══════════════════════════════════════════════════════════════╗
║  immich-server - Erőforrás előzmények                        ║
║  [1 óra] [6 óra] [24 óra] [7 nap]                            ║
║                                                              ║
║  ┌─────────────────────┐  ┌─────────────────────┐            ║
║  │ CPU % (line chart)  │  │ Memória MB (line)   │            ║
║  └─────────────────────┘  └─────────────────────┘            ║
║                                                              ║
╚══════════════════════════════════════════════════════════════╝
```

Implementation: When a container name is clicked, JS fetches `GET /api/metrics/containers/{name}?range=24h&resolution=150` and renders two Chart.js line charts in an expandable panel below the bar charts.

#### Section 5: Tárhely (Storage overview)

Disk usage bars for all mounted filesystems ("Külső HDD" = external HDD):

```
╔══════════════════════════════════════════════════════════════╗
║  Tárhely                                                     ║
╠══════════════════════════════════════════════════════════════╣
║                                                              ║
║  SSD (/)           ████░░░░░░░░░░░ 17.5 GB / 460 GB (4%)     ║
║  Külső HDD (/mnt)  ████████░░░░░░░ 500 GB / 1000 GB (50%)    ║
║                                                              ║
╚══════════════════════════════════════════════════════════════╝
```

Reuse the progress bar styling from the dashboard's system info card.

### 3F: Chart.js Integration

**CDN import (rejected)**: The simplest option would be to include Chart.js from a CDN in the monitoring page template (not site-wide, since it is only needed on this page):

```html
<script src="https://cdn.jsdelivr.net/npm/chart.js@4.4.7/dist/chart.umd.min.js"></script>
```

However, the controller runs on customer hardware that may not have internet access (Cloudflare Tunnel handles external requests, but internal pages are loaded locally). The Chart.js library must therefore be **embedded** or served locally.

**Options**:
1. Download `chart.umd.min.js` (~200KB) and embed it via `//go:embed` in the binary, serve at `/static/chart.min.js`
2. Include it in the Docker image at build time

**Decision**: Download `chart.umd.min.js`, place it at `internal/web/static/chart.min.js`, and embed it alongside the templates. Serve it at `/static/chart.min.js`.

The existing static serving only handles `style.css` and `felhom-logo.svg` via hardcoded handlers. Either:
- Add another hardcoded handler for `/static/chart.min.js`
- Or switch to a proper embedded static FS

For simplicity, add another handler:
```go
case path == "/static/chart.min.js":
    s.serveChartJS(w, r)
```

### 3G: Docker Compose changes

Add host OS release mount:
```yaml
volumes:
  # ... existing ...
  # Host OS info for the monitoring page system info
  - /etc/os-release:/host/etc/os-release:ro
```

### 3H: Config additions

No new config needed. The metrics collection interval (60s) and retention (30 days) can be hardcoded for v0.5.0. Make them configurable later if needed.

---

## Navigation

### Sidebar (layout.html)

Add fourth nav item:

```html
<ul class="nav-links">
    <li><a href="/" class="...">Vezérlőpult</a></li>
    <li><a href="/stacks" class="...">Alkalmazások</a></li>
    <li><a href="/backups" class="...">Biztonsági mentés</a></li>
    <li><a href="/monitoring" class="...">Rendszermonitor</a></li>
</ul>
```

### Web route (server.go ServeHTTP)

```go
case path == "/monitoring":
    s.monitoringHandler(w, r)
```

### Handler (handlers.go)

```go
func (s *Server) monitoringHandler(w http.ResponseWriter, _ *http.Request) {
    data := s.baseData("monitoring", "Rendszermonitor")
    data["SysInfo"] = metrics.GetStaticInfo()
    data["SystemInfo"] = system.GetInfo(s.cfg.Paths.HDDPath, s.cpuCollector)
    s.render(w, "monitoring", data)
}
```

The page itself is mostly JS-driven: static info is server-rendered, and charts fetch data via API calls after page load.

---

## Scheduler Integration

One option is to register both collection and pruning as scheduler jobs in `main.go`:

```go
// Metrics collection, every 60s
sched.Every("metrics-collect", 60*time.Second, func(ctx context.Context) error {
    metricsCollector.Sample()
    return nil
})

// Metrics pruning, daily at 04:00
sched.Daily("metrics-prune", "04:00", func(ctx context.Context) error {
    deleted, err := metricsStore.Prune(30 * 24 * time.Hour)
    if err != nil {
        return err
    }
    logger.Printf("[INFO] Pruned %d old metric rows", deleted)
    return nil
})
```

Using the scheduler for collection would be consistent with the rest of the codebase. However, a 60s interval is above the scheduler's "quiet mode" threshold (30s), so the job would log on every run unless the threshold is raised to 60s or the metrics job is made quiet explicitly.

**Decision**: Have the collector run its own internal ticker (like `CPUCollector` does) instead of going through the scheduler. The scheduler is designed for heavier tasks, while the collector is a lightweight background loop. Register only the prune job via the scheduler:

```go
metricsCollector.Start(ctx) // starts internal 60s ticker
defer metricsCollector.Stop()

sched.Daily("metrics-prune", "04:00", func(ctx context.Context) error { ... })
```

---

## Implementation Order

### Step 1: SQLite metrics store
1. Add `modernc.org/sqlite` to `go.mod`
2. Create `internal/metrics/store.go`: schema, CRUD, prune, query with downsampling
3. Write basic tests if convenient

### Step 2: Static system info
1. Create `internal/metrics/sysinfo.go`: read hostname, OS, kernel, CPU, uptime
2. Handle Docker mount of `/etc/os-release`

### Step 3: Metrics collector
1. Create `internal/metrics/collector.go`: system + container sampling
2. Parse `docker stats --no-stream` output
3. Start collector in `main.go`, register prune job

### Step 4: REST API endpoints
1. Add `/api/metrics/*` routes to `router.go`
2. Implement handlers: system metrics, container summary, container history, sysinfo
3. Wire `MetricsStore` into API router

### Step 5: Chart.js embedding
1. Download `chart.umd.min.js` (v4.4.x)
2. Place in `internal/web/static/` (or alongside templates)
3. Add serving handler in `server.go`

### Step 6: Monitoring page template + CSS
1. Create `internal/web/templates/monitoring.html` with all 5 sections
2. Add JavaScript for Chart.js rendering, time range switching, container detail expand
3. Add CSS styles to `style.css`
4. Update `layout.html` to add the sidebar nav item

### Step 7: Backup page fixes (Tasks 1 + 2)
1. Fix "Helyi mentés" in `GetFullStatus()` by synthesizing from snapshot history
2. Add caching: `RefreshCache()`, modify `GetFullStatus()`, register scheduler job
3. Add initial cache goroutine in `main.go`

### Step 8: Docker Compose changes
1. Add `/etc/os-release:/host/etc/os-release:ro` mount
2. Update `controller.yaml.example` if needed

### Step 9: Build, deploy, verify
1. Build v0.5.0
2. Deploy to demo node (sync full docker-compose.yml)
3. Verify backup page loads instantly
4. Verify "Helyi mentés" shows green after restart
5. Verify monitoring page renders with system info
6. Wait 5 minutes for metrics to accumulate
7. Verify charts render with data
8. Verify container bar charts show current usage
9. Click a container → verify historical chart loads
10. Test time range switching (1h/6h/24h/7d/30d)

### Step 10: Documentation
1. Update CONTEXT.md, README
2. Bump version

---

## Files to create

```
internal/metrics/store.go               - SQLite metrics store
internal/metrics/collector.go           - System + container metrics collector
internal/metrics/sysinfo.go             - Static system info (OS, kernel, CPU, uptime)
internal/metrics/types.go               - Shared types (SystemSample, ContainerSample, etc.)
internal/web/templates/monitoring.html  - Monitoring page template
internal/web/static/chart.min.js        - Chart.js library (embedded)
```

Note: `chart.min.js` can be placed alongside templates if using a shared `//go:embed` directive, or in a separate `static/` embed. Check existing embed patterns in `embed.go`.

## Files to modify

```
internal/backup/backup.go           - Fix Helyi mentés synthesis + add caching
internal/api/router.go              - Add /api/metrics/* endpoints
internal/web/server.go              - Add /monitoring route, /static/chart.min.js handler, accept metricsStore
internal/web/handlers.go            - Add monitoringHandler()
internal/web/templates/layout.html  - Add sidebar nav item
internal/web/templates/style.css    - Monitoring page styles (charts, tables, bars)
internal/web/embed.go               - Include chart.min.js in embedded FS (if needed)
cmd/controller/main.go              - Wire MetricsStore + collector, backup cache job, initial cache goroutine
controller/docker-compose.yml       - Add /etc/os-release mount
go.mod                              - Add modernc.org/sqlite dependency
Dockerfile                          - May need adjustments for SQLite (verify modernc.org/sqlite works with current build)
```

---

## Data Types Reference

```go
// internal/metrics/types.go

type SystemSample struct {
    Timestamp   int64   `json:"ts"`
    CPUPercent  float64 `json:"cpu"`
    MemUsedMB   int     `json:"mem_used"`
    MemTotalMB  int     `json:"mem_total"`
    TempCelsius float64 `json:"temp"`
    LoadAvg1    float64 `json:"load1"`
    LoadAvg5    float64 `json:"load5"`
    LoadAvg15   float64 `json:"load15"`
    DiskUsedGB  float64 `json:"disk_used"`
    DiskTotalGB float64 `json:"disk_total"`
    HDDUsedGB   float64 `json:"hdd_used"`
    HDDTotalGB  float64 `json:"hdd_total"`
}

type ContainerSample struct {
    Timestamp       int64   `json:"ts"`
    ContainerName   string  `json:"name"`
    CPUPercent      float64 `json:"cpu"`
    MemUsageMB      float64 `json:"mem_usage"`
    MemLimitMB      float64 `json:"mem_limit"`
    NetRxBytes      int64   `json:"net_rx"`
    NetTxBytes      int64   `json:"net_tx"`
    BlockReadBytes  int64   `json:"blk_read"`
    BlockWriteBytes int64   `json:"blk_write"`
}

type ContainerCurrentStats struct {
    ContainerName string  `json:"name"`
    CPUPercent    float64 `json:"cpu_percent"`
    MemUsageMB    float64 `json:"mem_usage_mb"`
    MemLimitMB    float64 `json:"mem_limit_mb"`
}

type StaticSystemInfo struct {
    Hostname      string    `json:"hostname"`
    OS            string    `json:"os"`
    Kernel        string    `json:"kernel"`
    Architecture  string    `json:"architecture"`
    CPUModel      string    `json:"cpu_model"`
    CPUCores      int       `json:"cpu_cores"`
    UptimeSeconds int64     `json:"uptime_seconds"`
    BootTime      time.Time `json:"boot_time"`
}
```

---

## Chart.js Dark Theme Configuration

All charts should use a consistent dark theme:

```javascript
const chartDefaults = {
    backgroundColor: 'rgba(0, 136, 204, 0.1)',  // accent-blue with low opacity
    borderColor: '#0088cc',                     // accent-blue
    borderWidth: 2,
    pointRadius: 0,                             // hide dots (too many points)
    pointHitRadius: 10,                         // but clickable for tooltips
    tension: 0.3,                               // smooth curves
    fill: true
};

const chartOptions = {
    responsive: true,
    maintainAspectRatio: false,
    plugins: {
        legend: { display: false },
        tooltip: {
            backgroundColor: '#1c2128',
            titleColor: '#e6edf3',
            bodyColor: '#8b949e',
            borderColor: '#30363d',
            borderWidth: 1
        }
    },
    scales: {
        x: {
            grid: { color: 'rgba(48,54,61,0.5)' },
            ticks: { color: '#8b949e', maxTicksLimit: 8 }
        },
        y: {
            grid: { color: 'rgba(48,54,61,0.5)' },
            ticks: { color: '#8b949e' },
            beginAtZero: true
        }
    }
};
```

Different chart colors per metric:
- CPU: `#0088cc` (blue)
- Memory: `#238636` (green)
- Temperature: `#d29922` (yellow)
- Load: `#db6d28` (orange)
- Container CPU bars: `#0088cc` (blue)
- Container Memory bars: `#238636` (green)

---

## Edge Cases

- **No metrics yet**: On first load after deploy, charts show a "Még nincsenek adatok" ("No data yet") message
- **Container appears/disappears**: Containers may start/stop between samples. The summary shows only currently running containers. Historical charts handle gaps gracefully (Chart.js `spanGaps: true`)
- **Pi 3B+ with 1 GB RAM**: SQLite + metrics should be lightweight. A 60s interval yields 1,440 system rows/day plus 14,400 container rows/day (10 containers). With 30-day retention that is about 475k rows at most (43,200 system + 432,000 container). SQLite handles this easily.
- **Docker socket permissions**: `docker stats` already works because the socket is mounted. No new permissions needed.
- **Timezone**: All timestamps stored as UTC Unix epoch. Chart.js tooltip formatter converts to Europe/Budapest for display.
- **Chart.js file size**: ~200KB minified. Acceptable for a single page load. Only loaded on the monitoring page.

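The row-count estimate for the Pi case is plain arithmetic and can be checked with a few lines of Go. `retainedRows` is a hypothetical helper written only for this check; the inputs are the 60s interval, 30-day retention, and the assumed 10 containers.

```go
package main

import "fmt"

// retainedRows estimates how many rows each table holds once the retention
// window is full: one system row per sample, one row per container per sample.
func retainedRows(intervalSec, retentionDays, containers int) (system, container int) {
    perDay := 24 * 60 * 60 / intervalSec
    system = perDay * retentionDays
    container = system * containers
    return system, container
}

func main() {
    sys, ctr := retainedRows(60, 30, 10)
    // 1440 samples/day * 30 days = 43200 system rows, times 10 containers = 432000
    fmt.Printf("system=%d container=%d total=%d\n", sys, ctr, sys+ctr)
}
```

The total scales linearly with the number of containers, so a node running 20 containers would hold roughly twice as many container rows.
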
---

## Verification Checklist

### Backup fixes
- [ ] "Helyi mentés" shows green ✓ after controller restart (not "–")
- [ ] Backup page loads in <500ms (cached, no subprocess calls)
- [ ] Cache refreshes after manual backup ("Mentés most", i.e. "Back up now")
- [ ] Backup page still shows correct data after cache refresh

### Monitoring page
- [ ] Sidebar shows 4 nav items, "Rendszermonitor" is highlighted when active
- [ ] System overview card shows hostname, OS, kernel, CPU model, cores, uptime
- [ ] System metrics charts render (CPU, Memory, Temperature, Load)
- [ ] Time range buttons work (1h/6h/24h/7d/30d) and charts update
- [ ] Container resource bar charts show current per-container CPU + memory
- [ ] Clicking a container name shows historical charts
- [ ] Storage section shows SSD + HDD usage bars
- [ ] Charts use dark theme matching site design
- [ ] Page works on mobile (responsive charts)
- [ ] Metrics accumulate over time (check after 10+ minutes)
- [ ] SQLite DB created in data volume (survives restart)
- [ ] Metrics prune job runs daily (check scheduler logs)
- [ ] No regressions on dashboard, apps, backup pages