TASK.md — v0.5.0: Backup Bugfixes + Monitoring Page with Metrics Store
Version bump: v0.5.0
Scope: 2 backup fixes + new metrics subsystem + new monitoring page
Overview
- Bugfix: "Helyi mentés" (Local backup) shows "–" after controller restart (in-memory LastBackup lost)
- Performance: Backup page caching: GetFullStatus() calls restic/docker on every page load (3-4s)
- Feature: New Rendszermonitor (System Monitor) page with historical metrics stored in SQLite, rendered with Chart.js
Task 1: Fix "Helyi mentés" status after restart
Problem
After controller restart, LastBackup is nil (in-memory only). The template checks {{if .Backup.LastBackup}} and falls through to the "–" state. However, snapshotHistory IS loaded on startup from LoadSnapshotHistory() — so we know backups exist.
Fix
In GetFullStatus(), after copying snapshot history, synthesize a LastBackup if it's nil but snapshots exist:
// After m.mu.Unlock() and snapshot reversal...
// Synthesize LastBackup from snapshot history if not in memory (e.g., after restart)
if status.LastBackup == nil && len(status.SnapshotHistory) > 0 {
latest := status.SnapshotHistory[0] // already reversed, newest first
status.LastBackup = &BackupStatus{
LastRun: latest.Time,
Success: latest.Success,
Snapshot: &SnapshotResult{
SnapshotID: latest.SnapshotID,
},
}
}
Also do the same for LastDBDump — synthesize from DumpFiles on disk:
if status.LastDBDump == nil && len(status.DumpFiles) > 0 {
var results []DumpResult
var latestTime time.Time
for _, f := range status.DumpFiles {
results = append(results, DumpResult{
DB: DiscoveredDB{StackName: f.StackName, DBType: f.DBType, ContainerName: f.StackName},
FilePath: f.FileName,
Size: f.Size,
})
if f.ModTime.After(latestTime) {
latestTime = f.ModTime
}
}
status.LastDBDump = &DBDumpStatus{
LastRun: latestTime,
Results: results,
Success: true,
}
}
Verification
Restart the controller (docker compose restart), then load the backup page. "Helyi mentés" should show green checkmark "aktív" (not "–").
Task 2: Backup page caching
Problem
GetFullStatus() runs restic stats --json, restic snapshots --json, docker ps, and docker inspect synchronously on every page load of /backups. Takes 3-4 seconds.
Fix: Background cache with periodic refresh
<!!! Should be already implemented, verify only !!!>
Add cached status to Manager:
type Manager struct {
// ... existing fields ...
cachedStatus *FullBackupStatus
cacheTime time.Time
}
New method:
// RefreshCache updates the cached backup status in the background.
// Called by scheduler every 5 minutes and after each backup run.
func (m *Manager) RefreshCache(nextDBDump, nextBackup time.Time) {
// Execute all the expensive calls (restic stats, docker ps, etc.)
// Same logic currently in GetFullStatus()...
status := &FullBackupStatus{ ... }
m.mu.Lock()
m.cachedStatus = status
m.cacheTime = time.Now()
m.mu.Unlock()
}
Modified GetFullStatus() — reads from cache, updates only cheap dynamic fields:
func (m *Manager) GetFullStatus(nextDBDump, nextBackup time.Time) *FullBackupStatus {
m.mu.Lock()
defer m.mu.Unlock()
if m.cachedStatus != nil {
// Work on a shallow copy. Returning m.cachedStatus itself would let callers
// (e.g. template rendering) read it after the lock is released, racing with later writes.
status := *m.cachedStatus
// Update dynamic fields without subprocess calls
status.Running = m.running
status.NextDBDump = nextDBDump
status.NextBackup = nextBackup
status.LastDBDump = m.lastDBDump
status.LastBackup = m.lastBackup
status.SnapshotHistory = make([]SnapshotRecord, len(m.snapshotHistory))
copy(status.SnapshotHistory, m.snapshotHistory)
// Apply snapshot reversal + LastBackup synthesis (from Task 1) to the copy
return &status
}
// No cache yet — return minimal status (before first refresh completes)
return &FullBackupStatus{
Enabled: m.cfg.Backup.Enabled,
Running: m.running,
LastDBDump: m.lastDBDump,
LastBackup: m.lastBackup,
DBDumpSchedule: m.cfg.Backup.DBDumpSchedule,
ResticSchedule: m.cfg.Backup.ResticSchedule,
PruneSchedule: m.cfg.Backup.PruneSchedule,
NextDBDump: nextDBDump,
NextBackup: nextBackup,
Retention: m.cfg.Backup.Retention,
RepoPath: m.cfg.Backup.ResticRepo,
BackupPaths: []string{m.cfg.Paths.StacksDir, m.cfg.Paths.DBDumpDir, "/opt/docker/felhom-controller/controller.yaml"},
}
}
Scheduler + lifecycle integration
In main.go:
// Register cache refresh job (every 5 min)
sched.Every("backup-cache", 5*time.Minute, func(ctx context.Context) error {
nextDBDump := scheduler.NextDailyRun(cfg.Backup.DBDumpSchedule)
nextBackup := scheduler.NextDailyRun(cfg.Backup.ResticSchedule)
backupMgr.RefreshCache(nextDBDump, nextBackup)
return nil
})
// Initial cache population (non-blocking)
go func() {
backupMgr.RefreshCache(
scheduler.NextDailyRun(cfg.Backup.DBDumpSchedule),
scheduler.NextDailyRun(cfg.Backup.ResticSchedule),
)
}()
At the end of RunFullBackup() and RunBackup(), call RefreshCache() so the page shows updated data immediately.
Import note: RefreshCache takes nextDBDump, nextBackup time.Time as params to avoid circular import with scheduler package.
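RefreshCache is now triggered from three places (the 5-minute job, the startup goroutine, and the end of each backup run), so two refreshes can overlap and run the restic/docker subprocesses twice. A minimal sketch of one way to skip overlapping refreshes, assuming a hypothetical refreshGuard value embedded in Manager (not part of the plan above):

```go
package main

import "sync/atomic"

// refreshGuard lets at most one cache refresh run at a time (hypothetical helper).
// A caller that finds a refresh already in progress returns immediately instead
// of starting a second round of restic/docker subprocess calls.
type refreshGuard struct {
	running atomic.Bool
}

// TryRun executes fn only if no other TryRun is in progress and reports
// whether fn was executed.
func (g *refreshGuard) TryRun(fn func()) bool {
	if !g.running.CompareAndSwap(false, true) {
		return false // another refresh is already running
	}
	defer g.running.Store(false) // release the guard even if fn panics
	fn()
	return true
}
```

RefreshCache would wrap its expensive body as m.refresh.TryRun(func() { ... }) (field name hypothetical). The trade-off: a post-backup refresh is skipped if a scheduled refresh is already in flight, and that in-flight refresh may have read pre-backup state, so the page can lag by up to one 5-minute cycle; a "pending" flag that triggers one follow-up run would close that gap.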
Add CacheTime to FullBackupStatus
type FullBackupStatus struct {
// ... existing ...
CacheTime time.Time // when cache was last refreshed
}
Template can optionally show staleness at the bottom of the page in muted text.
Task 3: Monitoring Page — Metrics Store + Charts
Architecture
┌──────────────────────────┐
│ felhom-controller │
│ │
/proc/stat ────────────►│ MetricsCollector │
/proc/meminfo ─────────►│ (every 60s) │
/sys/thermal ──────────►│ ↓ │
docker stats ──────────►│ SQLite DB │
│ (/data/metrics.db) │
│ ↓ │
│ REST API │
│ /api/metrics/* │
│ ↓ │
│ Chart.js in browser │
└──────────────────────────┘
No new containers. Everything runs inside the existing controller.
3A: SQLite Metrics Store (internal/metrics/store.go)
Database: /opt/docker/felhom-controller/data/metrics.db (inside the controller-data volume — persists across restarts)
Tables:
-- System-wide metrics (1 row per sample)
CREATE TABLE IF NOT EXISTS system_metrics (
ts INTEGER NOT NULL, -- Unix timestamp
cpu_percent REAL NOT NULL,
mem_used_mb INTEGER NOT NULL,
mem_total_mb INTEGER NOT NULL,
temp_celsius REAL,
load_avg_1 REAL,
load_avg_5 REAL,
load_avg_15 REAL,
disk_used_gb REAL,
disk_total_gb REAL,
hdd_used_gb REAL,
hdd_total_gb REAL
);
CREATE INDEX IF NOT EXISTS idx_system_ts ON system_metrics(ts);
-- Per-container metrics (1 row per container per sample)
CREATE TABLE IF NOT EXISTS container_metrics (
ts INTEGER NOT NULL, -- Unix timestamp
container_name TEXT NOT NULL,
cpu_percent REAL NOT NULL,
mem_usage_mb REAL NOT NULL,
mem_limit_mb REAL,
net_rx_bytes INTEGER,
net_tx_bytes INTEGER,
block_read_bytes INTEGER,
block_write_bytes INTEGER
);
CREATE INDEX IF NOT EXISTS idx_container_ts ON container_metrics(ts);
CREATE INDEX IF NOT EXISTS idx_container_name ON container_metrics(container_name, ts);
Go struct:
type MetricsStore struct {
db *sql.DB
logger *log.Logger
}
func NewMetricsStore(dbPath string, logger *log.Logger) (*MetricsStore, error)
func (s *MetricsStore) Close() error
func (s *MetricsStore) InsertSystemMetrics(m SystemSample) error
func (s *MetricsStore) InsertContainerMetrics(samples []ContainerSample) error
func (s *MetricsStore) QuerySystemMetrics(from, to time.Time, resolution int) ([]SystemSample, error)
func (s *MetricsStore) QueryContainerMetrics(name string, from, to time.Time, resolution int) ([]ContainerSample, error)
func (s *MetricsStore) QueryContainerSummary() ([]ContainerCurrentStats, error)
func (s *MetricsStore) Prune(olderThan time.Duration) (int64, error)
Resolution/downsampling: The resolution parameter controls how many data points to return. For example, resolution=100 means "return ~100 points, averaging intermediate samples". Implementation: compute bucket size as (to-from)/resolution, group by ts/bucketSeconds, average the values.
Auto-prune: Delete rows older than 30 days. Run daily via scheduler.
SQLite pragmas on open:
PRAGMA journal_mode=WAL;
PRAGMA synchronous=NORMAL;
PRAGMA busy_timeout=5000;
WAL mode is critical — allows concurrent reads (page loads) while writer (collector) inserts. Without WAL, page loads would block during writes.
Dependencies: Use modernc.org/sqlite (pure Go, no CGO — works in the existing Alpine-based Docker image without extra libs). Add to go.mod:
go get modernc.org/sqlite
If modernc.org/sqlite is problematic (large binary size increase), alternative: use github.com/mattn/go-sqlite3 but this requires CGO and the Dockerfile would need gcc + musl-dev in the build stage. Prefer modernc.org/sqlite for simplicity.
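One caveat with the pragmas above: database/sql keeps a pool of connections, and synchronous and busy_timeout are per-connection settings, so running them once with db.Exec only configures whichever pooled connection executed the statement (journal_mode=WAL, by contrast, is persisted in the database file). modernc.org/sqlite accepts pragmas in the DSN as repeated _pragma=name(value) parameters and applies them to each new connection. A sketch that builds such a DSN, where metricsDSN is a hypothetical helper and the parameter syntax should be checked against the driver's documentation:

```go
package main

import "net/url"

// metricsPragmas are the pragmas from the task, in the name(value) form used by
// modernc.org/sqlite's _pragma DSN parameter.
var metricsPragmas = []string{
	"journal_mode(WAL)",
	"synchronous(NORMAL)",
	"busy_timeout(5000)",
}

// metricsDSN returns a DSN that makes the driver apply every pragma on each new
// pooled connection, instead of only on the one that ran db.Exec("PRAGMA ...").
func metricsDSN(dbPath string) string {
	q := url.Values{}
	for _, p := range metricsPragmas {
		q.Add("_pragma", p) // repeated key; Encode keeps the insertion order of values
	}
	return "file:" + dbPath + "?" + q.Encode()
}
```

NewMetricsStore would then call sql.Open("sqlite", metricsDSN(dbPath)), with the driver registered via a blank import of modernc.org/sqlite.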
3B: Metrics Collector (internal/metrics/collector.go)
A single goroutine that samples both system and container metrics every 60 seconds:
type MetricsCollector struct {
store *MetricsStore
cpuCollector *system.CPUCollector
hddPath string
logger *log.Logger
cancel context.CancelFunc
}
func NewMetricsCollector(store *MetricsStore, cpuCollector *system.CPUCollector, hddPath string, logger *log.Logger) *MetricsCollector
func (c *MetricsCollector) Start(ctx context.Context)
func (c *MetricsCollector) Stop()
System sampling (reuse existing functions):
func (c *MetricsCollector) sampleSystem() SystemSample {
info := system.GetInfo(c.hddPath, c.cpuCollector)
return SystemSample{
Timestamp: time.Now().Unix(),
CPUPercent: info.CPUPercent,
MemUsedMB: int(info.UsedMemMB),
MemTotalMB: int(info.TotalMemMB),
TempCelsius: info.TemperatureCelsius,
LoadAvg1: info.LoadAvg1,
LoadAvg5: info.LoadAvg5,
LoadAvg15: info.LoadAvg15,
DiskUsedGB: info.DiskUsedGB,
DiskTotalGB: info.DiskTotalGB,
HDDUsedGB: info.HDDUsedGB,
HDDTotalGB: info.HDDTotalGB,
}
}
Container sampling via docker stats --no-stream:
func (c *MetricsCollector) sampleContainers() []ContainerSample {
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
cmd := exec.CommandContext(ctx, "docker", "stats", "--no-stream",
"--format", "{{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.MemPerc}}\t{{.NetIO}}\t{{.BlockIO}}")
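out, err := cmd.Output()
if err != nil {
c.logger.Printf("[WARN] docker stats failed: %v", err)
return nil
}
var samples []ContainerSample
// For each line of out: split on "\t", parse the six fields (formats below),
// use one shared Timestamp for the whole batch, and append to samples.
return samples
}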
Parse the output fields. docker stats returns values like:
- CPU%: "2.50%" → parse as float 2.50
- MemUsage: "150.5MiB / 512MiB" → parse numerator as MB
- NetIO: "1.5MB / 2.3MB" → parse rx/tx bytes
- BlockIO: "50MB / 10MB" → parse read/write bytes
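A sketch of parsers for these fields (all helper names hypothetical). It assumes docker stats prints memory in binary units (KiB, MiB, GiB) and network/block I/O in decimal units (kB, MB, GB), which matches the examples above; a unit not in the table, or a placeholder such as "--", is reported as an error rather than guessed.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// unitMultipliers maps docker's size suffixes to bytes. Multi-letter suffixes
// come before "B" so that "MiB" or "kB" is never mistaken for plain bytes.
var unitMultipliers = []struct {
	suffix string
	mult   float64
}{
	{"KiB", 1 << 10}, {"MiB", 1 << 20}, {"GiB", 1 << 30}, {"TiB", 1 << 40},
	{"kB", 1e3}, {"KB", 1e3}, {"MB", 1e6}, {"GB", 1e9}, {"TB", 1e12},
	{"B", 1},
}

// parseSize converts a value like "150.5MiB" or "2.3MB" to bytes.
func parseSize(s string) (int64, error) {
	s = strings.TrimSpace(s)
	for _, u := range unitMultipliers {
		if strings.HasSuffix(s, u.suffix) {
			v, err := strconv.ParseFloat(strings.TrimSuffix(s, u.suffix), 64)
			if err != nil {
				return 0, fmt.Errorf("parse size %q: %w", s, err)
			}
			return int64(math.Round(v * u.mult)), nil
		}
	}
	return 0, fmt.Errorf("parse size %q: unknown unit", s)
}

// parsePair splits "a / b" fields (MemUsage, NetIO, BlockIO) into two byte counts.
func parsePair(s string) (int64, int64, error) {
	left, right, ok := strings.Cut(s, "/")
	if !ok {
		return 0, 0, fmt.Errorf("parse pair %q: missing '/'", s)
	}
	a, err := parseSize(left)
	if err != nil {
		return 0, 0, err
	}
	b, err := parseSize(right)
	if err != nil {
		return 0, 0, err
	}
	return a, b, nil
}

// parsePercent converts "2.50%" to 2.5.
func parsePercent(s string) (float64, error) {
	return strconv.ParseFloat(strings.TrimSuffix(strings.TrimSpace(s), "%"), 64)
}

// bytesToMB converts bytes to MiB for the mem_usage_mb / mem_limit_mb columns.
func bytesToMB(b int64) float64 { return float64(b) / (1 << 20) }
```

Because the memory limit is parsed directly from the MemUsage pair, the MemPerc column is only needed if the page wants docker's own percentage.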
Filtering: Include all containers, including infrastructure ones such as felhom-controller itself. Skipping the controller would avoid self-referential metrics noise, but its numbers are useful for debugging.
Collector loop:
func (c *MetricsCollector) loop(ctx context.Context) {
ticker := time.NewTicker(60 * time.Second)
defer ticker.Stop()
// Sample immediately on start
c.sample()
for {
select {
case <-ctx.Done():
return
case <-ticker.C:
c.sample()
}
}
}
func (c *MetricsCollector) sample() {
sys := c.sampleSystem()
if err := c.store.InsertSystemMetrics(sys); err != nil {
c.logger.Printf("[WARN] Failed to store system metrics: %v", err)
}
containers := c.sampleContainers()
if err := c.store.InsertContainerMetrics(containers); err != nil {
c.logger.Printf("[WARN] Failed to store container metrics: %v", err)
}
}
3C: System Info Provider (internal/metrics/sysinfo.go)
Static system information (doesn't change between samples):
type StaticSystemInfo struct {
Hostname string
OS string // e.g., "Debian GNU/Linux 12 (bookworm)"
Kernel string // e.g., "6.1.0-18-amd64"
Architecture string // e.g., "x86_64"
CPUModel string // e.g., "Intel N100"
CPUCores int // e.g., 4
UptimeSeconds int64 // seconds since boot, from /proc/uptime
BootTime time.Time // now minus uptime
}
func GetStaticInfo() StaticSystemInfo
Read from:
- Hostname: os.Hostname() or /etc/hostname
- OS: /etc/os-release → PRETTY_NAME
- Kernel: /proc/version or uname -r (read /proc/sys/kernel/osrelease)
- Architecture: runtime.GOARCH (but for display, read /proc/cpuinfo or use uname -m)
- CPU model: /proc/cpuinfo → model name field
- CPU cores: /proc/cpuinfo → count processor lines, or runtime.NumCPU()
- Uptime: /proc/uptime → first field (seconds since boot)
IMPORTANT: These are read from inside the container. /proc/uptime, /proc/cpuinfo, and /proc/version reflect the host (not the container) because they come from the host kernel. /etc/os-release inside the container, however, shows the container image's OS (Alpine), not the host's (e.g., Debian). For displaying the host OS:
- Mount /etc/os-release read-only from the host: add it to docker-compose.yml
- Or: read from /host/etc/os-release with an /etc:/host/etc:ro mount
Decision: Add /etc/os-release:/host/etc/os-release:ro to docker-compose.yml. Read host OS from /host/etc/os-release, fall back to container's /etc/os-release.
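A sketch of the parsing behind GetStaticInfo, assuming the file contents are read first and passed in as strings so the parsers can be tested without /proc. All function names are hypothetical. Falling back to the Model line for the CPU name is an assumption aimed at Raspberry Pi kernels, whose /proc/cpuinfo may lack a model name line; readHostOS implements the /host/etc/os-release then /etc/os-release fallback from the decision above.

```go
package main

import (
	"bufio"
	"os"
	"strconv"
	"strings"
)

// parseOSRelease returns PRETTY_NAME from os-release content, without quotes.
func parseOSRelease(content string) string {
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		line := sc.Text()
		if strings.HasPrefix(line, "PRETTY_NAME=") {
			return strings.Trim(strings.TrimPrefix(line, "PRETTY_NAME="), `"'`)
		}
	}
	return ""
}

// readHostOS returns the first non-empty PRETTY_NAME among the given files.
func readHostOS(paths ...string) string {
	for _, p := range paths {
		if b, err := os.ReadFile(p); err == nil {
			if name := parseOSRelease(string(b)); name != "" {
				return name
			}
		}
	}
	return ""
}

// parseCPUInfo returns the CPU model and the number of "processor" entries.
func parseCPUInfo(content string) (model string, cores int) {
	var fallback string
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		key, val, ok := strings.Cut(sc.Text(), ":")
		if !ok {
			continue
		}
		key, val = strings.TrimSpace(key), strings.TrimSpace(val)
		switch key {
		case "processor":
			cores++
		case "model name":
			if model == "" {
				model = val
			}
		case "Model": // assumption: board name line on Raspberry Pi kernels
			fallback = val
		}
	}
	if model == "" {
		model = fallback
	}
	return model, cores
}

// parseUptime returns whole seconds from the first field of /proc/uptime.
func parseUptime(content string) (int64, error) {
	fields := strings.Fields(content)
	if len(fields) == 0 {
		return 0, strconv.ErrSyntax
	}
	secs, err := strconv.ParseFloat(fields[0], 64)
	return int64(secs), err
}
```

For example, readHostOS("/host/etc/os-release", "/etc/os-release") yields the OS line, and BootTime can be derived as time.Now().Add(-time.Duration(uptime) * time.Second).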
3D: REST API Endpoints (internal/api/router.go)
New endpoints:
GET /api/metrics/system?range=24h&resolution=200
GET /api/metrics/system?from=2026-02-15T00:00:00Z&to=2026-02-16T00:00:00Z&resolution=200
GET /api/metrics/containers/summary
GET /api/metrics/containers/{name}?range=7d&resolution=200
GET /api/metrics/sysinfo
Range presets: 1h, 6h, 24h, 7d, 30d — parsed in the handler and converted to from/to timestamps.
Response format for system metrics:
{
"ok": true,
"data": {
"labels": [1708041600, 1708041660, ...],
"cpu": [5.2, 4.8, 6.1, ...],
"memory": [3200, 3250, 3180, ...],
"temp": [42, 41, 43, ...],
"load1": [0.3, 0.2, 0.4, ...]
}
}
Flat arrays are more efficient for Chart.js than arrays of objects.
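Turning the store's rows into this shape takes a single pass. A sketch with a trimmed copy of SystemSample (only the fields this response uses) and a hypothetical systemSeries type whose JSON tags follow the example above:

```go
package main

import "encoding/json"

// systemSample is a trimmed copy of SystemSample from types.go.
type systemSample struct {
	Timestamp   int64
	CPUPercent  float64
	MemUsedMB   int
	TempCelsius float64
	LoadAvg1    float64
}

// systemSeries is the "data" object: one flat array per metric, all indexed
// by the same position as Labels.
type systemSeries struct {
	Labels []int64   `json:"labels"`
	CPU    []float64 `json:"cpu"`
	Memory []int     `json:"memory"`
	Temp   []float64 `json:"temp"`
	Load1  []float64 `json:"load1"`
}

// toSeries fills every array in one pass. Empty input yields [] rather than
// null, so the page's JS can check length to show the "no data yet" message.
func toSeries(samples []systemSample) systemSeries {
	n := len(samples)
	s := systemSeries{
		Labels: make([]int64, 0, n),
		CPU:    make([]float64, 0, n),
		Memory: make([]int, 0, n),
		Temp:   make([]float64, 0, n),
		Load1:  make([]float64, 0, n),
	}
	for _, m := range samples {
		s.Labels = append(s.Labels, m.Timestamp)
		s.CPU = append(s.CPU, m.CPUPercent)
		s.Memory = append(s.Memory, m.MemUsedMB)
		s.Temp = append(s.Temp, m.TempCelsius)
		s.Load1 = append(s.Load1, m.LoadAvg1)
	}
	return s
}

// encodeResponse wraps the series in the {"ok": true, "data": ...} envelope.
func encodeResponse(samples []systemSample) ([]byte, error) {
	return json.Marshal(map[string]any{"ok": true, "data": toSeries(samples)})
}
```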
Response format for container summary:
{
"ok": true,
"data": [
{"name": "immich-server", "cpu_percent": 2.5, "mem_usage_mb": 350, "mem_limit_mb": 2048},
{"name": "paperless-webserver", "cpu_percent": 0.1, "mem_usage_mb": 280, "mem_limit_mb": 1024}
]
}
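One way to define "current" usage is the most recent sample batch, assuming the collector stamps every container in a batch with the same ts. The query below is a sketch under that assumption (COALESCE guards the nullable mem_limit_mb column, since NULL cannot be scanned into a float64). Containers that stopped before the last sample are naturally absent. The hypothetical sortByCPUDesc produces the descending CPU order the bar charts need, with the name as a tie-breaker so idle containers at 0.0% keep a stable order between refreshes:

```go
package main

import "sort"

// latestSummarySQL returns one row per container from the newest sample batch;
// idx_container_ts keeps the MAX(ts) lookup cheap.
const latestSummarySQL = `
SELECT container_name, cpu_percent, mem_usage_mb, COALESCE(mem_limit_mb, 0)
FROM container_metrics
WHERE ts = (SELECT MAX(ts) FROM container_metrics)`

// containerCurrentStats mirrors ContainerCurrentStats from types.go.
type containerCurrentStats struct {
	ContainerName string  `json:"name"`
	CPUPercent    float64 `json:"cpu_percent"`
	MemUsageMB    float64 `json:"mem_usage_mb"`
	MemLimitMB    float64 `json:"mem_limit_mb"`
}

// sortByCPUDesc orders summaries highest CPU first, then by name.
func sortByCPUDesc(stats []containerCurrentStats) {
	sort.SliceStable(stats, func(i, j int) bool {
		if stats[i].CPUPercent != stats[j].CPUPercent {
			return stats[i].CPUPercent > stats[j].CPUPercent
		}
		return stats[i].ContainerName < stats[j].ContainerName
	})
}
```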
3E: Monitoring Page Template (internal/web/templates/monitoring.html)
Five sections:
Section 1: Rendszer áttekintés (System Overview)
Static system info card:
╔══════════════════════════════════════════════════════╗
║ Rendszer áttekintés ║
╠══════════════════════════════════════════════════════╣
║ ║
║ Gépnév: demo-felhom ║
║ Operációs rendszer: Debian GNU/Linux 12 (bookworm) ║
║ Kernel: 6.1.0-18-amd64 ║
║ Processzor: Intel N100 (4 mag) ║
║ Üzemidő: 15 nap, 3 óra ║
║ Indítás: 2026-02-01 05:12 ║
║ ║
╚══════════════════════════════════════════════════════╝
Hungarian labels:
- "Rendszer áttekintés" = System overview
- "Gépnév" = Hostname
- "Operációs rendszer" = Operating system
- "Kernel" = Kernel
- "Processzor" = Processor
- "mag" = cores
- "Üzemidő" = Uptime
- "nap" = days, "óra" = hours
- "Indítás" = Started at
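The "Üzemidő" value in the mockup ("15 nap, 3 óra") can be produced from UptimeSeconds with integer division. A sketch with a hypothetical formatUptimeHU; printing hours only below one day is an assumption, since the mockup does not show that case:

```go
package main

import "fmt"

// formatUptimeHU renders seconds as "<days> nap, <hours> óra", e.g. "15 nap, 3 óra".
// Minutes are dropped, matching the precision of the mockup.
func formatUptimeHU(seconds int64) string {
	days := seconds / 86400
	hours := (seconds % 86400) / 3600
	if days == 0 {
		return fmt.Sprintf("%d óra", hours)
	}
	return fmt.Sprintf("%d nap, %d óra", days, hours)
}
```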
Section 2: Rendszer metrikák (System Metrics) — Charts
Time range selector (pill buttons like filter bar):
[ 1 óra ] [ 6 óra ] [ 24 óra ] [ 7 nap ] [ 30 nap ]
Four charts in a 2×2 grid:
┌─────────────────────────┐ ┌─────────────────────────┐
│ CPU használat (%) │ │ Memória használat (GB) │
│ ▁▂▃▂▁▂▄▃▂▁▃▂▁▂▃ │ │ ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ │
│ │ │ │
└─────────────────────────┘ └─────────────────────────┘
┌─────────────────────────┐ ┌─────────────────────────┐
│ Hőmérséklet (°C) │ │ Terhelés (Load Average) │
│ ▁▁▁▁▁▂▂▁▁▁▁▁▁▁▁ │ │ ▁▂▃▂▁▂▁▂▃▂▁▂▁▂▁ │
│ │ │ │
└─────────────────────────┘ └─────────────────────────┘
Chart.js line charts with:
- Dark theme styling (match site theme — dark background, light grid, colored lines)
- Tooltips showing exact value + timestamp
- Y-axis: CPU 0-100%, Memory in GB (auto-scale), Temp in °C, Load auto-scale
- X-axis: time labels, format varies by range (HH:MM for 1h-24h, MM-DD for 7d-30d)
- Fill under the line with low-opacity color
- Responsive (resize with container)
Section 3: Alkalmazás erőforrások (Application Resources) — Current snapshot
Bar chart showing per-container resource usage RIGHT NOW:
╔══════════════════════════════════════════════════════════════╗
║ Alkalmazás erőforrások ║
╠══════════════════════════════════════════════════════════════╣
║ ║
║ CPU használat Memória használat ║
║ immich-server ████████░░ 8.2% ████████░ 350 MB ║
║ paperless-web ██░░░░░░░░ 1.5% ██████░░░ 280 MB ║
║ immich-ml █░░░░░░░░░ 0.8% ████░░░░░ 180 MB ║
║ romm █░░░░░░░░░ 0.3% ██░░░░░░░ 90 MB ║
║ filebrowser ░░░░░░░░░░ 0.1% █░░░░░░░░ 25 MB ║
║ ║
╚══════════════════════════════════════════════════════════════╝
Two horizontal bar charts side by side (Chart.js horizontal bar). Container names on the Y-axis. Sorted by CPU usage descending.
Each container name is a clickable link that opens the detail view (Section 4 below).
Section 4: Per-container detail (expandable / click-to-show)
When clicking a container name, show a historical chart panel below the bar chart (or in a modal/expanded section):
╔══════════════════════════════════════════════════════════════╗
║ immich-server — Erőforrás előzmények ║
║ [1 óra] [6 óra] [24 óra] [7 nap] ║
║ ║
║ ┌─────────────────────┐ ┌─────────────────────┐ ║
║ │ CPU % (line chart) │ │ Memória MB (line) │ ║
║ └─────────────────────┘ └─────────────────────┘ ║
║ ║
╚══════════════════════════════════════════════════════════════╝
Implementation: When a container name is clicked, JS fetches GET /api/metrics/containers/{name}?range=24h&resolution=150 and renders two Chart.js line charts in an expandable panel below the bar charts.
Section 5: Tárhely (Storage overview)
Disk usage bars for all mounted filesystems:
╔══════════════════════════════════════════════════════════════╗
║ Tárhely ║
╠══════════════════════════════════════════════════════════════╣
║ ║
║ SSD (/) ████░░░░░░░░░░░ 17.5 GB / 460 GB (4%) ║
║ Külső HDD (/mnt) ████████░░░░░░░ 500 GB / 1000 GB (50%) ║
║ ║
╚══════════════════════════════════════════════════════════════╝
Reuse the progress bar styling from the dashboard's system info card.
3F: Chart.js Integration
CDN import (rejected): Including Chart.js from a CDN in the monitoring page template (not site-wide, since only this page needs it) would look like this:
<script src="https://cdn.jsdelivr.net/npm/chart.js@4.4.7/dist/chart.umd.min.js"></script>
However, the controller runs on customer hardware that may not have internet access (Cloudflare Tunnel handles external requests, but internal pages are loaded locally), so the Chart.js library must be embedded or served locally.
Options:
- Download chart.umd.min.js (~200 KB), embed it in the binary via //go:embed, and serve it at /static/chart.min.js
- Include it in the Docker image at build time
Decision: Download chart.umd.min.js, place it at internal/web/static/chart.min.js, embed it alongside the templates. Serve it at /static/chart.min.js.
However, the existing static serving only handles style.css and felhom-logo.svg via hardcoded handlers. Either:
- Add another hardcoded handler for /static/chart.min.js
- Or switch to a proper embedded static FS
For simplicity, add another handler:
case path == "/static/chart.min.js":
s.serveChartJS(w, r)
3G: Docker Compose changes
Add host OS release mount:
volumes:
# ... existing ...
# Host OS info — for monitoring page system info
- /etc/os-release:/host/etc/os-release:ro
3H: Config additions
No new config needed. The metrics collection interval (60s) and retention (30 days) can be hardcoded for v0.5.0. Make them configurable later if needed.
Navigation
Sidebar (layout.html)
Add fourth nav item:
<ul class="nav-links">
<li><a href="/" class="...">Vezérlőpult</a></li>
<li><a href="/stacks" class="...">Alkalmazások</a></li>
<li><a href="/backups" class="...">Biztonsági mentés</a></li>
<li><a href="/monitoring" class="...">Rendszermonitor</a></li>
</ul>
Web route (server.go ServeHTTP)
case path == "/monitoring":
s.monitoringHandler(w, r)
Handler (handlers.go)
func (s *Server) monitoringHandler(w http.ResponseWriter, _ *http.Request) {
data := s.baseData("monitoring", "Rendszermonitor")
data["SysInfo"] = metrics.GetStaticInfo()
data["SystemInfo"] = system.GetInfo(s.cfg.Paths.HDDPath, s.cpuCollector)
s.render(w, "monitoring", data)
}
The page itself is mostly JS-driven — static info is server-rendered, charts fetch data via API calls after page load.
Scheduler Integration
New jobs in main.go:
// Metrics collection — every 60s
sched.Every("metrics-collect", 60*time.Second, func(ctx context.Context) error {
metricsCollector.Sample()
return nil
})
// Metrics pruning — daily at 04:00
sched.Daily("metrics-prune", "04:00", func(ctx context.Context) error {
deleted, err := metricsStore.Prune(30 * 24 * time.Hour)
if err != nil {
return err
}
logger.Printf("[INFO] Pruned %d old metric rows", deleted)
return nil
})
Note: The collector could run its own internal ticker, but driving it from the scheduler is cleaner and consistent with the rest of the codebase. However, a 60s interval is above the scheduler's 30s quiet-mode threshold, so every run would be logged; to avoid log spam, raise the threshold to 60s or mark the metrics job as quiet explicitly.
Decision: Have the collector run its own internal ticker after all (as CPUCollector does), instead of the metrics-collect job above. The scheduler is designed for heavier tasks, while the collector is a lightweight background loop. Register only the prune job with the scheduler:
metricsCollector.Start(ctx) // starts internal 60s ticker
defer metricsCollector.Stop()
sched.Daily("metrics-prune", "04:00", func(ctx context.Context) error { ... })
Implementation Order
Step 1: SQLite metrics store
- Add modernc.org/sqlite to go.mod
- Create internal/metrics/store.go: schema, CRUD, prune, query with downsampling
- Write basic tests if convenient
Step 2: Static system info
- Create internal/metrics/sysinfo.go: read hostname, OS, kernel, CPU, uptime
- Handle Docker mount of /etc/os-release
Step 3: Metrics collector
- Create internal/metrics/collector.go: system + container sampling
- Parse docker stats --no-stream output
- Start collector in main.go, register prune job
Step 4: REST API endpoints
- Add /api/metrics/* routes to router.go
- Implement handlers: system metrics, container summary, container history, sysinfo
- Wire MetricsStore into API router
Step 5: Chart.js embedding
- Download chart.umd.min.js (v4.4.x)
- Place in internal/web/static/ (or alongside templates)
- Add serving handler in server.go
Step 6: Monitoring page template + CSS
- Create internal/web/templates/monitoring.html: all 5 sections
- Add JavaScript for Chart.js rendering, time range switching, container detail expand
- Add CSS styles to style.css
- Update layout.html: add sidebar nav item
Step 7: Backup page fixes (Tasks 1 + 2)
- Fix "Helyi mentés" in GetFullStatus(): synthesize from snapshot history
- Add caching: RefreshCache(), modify GetFullStatus(), register scheduler job
- Add initial cache goroutine in main.go
Step 8: Docker Compose changes
- Add /etc/os-release:/host/etc/os-release:ro mount
- Update controller.yaml.example if needed
Step 9: Build, deploy, verify
- Build v0.5.0
- Deploy to demo node (sync full docker-compose.yml)
- Verify backup page loads instantly
- Verify "Helyi mentés" shows green after restart
- Verify monitoring page renders with system info
- Wait 5 minutes for metrics to accumulate
- Verify charts render with data
- Verify container bar charts show current usage
- Click a container → verify historical chart loads
- Test time range switching (1h/6h/24h/7d/30d)
Step 10: Documentation
- Update CONTEXT.md, README
- Bump version
Files to create
internal/metrics/store.go — SQLite metrics store
internal/metrics/collector.go — System + container metrics collector
internal/metrics/sysinfo.go — Static system info (OS, kernel, CPU, uptime)
internal/metrics/types.go — Shared types (SystemSample, ContainerSample, etc.)
internal/web/templates/monitoring.html — Monitoring page template
internal/web/static/chart.min.js — Chart.js library (embedded)
Note: chart.min.js can be placed alongside templates if using a shared //go:embed directive, or in a separate static/ embed. Check existing embed patterns in embed.go.
Files to modify
internal/backup/backup.go — Fix Helyi mentés synthesis + add caching
internal/api/router.go — Add /api/metrics/* endpoints
internal/web/server.go — Add /monitoring route, /static/chart.min.js handler, accept metricsStore
internal/web/handlers.go — Add monitoringHandler()
internal/web/templates/layout.html — Add sidebar nav item
internal/web/templates/style.css — Monitoring page styles (charts, tables, bars)
internal/web/embed.go — Include chart.min.js in embedded FS (if needed)
cmd/controller/main.go — Wire MetricsStore + collector, backup cache job, initial cache goroutine
controller/docker-compose.yml — Add /etc/os-release mount
go.mod — Add modernc.org/sqlite dependency
Dockerfile — May need adjustments for SQLite (verify modernc.org/sqlite works with current build)
Data Types Reference
// internal/metrics/types.go
type SystemSample struct {
Timestamp int64 `json:"ts"`
CPUPercent float64 `json:"cpu"`
MemUsedMB int `json:"mem_used"`
MemTotalMB int `json:"mem_total"`
TempCelsius float64 `json:"temp"`
LoadAvg1 float64 `json:"load1"`
LoadAvg5 float64 `json:"load5"`
LoadAvg15 float64 `json:"load15"`
DiskUsedGB float64 `json:"disk_used"`
DiskTotalGB float64 `json:"disk_total"`
HDDUsedGB float64 `json:"hdd_used"`
HDDTotalGB float64 `json:"hdd_total"`
}
type ContainerSample struct {
Timestamp int64 `json:"ts"`
ContainerName string `json:"name"`
CPUPercent float64 `json:"cpu"`
MemUsageMB float64 `json:"mem_usage"`
MemLimitMB float64 `json:"mem_limit"`
NetRxBytes int64 `json:"net_rx"`
NetTxBytes int64 `json:"net_tx"`
BlockReadBytes int64 `json:"blk_read"`
BlockWriteBytes int64 `json:"blk_write"`
}
type ContainerCurrentStats struct {
ContainerName string `json:"name"`
CPUPercent float64 `json:"cpu_percent"`
MemUsageMB float64 `json:"mem_usage_mb"`
MemLimitMB float64 `json:"mem_limit_mb"`
}
type StaticSystemInfo struct {
Hostname string `json:"hostname"`
OS string `json:"os"`
Kernel string `json:"kernel"`
Architecture string `json:"architecture"`
CPUModel string `json:"cpu_model"`
CPUCores int `json:"cpu_cores"`
UptimeSeconds int64 `json:"uptime_seconds"`
BootTime time.Time `json:"boot_time"`
}
Chart.js Dark Theme Configuration
All charts should use a consistent dark theme:
const chartDefaults = {
backgroundColor: 'rgba(0, 136, 204, 0.1)', // accent-blue with low opacity
borderColor: '#0088cc', // accent-blue
borderWidth: 2,
pointRadius: 0, // hide dots (too many points)
pointHitRadius: 10, // but clickable for tooltips
tension: 0.3, // smooth curves
fill: true
};
const chartOptions = {
responsive: true,
maintainAspectRatio: false,
plugins: {
legend: { display: false },
tooltip: {
backgroundColor: '#1c2128',
titleColor: '#e6edf3',
bodyColor: '#8b949e',
borderColor: '#30363d',
borderWidth: 1
}
},
scales: {
x: {
grid: { color: 'rgba(48,54,61,0.5)' },
ticks: { color: '#8b949e', maxTicksLimit: 8 }
},
y: {
grid: { color: 'rgba(48,54,61,0.5)' },
ticks: { color: '#8b949e' },
beginAtZero: true
}
}
};
Different chart colors per metric:
- CPU: #0088cc (blue)
- Memory: #238636 (green)
- Temperature: #d29922 (yellow)
- Load: #db6d28 (orange)
- Container CPU bars: #0088cc (blue)
- Container Memory bars: #238636 (green)
Edge Cases
- No metrics yet: First load after deploy — charts show "Még nincsenek adatok" (No data yet) message
- Container appears/disappears: Containers may start/stop between samples. The summary shows only currently running containers. Historical charts handle gaps gracefully (Chart.js spanGaps: true)
- Pi 3B+ with 1 GB RAM: SQLite + metrics should be lightweight. A 60s interval gives 1,440 system rows/day plus ~14.4k container rows/day (10 containers × 1,440). With 30-day retention that is ≈ 475k rows max (43.2k system + 432k container). SQLite handles this easily.
- Docker socket permissions: docker stats already works because the socket is mounted. No new permissions needed.
- Timezone: All timestamps are stored as Unix epoch seconds (UTC). The Chart.js tooltip formatter converts them to Europe/Budapest for display.
- Chart.js file size: ~200KB minified. Acceptable for a single page load. Only loaded on the monitoring page.
Verification Checklist
Backup fixes
- "Helyi mentés" shows green ✓ after controller restart (not "–")
- Backup page loads in <500ms (cached, no subprocess calls)
- Cache refreshes after manual backup ("Mentés most")
- Backup page still shows correct data after cache refresh
Monitoring page
- Sidebar shows 4 nav items, "Rendszermonitor" is highlighted when active
- System overview card shows hostname, OS, kernel, CPU model, cores, uptime
- System metrics charts render (CPU, Memory, Temperature, Load)
- Time range buttons work (1h/6h/24h/7d/30d) — charts update
- Container resource bar charts show current per-container CPU + memory
- Clicking a container name shows historical charts
- Storage section shows SSD + HDD usage bars
- Charts use dark theme matching site design
- Page works on mobile (responsive charts)
- Metrics accumulate over time (check after 10+ minutes)
- SQLite DB created in data volume (survives restart)
- Metrics prune job runs daily (check scheduler logs)
- No regressions on dashboard, apps, backup pages