Commit Graph

22 Commits

Author SHA1 Message Date
admin db83db383c fix: deep bug hunt II — concurrency, security & optimization (25 files)
Critical: watchdog mutex panic safety, SetGeoAppOverride nil guard,
SSD-only app DB restore fallback.

High: double deploy race (atomic Deploying flag), delete/remove during
deploy guard, ScanStacks overwrite protection, FileBrowser mount mutex,
PushEvent history, PushOnce error handling, DB dump sync+close before
rename, restic retry fresh context, encrypt failure logging, cross-backup
path traversal validation, deepCopyStack completeness.

Security: constant-time API key comparison, login rate limiting (5/min),
git credential masking in logs, storage path prefix traversal fix.

Concurrency: MigrateEncryption lock ordering, SubdomainInUse I/O outside
lock, scheduler late-registered jobs, SQLite WAL verification, metrics
shutdown context, telemetry scan error logging, asset sync lock scope.

Optimization: streaming file copy for DB dumps, restic stats dedup,
atomic infra config copy.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 14:21:09 +01:00
admin 45f75a916c fix: P2+P3 bug fixes, hardening, and cleanup (18 files)
Bug fixes:
- Add applyEnvOverrides to LoadFromBytes (M05)
- Set state=failed on compose-up failure in selfupdate (M16)
- Clamp usableMB to min 0 in memory check (M22)
- Remove "manual" schedule from triggerAllCrossBackups (M23)
- Add mmcblk device handling for partition paths (M21)
- Fix stripPartition for mmcblk devices (L25)
- Fix TruncateStr for UTF-8 and negative maxLen (L05/L06)
- Fix AllDone to return false for empty restore plans (L14)
- Fix PushOnce to return actual errors (L39)
- Restore pending events on save failure in DrainPendingEvents (M03)
- Add duplicate check in AddStoragePath (M04)
- Call CleanupTempMounts after drive scan (H13)
- Log SetStep save errors (M25)

Hardening:
- Guard scheduler Start() against double-start (M14)
- Acquire mutex in scheduler Stop() before reading cancel (L24)
- Cap log lines parameter to 10000 (L31)
- Require POST for logout (L32)
- Use sync.Once for Server.Close() (L49)
- Panic on crypto/rand.Read failure in setup CSRF (L40)
- Validate Bearer token against Hub API key in CSRF (H16 fix)
- Replace custom hasPrefix with strings.HasPrefix (L13)
- Replace simpleHash with crc32.ChecksumIEEE (L48)

Cleanup:
- Remove dead imageName function (L02)
- Remove dead detectHostIPViaRoute function (L03)
- Rename shadowed copy variable to cp (L07)
- Copy DefaultEnabledEvents in GetNotificationPrefs early return (L09)
- Update BUGHUNT.md with comprehensive audit results

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 13:47:52 +01:00
admin 2ad743b66f v0.30.2: Report geo-restriction + logo/favicon update + Hub geo auth
- Add GeoRestrictionReport to report types and builder, so Hub can
  display geo-blocking status on customer detail pages
- Update all 5 BuildReport() call sites with new geoRestriction param
- Add /api/geo/ to selfUpdateAuthMiddleware (Hub Bearer token auth)
- Replace embedded logo SVG with updated logo.svg (white text variant)
- Add FelhomFaviconSVG constant + /static/favicon.svg route
- Update layout.html and catchall.html favicon links

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 12:42:51 +01:00
admin 44f7fd2f19 feat: encrypt sensitive values in app.yaml with AES-256-GCM
Passwords and secrets from deploy fields (type: password/secret) are now
encrypted at rest in app.yaml using a per-node 32-byte key. Values stored
as ENC:base64(nonce+ciphertext), decrypted transparently for docker-compose
and web UI. Key included in infra backup bundle for disaster recovery.
Existing plaintext values migrated automatically on startup.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 19:12:24 +01:00
admin e737704e68 fix: skip stopped apps in telemetry to avoid zero-value averages on hub
Deployed-but-stopped apps were included in telemetry reports with all-zero
memory/CPU values, dragging down hub-side averages. Now isStackRunning()
filters to only running/starting/unhealthy/restarting states.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 15:05:39 +01:00
admin 4a6ab4d61c feat(debug): add Telemetria teszt section to debug page (v0.28.1)
- New GET /api/debug/telemetry endpoint runs full telemetry pipeline on-demand
- GetTelemetryPreview callback added to DebugCallbacks, wired in main.go
- BuildAppTelemetryForDebug() exported wrapper in report/telemetry.go
- Debug page: new collapsible section with per-app table (memory, CPU, log errors/warnings, issues) and raw JSON viewer
- Available regardless of hub configuration

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-23 11:09:06 +01:00
admin 05ecd65412 feat(telemetry): add per-app metrics and log telemetry to hub reports (v0.28.0)
- New internal/metrics/telemetry.go: MetricsStore.GetContainerTelemetry()
  aggregates container memory/CPU from SQLite over the last 15 min
- New internal/metrics/logscanner.go: ScanContainerLogs() scans docker logs
  for errors/warnings, deduplicates via fingerprinting (strips timestamps,
  replaces 6+ digit numbers, hex strings, UUIDs)
- New internal/report/telemetry.go: buildAppTelemetrySection() assembles
  per-stack AppTelemetry by aggregating container metrics and log summaries
- internal/report/types.go: added AppTelemetry field to Report struct plus
  AppTelemetry type with memory/CPU/log fields and LogIssue references
- internal/report/builder.go: calls buildAppTelemetrySection() in BuildReport()
- Backward-compatible: old Hub versions silently ignore app_telemetry field

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-23 10:46:27 +01:00
admin be7803c0ac v0.24.0 — Pre-testing observability: debug logging, diagnostic dump, startup self-test
- Add [DEBUG] logging across all modules (backup, storage, sync, selfupdate,
  monitor, notify, report, assets, setup) gated behind logging.level: "debug"
- Add /api/debug/dump endpoint returning full controller state JSON (debug only)
- Add startup self-test validating 9 subsystems (Docker, dirs, storage, hub,
  restic repos, metrics DB) with pass/warn/fail summary
- New packages: internal/selftest, internal/util
- Constructor/signature changes: debug bool params, logger params on
  RunHealthCheck and BuildReport, smart watchdog probe logging

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-21 18:32:26 +01:00
admin 6eb75204b6 v0.22.0: First-run setup wizard, local infra backup, hub verification
New controller features:
- Web-based setup wizard replaces docker-setup.sh interactive config
  - Dual listener: :8080 (Traefik) + :8081 (direct HTTP for LAN)
  - Drive scanner finds .felhom-infra-backup/ on all block devices
  - Hub recovery pull (GET /api/v1/recovery/{id}) with retrieval password
  - Fresh install: Hub config download or manual wizard
  - CSRF protection, state persistence, Hungarian UI
- Local infra backup written to all connected drives after each backup cycle
  - .felhom-infra-backup/backup.json + metadata.json with SHA256 checksum
- Hub verification: parse customer_blocked from report push response
  - Limited mode after 7 days without verification
- Recovery info page on Settings + recovery-info.txt file generation
- Pending events queue: DR events sent to Hub on next report push
- docker-setup.sh v6.0.0: removed interactive wizard, minimal controller.yaml only

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-21 12:33:17 +01:00
admin 8aebbb8902 feat: Hub monitoring takeover — event push system + config cleanup (v0.21.0)
Replace external Healthchecks.io with Hub-native event system. Controller
now pushes structured events via POST /api/v1/event with typed detail
structs. Hub handles dead man's switch, notification dispatch, and cooldowns.

Phase 5: PushEvent() core method, 21 event types, expanded notification
settings (11 toggles), Hub connection monitoring on dashboard, alerts.
Phase 6: Deprecation log for ping UUIDs, pinger kept for transition.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-20 18:53:21 +01:00
admin 85d1f2f673 feat: add config apply endpoint and config hash in reports
- POST /api/config/apply: accepts YAML body from Hub, validates and
  writes controller.yaml atomically (tmp+rename)
- GET /api/config/hash: returns SHA256 hash of current config file
- Report payload now includes config_hash field for Hub comparison
- Config endpoints use same dual auth as self-update (session OR Bearer)
- config.LoadFromBytes() for validation without file I/O
- config.FileHash() helper for SHA256 computation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-20 16:13:35 +01:00
admin 99bf3ca7a8 feat: drive migration & Tier 2 restic deprecation (v0.18.0)
Phase 1: Deprecate restic as Tier 2 method (rsync only), auto-migrate on startup
Phase 2: Enhanced per-app migration with backup awareness, DB dump copy, auto-cleanup
Phase 3: Full drive migration with decommissioned state, rollback support, wizard UI
Phase 4: Hub report includes decommissioned drive state

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 21:49:14 +01:00
admin bdbe170a54 feat: storage watchdog — USB disconnect detection, auto-stop, safe eject, auto-reconnect (v0.17.0)
New storage watchdog monitors registered storage paths every 5s. On disconnect
(3 consecutive probe failures), auto-stops affected apps, lazy-unmounts stale
VFS entries, fires alerts/notifications/hub report. On reconnect (UUID detected),
auto-remounts via fstab, cleans stale restic locks, offers app restart.

Safe disconnect UI for USB drives: confirmation dialog, stop apps, sync, unmount.
Disconnected state visible across all pages (dashboard, settings, backups, monitoring)
with hatched red bars and badges. Backup guards skip disconnected drives.

22 files changed (1 new: monitor/watchdog.go), ~1500 lines added.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 19:42:26 +01:00
admin 2687506b08 feat: add controller_url to hub reports (v0.16.1)
Controller now includes its external URL in periodic hub reports so the
hub can trigger self-updates remotely via the /api/selfupdate/update endpoint.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-19 18:16:32 +01:00
admin 75ea9d73f0 Fix bugs from BUGHUNT.md: restore race conditions, infra backup, DR wiring, docker-setup.sh, restore.html 2026-02-19 14:06:42 +01:00
admin 6713df2186 v0.15.5: Disaster recovery — Hub-based infra backup, auto-mount, restore UI
Complete DR implementation (TASK2.md Phases 1-4):
- Hub infra-backup push/pull endpoints (controller.yaml, disk layout, stacks)
- Fresh-deployment detection pulls config from Hub, auto-mounts drives by UUID
- Full-page restore UI with drive status, app table, sequential restore
- docker-setup.sh shows DR instructions when customer_id is configured

New files: disk_layout.go, restore_scan.go, restore_app_linux.go,
restore_drives_linux.go, infra_backup.go, infra_pull.go,
handler_restore.go, restore.html

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 13:16:46 +01:00
admin 00c668fc92 v0.15.5: Fix startup hub report — Push() returns real errors, startup retries 3x with 15s delay 2026-02-19 10:08:43 +01:00
admin f54d1a23de v0.15.4: Hub disabled notification, PushOnce, ReportingDisabled field 2026-02-19 09:45:40 +01:00
admin 215ba8a83d v0.15.3: Show all storage paths on dashboard/monitoring + fix hub report 2026-02-19 09:06:59 +01:00
admin aca3b8680a v0.9.0: Storage paths registry, per-app HDD_PATH resolution, storage management UI
- Fix backup toggles not appearing (read each app's own HDD_PATH from app.yaml)
- Storage paths registry in settings.json with auto-discovery from deployed apps
- Settings page "Adattárolók" section with disk usage, add/remove/default/schedulable
- Deploy page path field as dropdown of registered storage paths
- Health check storage monitoring (mount point, disk usage alerts)
- Mount-point validation utilities (Linux syscall + cross-platform stubs)
- Controller docker-compose mount changed to /mnt:/mnt:rw for multi-storage

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-17 09:04:28 +01:00
admin 7d801d1094 Phase 3 complete: per-app backup toggles, restore, storage overview
- Storage overview on backup page (SSD/HDD bars, repo stats)
- Restic password visibility + hub sync for disaster recovery
- App data discovery (HDD bind mounts, Docker volumes)
- Per-app backup toggle checkboxes with settings persistence
- Dynamic backup paths: enabled app HDD data included in restic snapshots
- Limited app restore from snapshots (self-service recovery)
- Snapshots API endpoint for restore dropdown
- Version bump to 0.8.0

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 21:29:56 +01:00
admin 97074e7a0c v0.6.0: healthcheck + hub reporting implementation
- Add heartbeat ping (every 5 min, controller alive signal)
- Add backup integrity check (weekly restic check, Sunday 04:00)
- Add Heartbeat + BackupIntegrity fields to PingUUIDsConfig
- Add HubConfig for central hub reporting
- Add report package (types, builder, pusher) for hub push
- Wire hub reporting into scheduler (configurable interval)
- Update controller.yaml.example with new monitoring + hub sections
- Add monitoring/DEPRECATED.md for legacy bash scripts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 13:19:08 +01:00