Part A of the UI-fixes/storage-spike spec.
A1: enrichHostStorageTargets sorts /api/host-metrics storage_targets
server-side and attaches friendly Hungarian labels + purpose, fixing the
#host-storage-bars reorder-on-poll bug. Display labels only — PVE storage
ids are never renamed.
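A minimal Go sketch of the server-side ordering. The StorageTarget shape, the purpose ranking, the "usb-hdd1" id and the English placeholder labels are assumptions (the real labels are Hungarian and the real enrichHostStorageTargets signature may differ). Ordering by purpose and then by PVE id gives a total order, so the bars come out the same on every poll whatever order the agent reports them in, and the ids themselves are left untouched.

```go
package main

import (
	"fmt"
	"sort"
)

// StorageTarget mirrors one storage_targets entry; field names are assumptions.
type StorageTarget struct {
	ID      string // PVE storage id: used as-is, never renamed
	Label   string // display-only friendly label
	Purpose string
}

// labelInfo and purposeRank are hypothetical placeholders.
type labelInfo struct{ Label, Purpose string }

var purposeRank = map[string]int{"system": 0, "apps": 1, "backup": 2}

func rank(p string) int {
	if r, ok := purposeRank[p]; ok {
		return r
	}
	return len(purposeRank) // unknown purposes sort last
}

// enrichHostStorageTargets returns a labelled copy in an order that does not
// depend on the order the agent happened to report.
func enrichHostStorageTargets(ts []StorageTarget, labels map[string]labelInfo) []StorageTarget {
	out := make([]StorageTarget, len(ts))
	copy(out, ts)
	for i := range out {
		if l, ok := labels[out[i].ID]; ok {
			out[i].Label, out[i].Purpose = l.Label, l.Purpose
		} else {
			out[i].Label = out[i].ID // fall back to the raw id
		}
	}
	sort.SliceStable(out, func(i, j int) bool {
		if ri, rj := rank(out[i].Purpose), rank(out[j].Purpose); ri != rj {
			return ri < rj
		}
		return out[i].ID < out[j].ID // tie-break on the stable PVE id
	})
	return out
}

func main() {
	labels := map[string]labelInfo{
		"local":     {"System disk", "system"},
		"local-lvm": {"App storage", "apps"},
		"usb-hdd1":  {"External drive", "backup"},
	}
	poll := []StorageTarget{{ID: "usb-hdd1"}, {ID: "local-lvm"}, {ID: "local"}}
	for _, t := range enrichHostStorageTargets(poll, labels) {
		fmt.Println(t.ID, "->", t.Label)
	}
}
```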
A2: new GET/POST /stacks/{name}/backup Tier-2 config panel; the "2. mentés" ("2nd backup")
Beállítás (Settings) button now points there instead of the dead-end deploy page. The customer
can pin a target drive or disable Tier 2, and the preference is preserved across the
runner's status writes. The panel is always visible (single-SSD setups and non-HDD apps included).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
4A: scope the FileBrowser bind to <drive>/appdata. Recovery units and Tier 2 copies under
backups/ are no longer mounted into FileBrowser, so the customer can't browse or delete the
data that restores their apps. 4B: the deploy storage-selection step now states that the chosen
drive holds the app's files, while the DB runs on the fast internal SSD and is backed up with the app.
4C: buildStorageBars uses a stable sort and shows a purpose description on the monitoring storage list.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Tier 2 rsync-mirrors each HDD app's recovery unit + appdata to a DIFFERENT physical
disk (the only off-drive protection bind-mounted userdata can get, since PBS can't reach it).
Auto-enabled, auto-target: prefer another registered drive (a different physical disk, checked via
system.SamePhysicalDevice), else the internal SSD for SMALL units only, behind a
size-aware headroom guard that REFUSES rather than filling the ~8G guest rootfs and records
an honest "needs 2nd HDD" status. Status is persisted via the surviving CrossDriveBackup;
the "2. mentés" ("2nd backup") UI card is now populated. Daily tier2-backup job + POST /api/backup/tier2.
- backup/tier2.go (engine+selection+headroom), tier2_test.go (headroom arithmetic)
- system.SamePhysicalDevice (linux Stat_t.Dev + stub)
- handlers.go Tier2 UI population + tier2DestLabel; backups.html honest no-target reason
- fixed stale TestBackupCopiesOnPath (old felhom-data layout -> in-guest layout)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The demo has no dashboard password (the API is open: auth and CSRF checks are both skipped in
that mode), so it was driven via the public URL. AdventureLog's unit manifest carries
data_key_env_vars=[SECRET_KEY] (catalog->manifest path confirmed live); with SECRET_KEY
unrecoverable, POST /backup/restore was REFUSED with the exact fail-closed message before any
compose-up. A full deploy-with-data e2e run is blocked by the 8G guest rootfs (the AdventureLog
images are too big; this is the Phase 3 concern, now seen live).
CHANGELOG/REPORT/CONTEXT updated; demo left clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds an in-process orchestration test for RestoreFromRecoveryUnit: the success path
calls recreate with the non-secret env and recovered secrets merged; the data-key-missing
path is REFUSED and recreate is never called. Makes Manager.isDebug nil-safe
(behavior-neutral in prod, where cfg is always set) so the gate and orchestration are testable.
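The nil-safe accessor pattern, sketched with hypothetical Config/Manager fields: in Go a method with a pointer receiver can be called on a nil pointer, so guarding both the receiver and cfg lets a test construct a bare Manager without any config.

```go
package main

import "fmt"

// Config and Manager are stand-ins; only the nil-safety pattern matters here.
type Config struct{ Debug bool }
type Manager struct{ cfg *Config }

// isDebug is safe on a nil *Manager and on a Manager whose cfg was never set.
func (m *Manager) isDebug() bool {
	return m != nil && m.cfg != nil && m.cfg.Debug
}

func main() {
	var nilMgr *Manager
	fmt.Println(nilMgr.isDebug())                                  // false
	fmt.Println((&Manager{}).isDebug())                            // false
	fmt.Println((&Manager{cfg: &Config{Debug: true}}).isDebug())   // true
}
```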
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Restore recreates an app from its on-drive unit + the guest's own secrets,
regenerating nothing. reconcileRestoreSecrets (pure, unit-tested) merges the unit's
non-secret env with secrets recovered from the live app.yaml and FAILS CLOSED if a
data-encrypting key is unrecoverable: it refuses (a PBS whole-guest restore is needed)
rather than regenerating the key and corrupting the data. Missing resettable secrets → warn + proceed.
- backup: RestoreFromRecoveryUnit (manifest -> recover secrets -> gate -> restore
volumes -> recreate definition + redeploy w/ re-pull); falls back to volume-only.
- seams: RecoverStackSecrets/RecreateStackFromUnit (adapter +encKey),
stacks.RedeployFromEnv. Wired into /backup/restore.
- tests: gate (refuse/proceed/verbatim) + data_key parsing.
Gate + reconcile + data_key parsing unit-tested; capture live-validated (v0.53.1).
A full readable-data e2e run against AdventureLog needs the auth-gated dashboard restore; still pending.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
CaptureRecoveryUnit now builds the unit content in memory and skips the write when the on-drive
unit is already current (same checksum, dump set, and version), so it can run from RefreshCache
(at startup and every 5m) without thrashing the USB drive. Units now exist shortly after
startup and track config changes without waiting for the daily DB dump. Adds an idempotency test.
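One way to express the currency check, assuming a hypothetical unitStamp that records the content checksum, the referenced dump set (kept in a canonical, sorted order) and the unit version. Only when needsWrite returns true does the capture touch the drive, which is what keeps a 5-minute RefreshCache call cheap.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// unitStamp is a hypothetical summary of what makes a recovery unit current.
type unitStamp struct {
	Checksum string   // sha256 of the in-memory unit content
	DumpSet  []string // DB dump files the unit references, sorted
	Version  string   // unit format / controller version
}

func checksum(b []byte) string {
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

// needsWrite compares the on-disk stamp (nil if no unit exists yet) with the
// freshly built one and reports whether the drive must be written.
func needsWrite(onDisk *unitStamp, fresh unitStamp) bool {
	if onDisk == nil || onDisk.Checksum != fresh.Checksum || onDisk.Version != fresh.Version {
		return true
	}
	if len(onDisk.DumpSet) != len(fresh.DumpSet) {
		return true
	}
	for i := range fresh.DumpSet {
		if onDisk.DumpSet[i] != fresh.DumpSet[i] {
			return true
		}
	}
	return false
}

func main() {
	content := []byte("app: romm\n")
	fresh := unitStamp{Checksum: checksum(content), DumpSet: []string{"romm.sql.gz"}, Version: "v1"}
	fmt.Println(needsWrite(nil, fresh)) // true: first capture
	prev := fresh
	fmt.Println(needsWrite(&prev, fresh)) // false: unchanged, skip the write
}
```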
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
REPORT.md overwritten with the Phase-1 gate run (catalog template fix + agreement
test + live RomM migration on guest 9201, gate PASSED). CONTEXT.md dated entry.
README HDD_PATH/felhom-data convention note corrected for Model-A single-nesting.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The deploy-side double-nest fix lives in the app catalog (templates dropped the
extra felhom-data segment). This adds the controller-side invariant test that
ties the deploy path (ParseComposeHDDMounts) to the backup path
(AppDataDir/NamespaceRoot) so they can't drift again, plus the v0.52.0 CHANGELOG.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
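- Backups page: the whole-guest backup is shown as real disaster recovery (DR), with target label
"Biztonsági szerver – külön hardver (PBS)" ("Backup server – separate hardware (PBS)"); the app-data
"Távoli mentés" ("Remote backup") card now reflects the PBS offsite tier
(guestBackupView.Offsite) instead of "nincs beállítva" ("not configured").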
- Model-A double-nest fix: appbackup path helpers take a felhom-data NAMESPACE ROOT (no
internal felhom-data join); backup.Manager.namespaceRoot/AppNamespaceRoot resolve
HDD-vs-systemDataPath provenance so a drive-resident app's backups land single-nested
(<drive>/backups/... on the guest = <drive>/felhom-data/backups/... on the host) instead
of .../felhom-data/felhom-data/.... Writes, deletion (GetStackBackupData/RemoveStack/
ProtectedHDDPaths), wipe-warning scan, and export updated coherently; legacy double-nest
dirs kept protected. New appbackup test asserts no doubled segment.
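A sketch of the single-nesting rule and the no-doubled-segment check, with hypothetical helper names and example mount paths. The namespace root is <drive> inside the guest and <drive>/felhom-data on the host, so the path helper must never append felhom-data itself.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

const namespaceSeg = "felhom-data"

// backupDir builds an app's backup directory under a namespace root the
// caller has already resolved; it deliberately never joins felhom-data.
func backupDir(namespaceRoot, app string) string {
	return filepath.Join(namespaceRoot, "backups", app)
}

// hasDoubledSegment reports whether seg appears twice in a row in p, i.e. the
// legacy .../felhom-data/felhom-data/... shape.
func hasDoubledSegment(p, seg string) bool {
	parts := strings.Split(filepath.ToSlash(filepath.Clean(p)), "/")
	for i := 1; i < len(parts); i++ {
		if parts[i] == seg && parts[i-1] == seg {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(backupDir("/mnt/hdd1", "romm")) // /mnt/hdd1/backups/romm
	fmt.Println(hasDoubledSegment(backupDir("/mnt/hdd1/felhom-data", "romm"), namespaceSeg))     // false
	fmt.Println(hasDoubledSegment("/mnt/hdd1/felhom-data/felhom-data/backups/romm", namespaceSeg)) // true
}
```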
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
4A: user-data drives are eligible as backup targets (not role-locked); this is surfaced in
the drive purpose note. 4B: handleStorageImpact returns backup_copies (apps whose
cross-drive backups live on the drive, found via backupCopiesOnPath); the wipe/eject
modal warns that they would be destroyed (the action stays customer-confirmable, since the copies are redundant).
The cross-drive backup engine remains out of scope. Test: TestBackupCopiesOnPath.
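How backupCopiesOnPath might decide which copies live on a drive, as a sketch with an assumed map of app name to backup directory and hypothetical helper names. Using filepath.Rel rather than a plain string prefix keeps /mnt/hdd10 from matching a wipe of /mnt/hdd1.

```go
package main

import (
	"fmt"
	"path/filepath"
	"sort"
	"strings"
)

// isUnder reports whether path is root itself or lies beneath it, respecting
// path-component boundaries.
func isUnder(path, root string) bool {
	rel, err := filepath.Rel(root, path)
	if err != nil {
		return false
	}
	return rel == "." || (rel != ".." && !strings.HasPrefix(rel, ".."+string(filepath.Separator)))
}

// copiesUnderMount returns, sorted, the apps whose cross-drive backup copy
// lives on mount. copies maps app name to backup directory (assumed shape).
func copiesUnderMount(copies map[string]string, mount string) []string {
	var apps []string
	for app, dir := range copies {
		if isUnder(dir, mount) {
			apps = append(apps, app)
		}
	}
	sort.Strings(apps) // map iteration order is random; keep the modal stable
	return apps
}

func main() {
	copies := map[string]string{
		"immich": "/mnt/hdd2/backups/immich",
		"romm":   "/mnt/hdd1/backups/romm",
		"notes":  "/mnt/hdd10/backups/notes",
	}
	fmt.Println(copiesUnderMount(copies, "/mnt/hdd1")) // [romm]
}
```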
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
pendingActivationDrives() flags registered drives that the agent reports as attached but that are
not live-mounted in the container; a settings banner with an "Újraindítás most" ("Restart now") button
calls /api/storage/activate → agentapi.GuestReboot. All pending drives are batched into one restart.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
agentapi GuestAttach(where) → POST /disks/guest-attach; runStorageInit/Attach +
handleStorageRegister call attachIntoGuest after register (best-effort, P3 heals).
Closes Branch A: enrolled drives become usable in the guest, banner clears.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Part 2 of the USB/backup spec. agentapi: StatusResponse.Backup record, DueResponse
age_seconds, RestoreTestStatus(). New "Rendszermentés (teljes mentés)" ("System backup (full backup)")
section (read-only: last backup, target PBS-vs-local, next due, restore test) + a "Mentés most"
("Back up now") manual trigger that goes through the quiesce loop (the controller owns quiescing):
quiesce.Loop gains a mutex + TriggerNow() (single-flight, async). New
/api/guest-backup/{trigger,status} (distinct from apiRouter's /api/backup/*).
App-data rows relabeled under an "Alkalmazás-mentések" ("App backups") divider. Config → slice 10.
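A sketch of a single-flight, async TriggerNow, assuming the loop exposes one quiesce/backup/resume cycle as a run function (the real quiesce.Loop fields are not shown in this log, so the struct here is hypothetical). A second trigger while a cycle is running is rejected rather than queued, so a manual "Mentés most" can never start an overlapping backup.

```go
package main

import (
	"fmt"
	"sync"
)

// Loop is a pared-down stand-in for quiesce.Loop; only the trigger path is shown.
type Loop struct {
	mu      sync.Mutex
	running bool
	wg      sync.WaitGroup
	run     func() // one quiesce -> backup -> resume cycle (hypothetical hook)
}

// TriggerNow starts a cycle in the background and returns true, or returns
// false without doing anything if a cycle is already in flight.
func (l *Loop) TriggerNow() bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.running {
		return false
	}
	l.running = true
	l.wg.Add(1)
	go func() {
		defer l.wg.Done() // runs last (defers are LIFO)
		defer func() {
			l.mu.Lock()
			l.running = false
			l.mu.Unlock()
		}()
		l.run()
	}()
	return true
}

// Wait blocks until any in-flight cycle has finished (useful in tests).
func (l *Loop) Wait() { l.wg.Wait() }

func main() {
	release := make(chan struct{})
	l := &Loop{run: func() { <-release }}
	fmt.Println(l.TriggerNow()) // true: cycle started
	fmt.Println(l.TriggerNow()) // false: single-flight, already running
	close(release)
	l.Wait()
}
```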
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
backups.html still referenced .Backup.{RepoStats,LastBackup,ResticSchedule,
NextBackup,PruneSchedule,Retention,SnapshotHistory,LastCheckTime,LastCheckOK} —
fields removed from FullBackupStatus in the 8C de-privileging (disk-tier backup
moved to the agent). Accessing those fields on the slimmed struct made the page 500. Removed the dead
restic/snapshot/repo-stat sections; kept the app-data (DB dumps + per-app) view.
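Why the stale references returned a 500: Go templates resolve fields when the template executes, not when it is parsed, so the page only broke on render. A minimal sketch with an illustrative slimmed struct (the real FullBackupStatus fields and handler are not shown here):

```go
package main

import (
	"fmt"
	"html/template"
	"io"
)

// FullBackupStatus stands in for the slimmed post-8C struct (field is illustrative).
type FullBackupStatus struct {
	AppData []string
}

type backupsPage struct {
	Backup FullBackupStatus
}

// render executes tmpl against the slimmed struct. Parsing succeeds either
// way; a reference to a removed field only fails in Execute, which a handler
// typically turns into a 500.
func render(tmpl string) error {
	t, err := template.New("backups").Parse(tmpl)
	if err != nil {
		return err
	}
	return t.Execute(io.Discard, backupsPage{Backup: FullBackupStatus{AppData: []string{"romm"}}})
}

func main() {
	fmt.Println(render(`{{range .Backup.AppData}}{{.}}{{end}}`)) // <nil>
	fmt.Println(render(`{{.Backup.LastBackup}}`) != nil)          // true
}
```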
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Empirically (staging on 9201): traefik v3 issues a cert from a router-level
tls.domains but NOT from the entrypoint http.tls.domains. So the wildcard moves
to RenderControllerRoute (the always-present anchor): when DNS-01 ACME is
configured it carries tls.certResolver+domains *.<domain>+apex, and every other
router serves that wildcard by SNI (no per-app labels). Reverts v0.42.0's dead
entrypoint-domains + TraefikData.Domain.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
traefik's websecure entrypoint now declares http.tls.domains *.<domain>+apex so
it proactively obtains the wildcard via Cloudflare DNS-01 at startup (cert ready
before first client, every router serves it by SNI). Gated on CFAPIToken (DNS-01).
TraefikData gains Domain; ensureTraefik wires cfg.Customer.Domain.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
containerOnNetwork misread the '<nil>' printed for an absent network key as "already attached",
so wireController skipped docker network connect and traefik returned 502 for felhom.<domain>.
It now lists the network names and matches exactly. Also removed dashboard.html's dead
CrossDrive* block (a slice-8C leftover) that made the dashboard 500 via gt <nil> 0,
exposed once v0.41.1 made the dashboard reachable.
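A sketch of the exact-match check. The --format string is one way to list a container's network names; which command wireController actually runs is an assumption. Splitting the output into names means "<nil>", an empty result, or a look-alike such as traefik-public-old can never count as attached.

```go
package main

import (
	"fmt"
	"strings"
)

// networksFormat lists a container's network names via docker inspect --format.
const networksFormat = `{{range $k, $v := .NetworkSettings.Networks}}{{$k}} {{end}}`

// onNetwork matches network names exactly against the inspect output.
func onNetwork(inspectOut, network string) bool {
	for _, name := range strings.Fields(inspectOut) {
		if name == network {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println("docker inspect --format '" + networksFormat + "' felhom-controller")
	fmt.Println(onNetwork("bridge traefik-public ", "traefik-public")) // true
	fmt.Println(onNetwork("<nil>", "traefik-public"))                  // false
}
```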
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
EnsureBaseStack now writes a traefik file-provider route
(Host(felhom.<domain>) -> http://felhom-controller:8080) and joins the
controller to traefik-public. Done post-pull (domain known) and idempotently
(write-if-changed + skip-if-connected), so felhom.<domain> reaches the
controller. Completes the v0.41.0 base-infra bring-up.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
New internal/infra package renders traefik/cloudflared/filebrowser from config
(pinned images, single source of truth; web filebrowser path delegates here).
stacks.EnsureBaseStack deploys the traefik-public network + the three stacks,
single-flight + idempotent + non-fatal; wired to first boot and every health
tick. monitor.EffectiveProtected drops cloudflared when no tunnel token.
Section-G fix lives in felhom-agent build-golden.sh (same-path stacks bind).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Fix the onboarding 401: instead of seeding controller.yaml from the agent's
HOST hub key (which the hub's customer-scoped /api/v1/report rejects), the
controller now PULLS its full controller.yaml from the hub on first boot using
the bootstrap's retrieval passphrase (yielding the customer-scoped key) and
MERGES in the per-guest local_api block.
- internal/bootstrap: contract v1->v2 (customer.id + hub.url +
hub.retrieval_password + local_api; drop host key/identity). MaybeIngest gains
an injected PullFunc (keeps bootstrap free of the heavy report package),
pulls with bounded transient-only retry, merges local_api at YAML-map level
(preserves all hub-emitted fields), idempotent + fail-safe + never-crash.
- main.go: wire report.PullConfig as the pull adapter (maps ErrHubUnreachable
-> ErrPullTransient; auth/not-found permanent).
- Lockstep with felhom-agent v0.19.0.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Remove five orphaned HTML templates left behind when slice 8C retired the
disk/storage/restore web handlers (storage_handlers.go, handler_restore.go and
the /api/storage/* + /api/restore/* routes): storage_init, storage_attach,
migrate, migrate_drive, restore. Zero .go references, zero cross-template
references, no route, no nav entry; embed is a glob so deletion is safe (14
templates remain, build + tests green). No behaviour change; the deleted pages
were already unreachable.
Also ships the live demo validation (v0.39.0) writeup in REPORT.md.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add agentapi HostMetrics() + a thin /api/host-metrics proxy to the agent's
new GET /host/metrics, and a 'Szerver allapota (gazdagep)' ("Server status (host)") card on the
monitoring page rendering host CPU%, load, memory, CPU temperature (shown as n/a), uptime, and per-
storage capacity bars (thin-pool fill, disk temperature/wear). Polls every 8s.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The quiesce loop now resumes the app (StartStack + clear marker) at the snapshotted phase
instead of at done, so downtime shrinks from the whole backup to just until the snapshot,
with no consistency loss. It keeps polling to done/failed (no overlapping backup; a post-snapshot
failure is still observed). Stop-mode fallback to done and crash-safety are preserved.
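A sketch of the resume-at-snapshot logic, modelling the agent's progress as a channel of phases (the phase names and the channel are assumptions; the real loop polls the agent). The app is resumed exactly once: at snapshotted when the agent reports it, otherwise at done (stop mode) or failed, and updates are consumed until the backup finishes.

```go
package main

import "fmt"

type Phase string

const (
	PhaseRunning     Phase = "running"
	PhaseSnapshotted Phase = "snapshotted"
	PhaseDone        Phase = "done"
	PhaseFailed      Phase = "failed"
)

// superviseBackup returns the final phase and the phase at which the app was resumed.
func superviseBackup(phases <-chan Phase, resume func()) (final, resumedAt Phase) {
	resumed := false
	resumeOnce := func(p Phase) {
		if !resumed {
			resume()
			resumed, resumedAt = true, p
		}
	}
	for p := range phases {
		switch p {
		case PhaseSnapshotted:
			resumeOnce(p) // data is frozen in the snapshot: safe to restart the app
		case PhaseDone, PhaseFailed:
			resumeOnce(p) // stop-mode fallback, or failure: never leave the app down
			return p, resumedAt
		}
	}
	resumeOnce(PhaseFailed) // feed closed early: still bring the app back
	return PhaseFailed, resumedAt
}

func main() {
	feed := make(chan Phase, 3)
	feed <- PhaseRunning
	feed <- PhaseSnapshotted
	feed <- PhaseDone
	close(feed)
	final, at := superviseBackup(feed, func() { fmt.Println("StartStack + clear marker") })
	fmt.Println(final, at) // done snapshotted
}
```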
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Dropped privileged:true + /mnt rshared + /sys + /dev + /etc/fstab + /run/udev
from the bare-metal compose template (controller no longer does disk ops). The
golden bootstrap run was already minimal (8A). Slice 8 CLOSED on the controller.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>