admin 6713df2186 v0.15.5: Disaster recovery — Hub-based infra backup, auto-mount, restore UI

Complete DR implementation (TASK2.md Phases 1-4):
- Hub infra-backup push/pull endpoints (controller.yaml, disk layout, stacks)
- Fresh-deployment detection pulls config from Hub, auto-mounts drives by UUID
- Full-page restore UI with drive status, app table, sequential restore
- docker-setup.sh shows DR instructions when customer_id is configured

New files: disk_layout.go, restore_scan.go, restore_app_linux.go,
restore_drives_linux.go, infra_backup.go, infra_pull.go,
handler_restore.go, restore.html

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-02-19 13:16:46 +01:00

TASK2: Disaster Recovery — Hub-Based Infrastructure Restore

Overview

Add the ability to fully restore a Felhom deployment after a system drive failure. The controller pushes an infrastructure snapshot to the central Hub during each backup cycle. When a fresh controller is deployed on a replacement system, it pulls the snapshot from the Hub, auto-mounts surviving drives using stored disk UUIDs, and restores all applications and their data.

This is a phased implementation:

Phase	Scope	Where	Status
Phase 1	Hub infra-backup endpoints + controller push	Hub + Controller	DONE
Phase 2	New-deployment detection + Hub pull + auto-mount	Controller	DONE
Phase 3	Restore UI + app data restoration	Controller	DONE
Phase 4	docker-setup.sh integration	Script	DONE

Phases 1-2 can be deployed without Phases 3-4. Phase 2's Hub pull uses the GET endpoint added in Phase 1. Phase 3 depends on Phase 2. Phase 4 depends on Phase 1 (needs Hub endpoints).

Phase 1 — What was deployed

Hub changes (e:/git/felhom.eu/hub/):

internal/store/store.go — new infra_backups table (CREATE TABLE in migrate()), SaveInfraBackup(), GetInfraBackup(), GetInfraBackupMeta() + InfraBackupMeta struct
internal/api/handler.go — POST /api/v1/infra-backup (push) + GET /api/v1/infra-backup/{customer_id} (pull), both with Bearer auth
internal/web/server.go — handleCustomerDetail() loads InfraBackupMeta and passes to template
internal/web/templates/customer.html — "Infra Backup" card showing last-updated age, stack count, disk count

Controller changes (controller/):

internal/settings/settings.go — new GetCrossDriveResticPassword() read-only getter
internal/report/infra_backup.go — InfraBackup, DiskLayout, DiskMount, InfraStack types + BuildInfraBackup() builder
internal/report/infra_backup_linux.go — collectDiskLayout() parses /host-fstab + blkid/lsblk for disk topology
internal/report/infra_backup_other.go — no-op stub for non-Linux compilation
internal/report/pusher.go — PushInfraBackup() method (3 retries, 5s backoff)
cmd/controller/main.go — pushInfraBackup() helper; called after nightly backup cycle and on startup; hubPusher declaration moved earlier for closure access

Phase 2 — What was deployed

Controller changes (controller/):

internal/backup/disk_layout.go — NEW — DiskLayout and DiskMount types (moved from report to avoid circular import: report→backup, backup→report)
internal/report/infra_backup.go — updated DiskLayout field to use backup.DiskLayout
internal/report/infra_backup_linux.go — updated to return backup.DiskLayout
internal/report/infra_backup_other.go — updated to return backup.DiskLayout
internal/report/infra_pull.go — NEW — PullInfraBackup(hubURL, apiKey, customerID) HTTP GET from Hub, returns *InfraBackup or nil/nil for 404
internal/backup/restore_drives_linux.go — NEW — MountDrivesFromLayout(ctx, layout, logger) scans block devices by UUID, mounts using two-layer pattern (raw+bind), updates /host-fstab; includes scanBlockDeviceUUIDs() (lsblk+blkid), mountDirect(), mountRawAndBind(), addDRFstabEntries(), isMountedPath(), hostDevPath()
internal/backup/restore_drives_other.go — NEW — no-op stub for non-Linux compilation
internal/settings/settings.go — added SetCrossDriveResticPassword(password) setter (RWMutex + atomic save)
cmd/controller/main.go — added fresh-deployment detection (!fileExists(settings.json)), Hub pull, password restoration, settings restoration, drive mounting (with 2min timeout), settings re-load after restore; helper functions: fileExists(), restorePasswordsFromHub(), restoreSettingsFromHub()

Phase 3 — Implementation plan

Context: After Phase 2, drives are mounted and local backup data is accessible. The Hub infra backup has the deployed_stacks manifest and cross-drive backup data lives at <drive>/backups/secondary/<app>/rsync/ with _config/ and _db/ subdirs.

Key insight: In the common DR scenario (system drive died, HDDs survived), app data is already on the HDD. The main thing to restore is stack configs (compose files + app.yaml with deployed flag + env vars). Cross-drive rsync backups include _config/ which has the full stack directory.

Files (NEW):

internal/backup/restore_scan.go — RestorePlan, RestorableApp types + ScanDrivesForBackups() + BuildRestorePlan()
internal/backup/restore_app_linux.go — RestoreAppFromBackup() (restore config + data + DB dump + docker compose up)
internal/backup/restore_app_other.go — non-Linux stub
internal/web/handler_restore.go — restore page handler + JSON API endpoints
internal/web/templates/restore.html — full-page DR restore UI (standalone, no sidebar)

Files (MODIFIED):

internal/web/server.go — restoreMode + restorePlan state; SetRestoreState(); route interception (redirect all to /restore)
cmd/controller/main.go — after Phase 2 drive mount, scan for backups + build restore plan + pass to web server

Restore page behavior:

When restoreMode is active, ALL web routes redirect to /restore (except /static/*, /api/health, /api/restore/*, /login, /logout)
Page shows: domain/customer info, drive status, per-app table (config found, data found, DB dump found), restore all / skip buttons
POST /api/restore/all starts sequential restore of all apps
POST /api/restore/skip exits restore mode → normal dashboard
GET /api/restore/status returns current plan with per-app status for JS polling
All text in Hungarian
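
The route interception described above can be sketched as a small wrapper around the normal router. This is a minimal sketch, not the code in internal/web/server.go: the restoreGate type, allowedInRestore() and the demo helpers are hypothetical names, and allowing /restore itself (to avoid a redirect loop) is an assumption, since the list above only names the other exceptions.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync/atomic"
)

// restoreGate is a hypothetical wrapper around the normal router.
type restoreGate struct {
	active atomic.Bool // true while restore mode is on
	next   http.Handler
}

// allowedInRestore reports whether a path stays reachable in restore mode.
// "/restore" itself is allowed here to avoid a redirect loop (assumption).
func allowedInRestore(path string) bool {
	switch path {
	case "/restore", "/api/health", "/login", "/logout":
		return true
	}
	return strings.HasPrefix(path, "/static/") ||
		strings.HasPrefix(path, "/api/restore/")
}

func (g *restoreGate) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	if g.active.Load() && !allowedInRestore(r.URL.Path) {
		http.Redirect(w, r, "/restore", http.StatusSeeOther)
		return
	}
	g.next.ServeHTTP(w, r)
}

// newDemoGate builds a gate in front of a handler that always answers 200.
func newDemoGate(active bool) *restoreGate {
	g := &restoreGate{next: http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	})}
	g.active.Store(active)
	return g
}

// statusFor sends one in-memory GET through the gate and returns the status code.
func statusFor(g *restoreGate, path string) int {
	rec := httptest.NewRecorder()
	g.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	g := newDemoGate(true)
	for _, p := range []string{"/", "/stacks/immich", "/static/app.css", "/api/restore/status"} {
		fmt.Printf("%-22s -> %d\n", p, statusFor(g, p))
	}
}
```

Keeping the allowlist in one function makes it easy to check that the polling endpoint and static assets stay reachable while every other route is redirected.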

Per-app restore sequence:

Restore stack config from _config/ → /opt/docker/stacks/<app>/
Verify app data exists on HDD (it should if HDD survived)
If app data missing but rsync backup exists → rsync data back
If DB dumps in _db/ → copy to primary dump dir
docker compose pull (pull images)
docker compose up -d (start app)
Update status → next app

Post-restore: re-scan stacks, clear restoreMode, normal dashboard operation
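
The per-app sequence above is essentially an ordered list of steps run for one app at a time. The sketch below shows that control flow only; the step and appResult types, the step names, and the stubbed step bodies are all hypothetical. Continuing with the next app after a failure is also an assumption, since the plan does not say how failures are handled.

```go
package main

import (
	"errors"
	"fmt"
)

// step is one stage of the per-app sequence (names are illustrative).
type step struct {
	name string
	run  func(app string) error
}

// appResult records how far one app got.
type appResult struct {
	App        string
	Status     string // "done" or "failed"
	FailedStep string // empty when Status is "done"
	Err        error
}

// restoreAll runs every step for each app in order. A failing step marks
// that app as failed and the loop moves on to the next app (assumption).
func restoreAll(apps []string, steps []step) []appResult {
	results := make([]appResult, 0, len(apps))
	for _, app := range apps {
		res := appResult{App: app, Status: "done"}
		for _, s := range steps {
			if err := s.run(app); err != nil {
				res.Status, res.FailedStep, res.Err = "failed", s.name, err
				break
			}
		}
		results = append(results, res)
	}
	return results
}

// demoSteps returns stub steps; "compose up" fails for the app named "broken".
func demoSteps() []step {
	ok := func(string) error { return nil }
	return []step{
		{"restore config", ok},
		{"verify or rsync data", ok},
		{"copy DB dumps", ok},
		{"compose pull", ok},
		{"compose up", func(app string) error {
			if app == "broken" {
				return errors.New("container failed to start")
			}
			return nil
		}},
	}
}

func main() {
	for _, r := range restoreAll([]string{"immich", "broken", "docmost"}, demoSteps()) {
		fmt.Printf("%s: %s %s\n", r.App, r.Status, r.FailedStep)
	}
}
```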

Phase 3 — What was deployed

Controller changes (controller/):

internal/backup/restore_scan.go — NEW — RestorePlan, RestorableApp, DriveInfo, InfraStackInfo types; ScanDrivesForBackups() scans mount paths for cross-drive backup dirs, correlates with Hub manifest; Snapshot() for thread-safe JSON serialization; UpdateApp() for progress tracking
internal/backup/restore_app_linux.go — NEW — RestoreAppFromBackup() restores a single app: rsyncs _config/ to stack dir, verifies/restores user data, copies DB dumps, runs docker compose pull && up -d
internal/backup/restore_app_other.go — NEW — non-Linux stub
internal/web/handler_restore.go — NEW — restorePageHandler() renders DR page; apiRestoreStatus() returns plan+app statuses as JSON; apiRestoreAll() triggers sequential restore in goroutine; apiRestoreSkip() exits restore mode; executeAllRestores() drives the restore loop with per-app timeout
internal/web/templates/restore.html — NEW — standalone full-page DR UI (no sidebar); shows customer info, drive status cards, app table with config/data/DB columns, progress bar, restore all / skip buttons; JS polling every 2s during restore
internal/web/server.go — added restorePlan *backup.RestorePlan + restoreMu; SetRestoreState() and InRestoreMode() methods; route interception in ServeHTTP() redirects all non-static/non-restore routes to /restore when in restore mode
internal/web/funcmap.go — added statusText template function (Hungarian labels for restore status codes)
cmd/controller/main.go — after Phase 2 drive mount, builds []InfraStackInfo from Hub data, calls ScanDrivesForBackups(), sets restorePlan metadata, calls webServer.SetRestoreState()

Phase 4 — What was deployed

Script changes:

scripts/docker-setup.sh — print_summary() now shows a "Disaster Recovery" block when $CUSTOMER_ID is set, informing the operator that the controller will automatically contact the Hub, mount drives, and offer restore

README updates:

controller/README.md — version bump to v0.15.5; repo layout updated with new DR files (restore_scan.go, restore_app_linux.go, restore_drives_linux.go, infra_pull.go, handler_restore.go); roadmap marks DR as completed
Hub README (felhom.eu/hub/README.md) — already had complete DR documentation, no changes needed

Architecture

The problem (catch-22)

When the system drive dies, the backup data lives on surviving HDDs. But a freshly installed OS doesn't know about those drives — they aren't in /etc/fstab, aren't mounted, and the controller can't scan them. Even if we stored mount info in the local backup, we can't read the local backup without mounting the drives first.

The solution: Hub as infra backup store

The Hub (hub.felhom.eu) is expected to be reachable whenever the replacement system has internet access. During normal operation, the controller pushes its infrastructure state to the Hub. On a fresh deployment:

[1] docker-setup.sh deploys controller with Hub details (customer_id + API key)
[2] Controller starts → detects empty data dir → "I'm a fresh deployment"
[3] Controller calls Hub: GET /api/v1/infra-backup/{customer_id}
[4] Hub responds with: disk layout, controller.yaml, manifest, restic passwords
[5] Controller scans /dev/ for disks matching stored UUIDs
[6] Controller mounts surviving drives (using its existing disk management)
[7] Local backups on mounted drives are now accessible
[8] Controller auto-restores stack configs → apps appear in dashboard
[9] User opens dashboard → "Restore from backup" wizard
[10] User confirms → controller restores data + starts apps

Fallback: local-only detection

If the Hub is unreachable (no internet, Hub down), the controller falls back to scanning already-mounted drives for _infra/manifest.json — the existing local backup path. This is less automated (drives must be manually mounted first) but still works.
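
As a rough illustration of the fallback, the scan can be as simple as checking each mounted path for the manifest file. This is a sketch under assumptions: findLocalManifests() and makeDemoDrive() are hypothetical names, and placing _infra/ directly at the root of each mount is assumed rather than taken from the existing local backup code.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// findLocalManifests returns every <mount>/_infra/manifest.json that exists
// as a regular file. Placing _infra at the drive root is an assumption.
func findLocalManifests(mountPaths []string) []string {
	var found []string
	for _, m := range mountPaths {
		p := filepath.Join(m, "_infra", "manifest.json")
		if info, err := os.Stat(p); err == nil && info.Mode().IsRegular() {
			found = append(found, p)
		}
	}
	return found
}

// makeDemoDrive creates a temp dir standing in for a mounted drive.
func makeDemoDrive(withManifest bool) string {
	dir, err := os.MkdirTemp("", "drive-")
	if err != nil {
		panic(err)
	}
	if withManifest {
		infra := filepath.Join(dir, "_infra")
		if err := os.MkdirAll(infra, 0o755); err != nil {
			panic(err)
		}
		if err := os.WriteFile(filepath.Join(infra, "manifest.json"), []byte("{}"), 0o644); err != nil {
			panic(err)
		}
	}
	return dir
}

func main() {
	a, b := makeDemoDrive(true), makeDemoDrive(false)
	defer os.RemoveAll(a)
	defer os.RemoveAll(b)
	fmt.Println(findLocalManifests([]string{a, b, "/does/not/exist"}))
}
```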

Data stored on Hub per customer

The infra-backup payload is a single JSON blob (~20-50KB per customer):

{
  "customer_id": "demo-felhom",
  "domain": "demo-felhom.eu",
  "controller_version": "v0.15.5",
  "timestamp": "2026-02-19T03:05:00Z",

  "controller_config_b64": "<base64-encoded controller.yaml>",
  "settings_json_b64": "<base64-encoded settings.json>",

  "disk_layout": {
    "mounts": [
      {
        "uuid": "242ee4da-d9f8-40ce-b3fa-8e4860204790",
        "label": "userdate",
        "mount_point": "/mnt/sys_drive",
        "fs_type": "ext4",
        "size_bytes": 350073856000,
        "fstab_options": "defaults,noatime",
        "role": "system_data",
        "bind_subdir": "",
        "raw_mount": ""
      },
      {
        "uuid": "277a2179-a764-4758-b840-9ea741517914",
        "label": "hdd_1",
        "mount_point": "/mnt/hdd_1",
        "fs_type": "ext4",
        "size_bytes": 1000204886016,
        "fstab_options": "defaults,nofail,noatime",
        "role": "hdd_storage",
        "bind_subdir": "felhom_data",
        "raw_mount": "/mnt/.felhom-raw/hdd_1"
      }
    ]
  },

  "deployed_stacks": [
    {
      "name": "immich",
      "display_name": "Immich",
      "hdd_path": "/mnt/hdd_1",
      "needs_hdd": true
    },
    {
      "name": "docmost",
      "display_name": "Docmost",
      "hdd_path": "",
      "needs_hdd": false
    }
  ],

  "restic_password": "base64-encoded-primary-restic-password",
  "cross_drive_password": "hex-encoded-cross-drive-password"
}

Security: The Hub is operator-managed infrastructure. The connection is HTTPS with Bearer token auth. The infra backup contains sensitive data (CF tokens, restic passwords) but the Hub already receives all system health data. The operator trusts the Hub with this data.
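
On the controller side, consuming the payload shown above comes down to parsing the JSON and base64-decoding the embedded files. The sketch below covers only a subset of the fields; pulledBackup, decodeBackup() and samplePayload() are hypothetical names, and the sample values are made up for the demo.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
)

// pulledBackup mirrors a subset of the payload fields shown above.
type pulledBackup struct {
	CustomerID          string `json:"customer_id"`
	ControllerConfigB64 string `json:"controller_config_b64"`
	DeployedStacks      []struct {
		Name     string `json:"name"`
		NeedsHDD bool   `json:"needs_hdd"`
	} `json:"deployed_stacks"`
}

// decodeBackup parses the JSON blob and decodes the embedded controller.yaml.
func decodeBackup(raw []byte) (*pulledBackup, []byte, error) {
	var p pulledBackup
	if err := json.Unmarshal(raw, &p); err != nil {
		return nil, nil, fmt.Errorf("parse payload: %w", err)
	}
	if p.CustomerID == "" {
		return nil, nil, errors.New("payload has no customer_id")
	}
	cfg, err := base64.StdEncoding.DecodeString(p.ControllerConfigB64)
	if err != nil {
		return nil, nil, fmt.Errorf("decode controller_config_b64: %w", err)
	}
	return &p, cfg, nil
}

// samplePayload builds a tiny payload with made-up values for the demo.
func samplePayload() []byte {
	cfg := base64.StdEncoding.EncodeToString([]byte("domain: demo-felhom.eu\n"))
	return []byte(fmt.Sprintf(
		`{"customer_id":"demo-felhom","controller_config_b64":%q,`+
			`"deployed_stacks":[{"name":"immich","needs_hdd":true}]}`, cfg))
}

func main() {
	p, cfg, err := decodeBackup(samplePayload())
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s: %d stack(s), config %d bytes\n", p.CustomerID, len(p.DeployedStacks), len(cfg))
}
```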

Phase 1: Hub infra-backup storage + controller push

1A: Hub — new SQLite table

File: hub/internal/store/store.go

Add migration for a new table:

CREATE TABLE IF NOT EXISTS infra_backups (
    customer_id TEXT PRIMARY KEY,
    backup_json TEXT NOT NULL,
    updated_at  DATETIME NOT NULL DEFAULT (datetime('now'))
);

Add store methods:

// SaveInfraBackup upserts the infra backup for a customer.
func (s *Store) SaveInfraBackup(customerID string, backupJSON []byte) error {
    _, err := s.db.Exec(`
        INSERT INTO infra_backups (customer_id, backup_json, updated_at)
        VALUES (?, ?, datetime('now'))
        ON CONFLICT(customer_id) DO UPDATE SET
            backup_json = excluded.backup_json,
            updated_at = datetime('now')
    `, customerID, string(backupJSON))
    return err
}

// GetInfraBackup returns the infra backup for a customer, or nil if not found.
func (s *Store) GetInfraBackup(customerID string) ([]byte, error) {
    var data string
    err := s.db.QueryRow(`
        SELECT backup_json FROM infra_backups WHERE customer_id = ?
    `, customerID).Scan(&data)
    if err == sql.ErrNoRows {
        return nil, nil
    }
    if err != nil {
        return nil, err
    }
    return []byte(data), nil
}

1B: Hub — new API endpoints

File: hub/internal/api/handler.go

Add two endpoints to the existing router:

// POST /api/v1/infra-backup
// Controller pushes its infrastructure snapshot to the Hub.
func (h *Handler) handleInfraBackupPush(w http.ResponseWriter, r *http.Request) {
    // Read body (limit to 1MB)
    body, err := io.ReadAll(io.LimitReader(r.Body, 1<<20))
    if err != nil {
        writeJSON(w, http.StatusBadRequest, map[string]string{"status": "error", "error": "read body: " + err.Error()})
        return
    }

    // Validate JSON structure — extract customer_id
    var payload struct {
        CustomerID string `json:"customer_id"`
    }
    if err := json.Unmarshal(body, &payload); err != nil || payload.CustomerID == "" {
        writeJSON(w, http.StatusBadRequest, map[string]string{"status": "error", "error": "invalid payload or missing customer_id"})
        return
    }

    if err := h.store.SaveInfraBackup(payload.CustomerID, body); err != nil {
        writeJSON(w, http.StatusInternalServerError, map[string]string{"status": "error", "error": err.Error()})
        return
    }

    h.logger.Printf("[INFO] Infra backup saved for %s (%d bytes)", payload.CustomerID, len(body))
    writeJSON(w, http.StatusOK, map[string]string{"status": "ok"})
}

// GET /api/v1/infra-backup/{customer_id}
// Fresh controller pulls the infra backup for its customer.
func (h *Handler) handleInfraBackupGet(w http.ResponseWriter, r *http.Request) {
    customerID := strings.TrimPrefix(r.URL.Path, "/api/v1/infra-backup/")
    if customerID == "" {
        writeJSON(w, http.StatusBadRequest, map[string]string{"status": "error", "error": "missing customer_id"})
        return
    }

    data, err := h.store.GetInfraBackup(customerID)
    if err != nil {
        writeJSON(w, http.StatusInternalServerError, map[string]string{"status": "error", "error": err.Error()})
        return
    }
    if data == nil {
        writeJSON(w, http.StatusNotFound, map[string]string{"status": "error", "error": "no infra backup found"})
        return
    }

    w.Header().Set("Content-Type", "application/json")
    w.Write(data)
}

// Route registration (inside the router's existing switch on method and path):
case r.Method == http.MethodPost && path == "/api/v1/infra-backup":
    h.handleInfraBackupPush(w, r)
case r.Method == http.MethodGet && strings.HasPrefix(path, "/api/v1/infra-backup/"):
    h.handleInfraBackupGet(w, r)

Both endpoints use the existing Bearer token auth (same report_api_key).
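
The Hub's existing auth middleware is not shown in this document. As a reminder of what a Bearer check typically involves, here is a minimal sketch; bearerOK() is a hypothetical helper, and the constant-time comparison is a suggested practice rather than a description of the current Hub code.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"strings"
)

// bearerOK reports whether an Authorization header value carries the
// expected token. An empty expected token never matches.
func bearerOK(header, want string) bool {
	const prefix = "Bearer "
	if want == "" || !strings.HasPrefix(header, prefix) {
		return false
	}
	got := strings.TrimPrefix(header, prefix)
	// ConstantTimeCompare avoids leaking how many leading bytes matched.
	return subtle.ConstantTimeCompare([]byte(got), []byte(want)) == 1
}

func main() {
	fmt.Println(bearerOK("Bearer s3cret", "s3cret"))
	fmt.Println(bearerOK("Bearer wrong", "s3cret"))
	fmt.Println(bearerOK("s3cret", "s3cret"))
}
```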

1C: Hub — add infra backup info to dashboard

File: hub/internal/web/templates/customer.html

Add a section to the customer detail page showing infra backup status:

<!-- Infra Backup Status -->
<div class="card">
    <h3>Infra Backup</h3>
    {{if .InfraBackup}}
    <p>Last updated: {{.InfraBackupAge}} ago</p>
    <p>Deployed stacks: {{.InfraBackupStackCount}}</p>
    <p>Disks: {{.InfraBackupDiskCount}}</p>
    {{else}}
    <p style="color: var(--warning)">No infra backup received yet</p>
    {{end}}
</div>

Add store method and web handler logic to load infra backup metadata for the customer detail page.
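
The stack and disk counts shown on the card can be derived from the stored backup_json without decoding the full payload. The sketch below is one way to do that; infraCounts and countsFromBackup() are hypothetical names and are not the GetInfraBackupMeta() implementation. The "last updated" age would come from the updated_at column and is left out here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// infraCounts holds the two numbers the card displays.
type infraCounts struct {
	Stacks int
	Disks  int
}

// countsFromBackup counts array elements without decoding their contents.
func countsFromBackup(backupJSON []byte) (infraCounts, error) {
	var doc struct {
		DiskLayout struct {
			Mounts []json.RawMessage `json:"mounts"`
		} `json:"disk_layout"`
		DeployedStacks []json.RawMessage `json:"deployed_stacks"`
	}
	if err := json.Unmarshal(backupJSON, &doc); err != nil {
		return infraCounts{}, fmt.Errorf("parse backup_json: %w", err)
	}
	return infraCounts{
		Stacks: len(doc.DeployedStacks),
		Disks:  len(doc.DiskLayout.Mounts),
	}, nil
}

func main() {
	raw := []byte(`{"disk_layout":{"mounts":[{"uuid":"a"},{"uuid":"b"}]},` +
		`"deployed_stacks":[{"name":"immich"}]}`)
	c, err := countsFromBackup(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("stacks=%d disks=%d\n", c.Stacks, c.Disks)
}
```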

1D: Controller — push infra snapshot to Hub

File: controller/internal/report/infra_backup.go (NEW)

package report

import (
    "encoding/base64"
    "encoding/json"
    "os"
    "time"

    "gitea.dooplex.hu/admin/felhom-controller/internal/backup"
    "gitea.dooplex.hu/admin/felhom-controller/internal/settings"
)

// InfraBackup is the payload pushed to the Hub for disaster recovery.
type InfraBackup struct {
    CustomerID        string          `json:"customer_id"`
    Domain            string          `json:"domain"`
    ControllerVersion string          `json:"controller_version"`
    Timestamp         string          `json:"timestamp"`

    ControllerConfigB64 string        `json:"controller_config_b64"`
    SettingsJSONB64     string        `json:"settings_json_b64,omitempty"`

    DiskLayout      DiskLayout        `json:"disk_layout"`
    DeployedStacks  []InfraStack      `json:"deployed_stacks"`

    ResticPassword      string        `json:"restic_password,omitempty"`
    CrossDrivePassword  string        `json:"cross_drive_password,omitempty"`
}

type DiskLayout struct {
    Mounts []DiskMount `json:"mounts"`
}

type DiskMount struct {
    UUID         string `json:"uuid"`
    Label        string `json:"label"`
    MountPoint   string `json:"mount_point"`
    FSType       string `json:"fs_type"`
    SizeBytes    int64  `json:"size_bytes"`
    FstabOptions string `json:"fstab_options"`
    Role         string `json:"role"`          // "system_data", "hdd_storage", "root"
    BindSubdir   string `json:"bind_subdir"`   // e.g., "felhom_data" for HDD bind mounts
    RawMount     string `json:"raw_mount"`     // e.g., "/mnt/.felhom-raw/hdd_1"
}

type InfraStack struct {
    Name        string `json:"name"`
    DisplayName string `json:"display_name"`
    HDDPath     string `json:"hdd_path,omitempty"`
    NeedsHDD    bool   `json:"needs_hdd"`
}

// BuildInfraBackup collects all infrastructure state for Hub backup.
func BuildInfraBackup(
    customerID, domain, version string,
    controllerYAMLPath string,
    settingsPath string,
    resticPasswordFile string,
    sett *settings.Settings,
    stackProvider backup.StackDataProvider,
) (*InfraBackup, error) {
    ib := &InfraBackup{
        CustomerID:        customerID,
        Domain:            domain,
        ControllerVersion: version,
        Timestamp:         time.Now().UTC().Format(time.RFC3339),
    }

    // Read and encode controller.yaml
    if data, err := os.ReadFile(controllerYAMLPath); err == nil {
        ib.ControllerConfigB64 = base64.StdEncoding.EncodeToString(data)
    }

    // Read and encode settings.json
    if data, err := os.ReadFile(settingsPath); err == nil {
        ib.SettingsJSONB64 = base64.StdEncoding.EncodeToString(data)
    }

    // Read restic password
    if data, err := os.ReadFile(resticPasswordFile); err == nil {
        ib.ResticPassword = base64.StdEncoding.EncodeToString(data)
    }

    // Read cross-drive password
    if pw := sett.GetCrossDriveResticPassword(); pw != "" {
        ib.CrossDrivePassword = pw
    }

    // Collect disk layout (see implementation note below)
    ib.DiskLayout = collectDiskLayout()

    // Collect deployed stacks
    deployed := stackProvider.ListDeployedStacks()
    for _, s := range deployed {
        ib.DeployedStacks = append(ib.DeployedStacks, InfraStack{
            Name:        s.Name,
            DisplayName: s.DisplayName,
            HDDPath:     stackProvider.GetStackHDDPath(s.Name),
            NeedsHDD:    s.NeedsHDD,
        })
    }

    return ib, nil
}

// collectDiskLayout reads /etc/fstab and lsblk to build the disk layout.
// This runs inside the container which has /host-fstab mounted and access to
// /host-dev/ for block device info.
func collectDiskLayout() DiskLayout {
    // Implementation: parse /host-fstab (mounted from host /etc/fstab)
    // and correlate with lsblk -J output.
    //
    // The controller already has disk management code in internal/stacks/
    // or similar — reuse the existing lsblk parsing.
    //
    // For each non-root, non-swap, non-boot mount in fstab:
    //   - Extract UUID, mount point, fs_type, options
    //   - Detect role: "system_data" if mount_point matches system_data_path,
    //     "hdd_storage" if it's under /mnt/.felhom-raw/ or /mnt/hdd_*
    //   - Detect bind mounts (type=none, options contain "bind")
    //   - Get size from lsblk
    //
    // Return the DiskLayout struct.
    //
    // See the detailed implementation note in the "Implementation details" section.
    return DiskLayout{}
}
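
The fstab half of collectDiskLayout() can be sketched as follows. This covers only the parsing rules listed in the stub's comments (skip root, swap and boot entries, extract the UUID, detect bind mounts); role detection and lsblk sizes are left out. fstabEntry, parseFstab() and sampleFstab are hypothetical names, and the sample content is made up apart from the two UUIDs reused from the payload example.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// fstabEntry is one parsed, non-system line of an fstab file.
type fstabEntry struct {
	Source     string // first field, e.g. "UUID=..." or a bind source path
	UUID       string // set only when Source has the UUID= prefix
	MountPoint string
	FSType     string
	Options    string
	Bind       bool // true when the options include "bind"
}

// parseFstab skips comments, blank lines, swap, "/" and /boot* entries.
func parseFstab(content string) []fstabEntry {
	var entries []fstabEntry
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		f := strings.Fields(line)
		if len(f) < 4 {
			continue // malformed line
		}
		e := fstabEntry{Source: f[0], MountPoint: f[1], FSType: f[2], Options: f[3]}
		if e.FSType == "swap" || e.MountPoint == "/" || strings.HasPrefix(e.MountPoint, "/boot") {
			continue
		}
		if strings.HasPrefix(e.Source, "UUID=") {
			e.UUID = strings.TrimPrefix(e.Source, "UUID=")
		}
		for _, opt := range strings.Split(e.Options, ",") {
			if opt == "bind" {
				e.Bind = true
			}
		}
		entries = append(entries, e)
	}
	return entries
}

// sampleFstab is illustrative content, not a real host file.
const sampleFstab = `# /etc/fstab
UUID=aaaa-root / ext4 errors=remount-ro 0 1
UUID=bbbb-swap none swap sw 0 0
UUID=242ee4da-d9f8-40ce-b3fa-8e4860204790 /mnt/sys_drive ext4 defaults,noatime 0 2
UUID=277a2179-a764-4758-b840-9ea741517914 /mnt/.felhom-raw/hdd_1 ext4 defaults,nofail,noatime 0 2
/mnt/.felhom-raw/hdd_1/felhom_data /mnt/hdd_1 none bind,nofail 0 0
`

func main() {
	for _, e := range parseFstab(sampleFstab) {
		fmt.Printf("%-24s uuid=%q bind=%v\n", e.MountPoint, e.UUID, e.Bind)
	}
}
```

A bind entry carries no UUID of its own, so pairing it with its raw mount (by matching the bind source path against the raw mount point) would be a separate step.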

1E: Controller — push infra backup after each backup cycle

File: controller/cmd/controller/main.go

Add the infra backup push to the backup scheduler (after Tier1 + Tier2 complete):

// In the "backup" daily scheduler:
sched.Daily("backup", cfg.Backup.ResticSchedule, func(ctx context.Context) error {
    err := backupMgr.RunBackup(ctx)
    crossDriveRunner.RunAllScheduled(ctx, "daily")
    if time.Now().Weekday() == time.Sunday {
        crossDriveRunner.RunAllScheduled(ctx, "weekly")
    }

    // NEW: Push infra backup to Hub
    if hubPusher != nil && cfg.Hub.Enabled {
        go pushInfraBackup(cfg, sett, stackProv, hubPusher, logger)
    }

    return err
})

func pushInfraBackup(cfg *config.Config, sett *settings.Settings,
    stackProv backup.StackDataProvider, pusher *report.Pusher, logger *log.Logger) {

    ib, err := report.BuildInfraBackup(
        cfg.Customer.ID, cfg.Customer.Domain, Version,
        "/opt/docker/felhom-controller/controller.yaml",
        filepath.Join(cfg.Paths.DataDir, "settings.json"),
        cfg.Backup.ResticPasswordFile,
        sett, stackProv,
    )
    if err != nil {
        logger.Printf("[WARN] Failed to build infra backup: %v", err)
        return
    }

    data, err := json.Marshal(ib)
    if err != nil {
        logger.Printf("[WARN] Failed to marshal infra backup: %v", err)
        return
    }

    if err := pusher.PushInfraBackup(data); err != nil {
        logger.Printf("[WARN] Failed to push infra backup to Hub: %v", err)
    } else {
        logger.Printf("[INFO] Infra backup pushed to Hub (%d bytes)", len(data))
    }
}

1F: Controller — add `PushInfraBackup` to Pusher

File: controller/internal/report/pusher.go

Add a new method alongside the existing Push():

// PushInfraBackup sends the infrastructure backup to the Hub.
func (p *Pusher) PushInfraBackup(data []byte) error {
    if !p.enabled {
        return nil
    }

    url := p.hubURL + "/api/v1/infra-backup"

    var lastErr error
    for attempt := 0; attempt < 3; attempt++ {
        if attempt > 0 {
            time.Sleep(5 * time.Second)
        }

        req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(data))
        if err != nil {
            lastErr = err
            continue
        }
        req.Header.Set("Content-Type", "application/json")
        if p.apiKey != "" {
            req.Header.Set("Authorization", "Bearer "+p.apiKey)
        }

        resp, err := p.httpClient.Do(req)
        if err != nil {
            lastErr = err
            continue
        }
        io.Copy(io.Discard, resp.Body)
        resp.Body.Close()

        if resp.StatusCode >= 200 && resp.StatusCode < 300 {
            return nil
        }
        lastErr = fmt.Errorf("HTTP %d", resp.StatusCode)
    }

    return fmt.Errorf("infra backup push failed after 3 attempts: %w", lastErr)
}

Phase 2: New-deployment detection + Hub pull + auto-mount

2A: Controller — detect fresh deployment

File: controller/cmd/controller/main.go

The controller uses a Docker named volume (controller-data) at /opt/docker/felhom-controller/data. On a fresh deployment, this volume is empty — no settings.json, no session_secret, no snapshot-history.json.

Add detection after settings initialization:

// Detect fresh deployment (empty data directory = new install)
isFreshDeployment := !fileExists(filepath.Join(cfg.Paths.DataDir, "settings.json"))

if isFreshDeployment {
    logger.Println("[INFO] Fresh deployment detected — checking Hub for infra backup")

    // No explicit marker is written: settings.json itself is the marker.
    // Settings.save() creates it soon after, so later restarts skip this path.
}

Important: The marker to distinguish "fresh" from "restarted" is the absence of settings.json. Once the Settings package creates it (on first save), subsequent restarts won't trigger the fresh-deployment path.
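
The fileExists() helper is referenced above but its body is not shown. A minimal sketch of the helper and the detection follows; the bodies are assumptions, and isFreshDeployment() and demoDataDir() are hypothetical names used only for this example.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// fileExists reports whether path exists and is not a directory.
// Any Stat error (including permission errors) counts as "missing".
func fileExists(path string) bool {
	info, err := os.Stat(path)
	return err == nil && !info.IsDir()
}

// isFreshDeployment treats a data dir without settings.json as a new install.
func isFreshDeployment(dataDir string) bool {
	return !fileExists(filepath.Join(dataDir, "settings.json"))
}

// demoDataDir creates a temp data dir, optionally containing settings.json.
func demoDataDir(withSettings bool) string {
	dir, err := os.MkdirTemp("", "controller-data-")
	if err != nil {
		panic(err)
	}
	if withSettings {
		if err := os.WriteFile(filepath.Join(dir, "settings.json"), []byte("{}"), 0o600); err != nil {
			panic(err)
		}
	}
	return dir
}

func main() {
	fresh, existing := demoDataDir(false), demoDataDir(true)
	defer os.RemoveAll(fresh)
	defer os.RemoveAll(existing)
	fmt.Println(isFreshDeployment(fresh), isFreshDeployment(existing))
}
```

Because a Stat error is treated the same as a missing file, an unreadable data directory would also take the fresh-deployment path; whether that is acceptable depends on how the volume is mounted.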

2B: Controller — pull infra backup from Hub

File: controller/internal/report/infra_pull.go (NEW)

package report

import (
    "encoding/json"
    "fmt"
    "io"
    "net/http"
    "time"
)

// PullInfraBackup fetches the infrastructure backup from the Hub.
// Returns nil, nil if no backup exists for this customer.
func PullInfraBackup(hubURL, apiKey, customerID string) (*InfraBackup, error) {
    url := hubURL + "/api/v1/infra-backup/" + customerID

    client := &http.Client{Timeout: 30 * time.Second}

    req, err := http.NewRequest(http.MethodGet, url, nil)
    if err != nil {
        return nil, err
    }
    if apiKey != "" {
        req.Header.Set("Authorization", "Bearer "+apiKey)
    }

    resp, err := client.Do(req)
    if err != nil {
        return nil, fmt.Errorf("hub request failed: %w", err)
    }
    defer resp.Body.Close()

    if resp.StatusCode == http.StatusNotFound {
        return nil, nil // no backup for this customer
    }
    if resp.StatusCode != http.StatusOK {
        return nil, fmt.Errorf("hub returned HTTP %d", resp.StatusCode)
    }

    body, err := io.ReadAll(io.LimitReader(resp.Body, 5<<20)) // 5MB limit
    if err != nil {
        return nil, fmt.Errorf("reading response: %w", err)
    }

    var ib InfraBackup
    if err := json.Unmarshal(body, &ib); err != nil {
        return nil, fmt.Errorf("parsing infra backup: %w", err)
    }

    return &ib, nil
}

2C: Controller — auto-mount drives from Hub disk layout

File: controller/internal/backup/restore_drives.go (NEW)

package backup

import (
    "context"
    "encoding/json"
    "fmt"
    "log"
    "os"
    "os/exec"
    "path/filepath"
    "strings"

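    // NOTE: report already imports backup (see BuildInfraBackup), so this
    // import creates a cycle. The deployed code defines DiskLayout and
    // DiskMount in package backup instead (see "Phase 2 — What was deployed").
    "gitea.dooplex.hu/admin/felhom-controller/internal/report"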
)

// MountDrivesFromLayout scans block devices for disks matching the Hub's
// stored disk layout and mounts them. Uses the controller's existing
// two-layer mount pattern: raw mount → bind mount.
//
// The controller container has:
//   - /host-dev:/dev (rw) — block device access
//   - /host-fstab:/etc/fstab — can update fstab
//   - privileged: true — can mount filesystems
//
// Returns the list of successfully mounted paths.
func MountDrivesFromLayout(ctx context.Context, layout report.DiskLayout, logger *log.Logger) ([]string, error) {
    // 1. Get current block devices with UUIDs
    lsblkDevices, err := getLsblkDevices(ctx)
    if err != nil {
        return nil, fmt.Errorf("scanning block devices: %w", err)
    }

    var mounted []string

    for _, diskMount := range layout.Mounts {
        if diskMount.UUID == "" {
            continue
        }

        // Skip system partitions (root, boot, swap)
        if diskMount.Role == "root" || diskMount.Role == "boot" || diskMount.Role == "swap" {
            continue
        }

        // Find matching device by UUID
        device := findDeviceByUUID(lsblkDevices, diskMount.UUID)
        if device == "" {
            logger.Printf("[WARN] Disk UUID %s (%s) not found — drive may be missing",
                diskMount.UUID, diskMount.Label)
            continue
        }

        // Check if already mounted
        if isMounted(diskMount.MountPoint) || isMounted(diskMount.RawMount) {
            logger.Printf("[INFO] %s already mounted", diskMount.MountPoint)
            mounted = append(mounted, diskMount.MountPoint)
            continue
        }

        logger.Printf("[INFO] Found disk %s (UUID=%s, label=%s) — mounting to %s",
            device, diskMount.UUID, diskMount.Label, diskMount.MountPoint) // full UUID: slicing [:12] panics on short UUIDs (e.g. vfat)

        // Mount using the felhom two-layer pattern:
        // Layer 1: raw mount → /mnt/.felhom-raw/<label>
        // Layer 2: bind mount → <raw>/<subdir> to /mnt/<label>
        if diskMount.RawMount != "" && diskMount.BindSubdir != "" {
            // Two-layer HDD mount
            if err := mountRawAndBind(ctx, device, diskMount, logger); err != nil {
                logger.Printf("[ERROR] Failed to mount %s: %v", diskMount.Label, err)
                continue
            }
        } else {
            // Simple direct mount (e.g., sys_drive)
            if err := mountDirect(ctx, device, diskMount, logger); err != nil {
                logger.Printf("[ERROR] Failed to mount %s: %v", diskMount.Label, err)
                continue
            }
        }

        // Update host fstab so mount persists across reboots
        if err := addToFstab(diskMount, logger); err != nil {
            logger.Printf("[WARN] Failed to update fstab for %s: %v", diskMount.Label, err)
            // Non-fatal — mount works for now, fstab can be fixed later
        }

        mounted = append(mounted, diskMount.MountPoint)
        logger.Printf("[INFO] Mounted %s at %s", diskMount.Label, diskMount.MountPoint)
    }

    return mounted, nil
}

// getLsblkDevices runs lsblk -J and returns a UUID → /dev path mapping.
func getLsblkDevices(ctx context.Context) (map[string]string, error) {
    cmd := exec.CommandContext(ctx, "lsblk", "-J", "-o", "NAME,UUID,LABEL,FSTYPE,SIZE,MOUNTPOINT")
    out, err := cmd.Output()
    if err != nil {
        return nil, err
    }

    var result struct {
        BlockDevices []struct {
            Name     string `json:"name"`
            UUID     string `json:"uuid"`
            Label    string `json:"label"`
            FSType   string `json:"fstype"`
            Size     string `json:"size"`
            Mount    string `json:"mountpoint"`
            Children []struct {
                Name  string `json:"name"`
                UUID  string `json:"uuid"`
                Label string `json:"label"`
            } `json:"children"`
        } `json:"blockdevices"`
    }
    if err := json.Unmarshal(out, &result); err != nil {
        return nil, err
    }

    devices := make(map[string]string) // UUID → /dev/path
    for _, dev := range result.BlockDevices {
        if dev.UUID != "" {
            devices[dev.UUID] = "/dev/" + dev.Name
        }
        for _, child := range dev.Children {
            if child.UUID != "" {
                devices[child.UUID] = "/dev/" + child.Name
            }
        }
    }
    return devices, nil
}

func findDeviceByUUID(devices map[string]string, uuid string) string {
    return devices[uuid]
}

func isMounted(path string) bool {
    if path == "" {
        return false
    }
    _, err := os.Stat(path)
    if err != nil {
        return false
    }
    // Check /proc/mounts for the path
    data, err := os.ReadFile("/proc/mounts")
    if err != nil {
        return false
    }
    return strings.Contains(string(data), " "+path+" ")
}

func mountDirect(ctx context.Context, device string, dm report.DiskMount, logger *log.Logger) error {
    if err := os.MkdirAll(dm.MountPoint, 0755); err != nil {
        return err
    }
    cmd := exec.CommandContext(ctx, "mount", "-t", dm.FSType, device, dm.MountPoint)
    if out, err := cmd.CombinedOutput(); err != nil {
        return fmt.Errorf("%s: %w", strings.TrimSpace(string(out)), err)
    }
    return nil
}

func mountRawAndBind(ctx context.Context, device string, dm report.DiskMount, logger *log.Logger) error {
    // Layer 1: raw mount
    if err := os.MkdirAll(dm.RawMount, 0755); err != nil {
        return err
    }
    cmd := exec.CommandContext(ctx, "mount", "-t", dm.FSType, "-o", "noatime", device, dm.RawMount)
    if out, err := cmd.CombinedOutput(); err != nil {
        return fmt.Errorf("raw mount: %s: %w", strings.TrimSpace(string(out)), err)
    }

    // Layer 2: bind mount (subdir → final mount point)
    bindSrc := filepath.Join(dm.RawMount, dm.BindSubdir)
    if err := os.MkdirAll(bindSrc, 0755); err != nil {
        return err
    }
    if err := os.MkdirAll(dm.MountPoint, 0755); err != nil {
        return err
    }
    cmd = exec.CommandContext(ctx, "mount", "--bind", bindSrc, dm.MountPoint)
    if out, err := cmd.CombinedOutput(); err != nil {
        return fmt.Errorf("bind mount: %s: %w", strings.TrimSpace(string(out)), err)
    }

    return nil
}

func addToFstab(dm report.DiskMount, logger *log.Logger) error {
    const fstabPath = "/host-fstab" // mounted from host /etc/fstab

    data, err := os.ReadFile(fstabPath)
    if err != nil {
        return err
    }

    content := string(data)

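    // Check if UUID already in fstab
    if strings.Contains(content, dm.UUID) {
        // Shorten for the log only; short UUIDs (e.g. vfat "ABCD-1234") must not be sliced past their length
        shortUUID := dm.UUID
        if len(shortUUID) > 12 {
            shortUUID = shortUUID[:12]
        }
        logger.Printf("[INFO] UUID %s already in fstab", shortUUID)
        return nil
    }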

    // Append entries
    var additions strings.Builder
    additions.WriteString("\n# Restored by felhom-controller DR\n")

    if dm.RawMount != "" {
        // Raw mount entry
        additions.WriteString(fmt.Sprintf("UUID=%s\t%s\t%s\t%s\t0 2\n",
            dm.UUID, dm.RawMount, dm.FSType, dm.FstabOptions))
    }

    if dm.BindSubdir != "" && dm.RawMount != "" {
        // Bind mount entry
        additions.WriteString(fmt.Sprintf("%s/%s\t%s\tnone\tbind,nofail\t0 0\n",
            dm.RawMount, dm.BindSubdir, dm.MountPoint))
    } else if dm.RawMount == "" {
        // Direct mount entry
        additions.WriteString(fmt.Sprintf("UUID=%s\t%s\t%s\t%s\t0 2\n",
            dm.UUID, dm.MountPoint, dm.FSType, dm.FstabOptions))
    }

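    // Write in place. /host-fstab is a single-file bind mount of the host's
    // /etc/fstab, so renaming a temp file over it fails (the target is a mount
    // point on a different filesystem); a temp-file-plus-rename write is not available here.
    return os.WriteFile(fstabPath, []byte(content+additions.String()), 0644)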
}
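The substring test in isMounted() can be tightened by comparing only the mount-point field of each /proc/mounts line. The following is a minimal sketch, not the controller's actual helper: the names unescapeMountField and mountPointInTable are hypothetical, and the function takes the table contents as a string so it can be exercised without touching the real /proc/mounts.

```go
package main

import (
	"fmt"
	"strings"
)

// unescapeMountField reverses the octal escapes the kernel uses in
// /proc/mounts for space, tab, newline and backslash.
func unescapeMountField(s string) string {
	r := strings.NewReplacer(`\040`, " ", `\011`, "\t", `\012`, "\n", `\134`, `\`)
	return r.Replace(s)
}

// mountPointInTable reports whether path appears as a mount point (the
// second field) in data, which is expected to be in /proc/mounts format.
// Hypothetical helper, introduced only for this illustration.
func mountPointInTable(data, path string) bool {
	for _, line := range strings.Split(data, "\n") {
		fields := strings.Fields(line)
		if len(fields) >= 2 && unescapeMountField(fields[1]) == path {
			return true
		}
	}
	return false
}

func main() {
	// Sample table; the second entry has a space in its mount point.
	table := "/dev/sdb1 /mnt/hdd1 ext4 rw,noatime 0 0\n" +
		"/dev/sdc1 /mnt/my\\040disk ext4 rw 0 0\n"
	fmt.Println(mountPointInTable(table, "/mnt/hdd1"))    // true
	fmt.Println(mountPointInTable(table, "/mnt/my disk")) // true
	fmt.Println(mountPointInTable(table, "/mnt/hdd"))     // false
}
```

In the controller, the table would come from os.ReadFile("/proc/mounts"), as in the existing isMounted().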

Important implementation note: The controller runs inside a Docker container with privileged: true. The mount operations happen on the host's mount namespace because the container has propagation: rshared on the /mnt volume. The lsblk command will see host block devices via /host-dev. Study the existing disk management code in the controller before implementing — there may be helpers for lsblk parsing and mount operations already.
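If no existing lsblk helper turns up, one possible shape for the UUID lookup is sketched below. It walks the device tree recursively, so filesystems nested more than one level deep (for example LVM or LUKS on a partition) are also found. It assumes lsblk is invoked with the PATH column (for example `lsblk --json -o NAME,PATH,UUID`, available in newer util-linux releases) and falls back to "/dev/" + name when the path is empty. The type and function names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// blockDevice mirrors the subset of lsblk JSON output used here.
type blockDevice struct {
	Name     string        `json:"name"`
	Path     string        `json:"path"`
	UUID     string        `json:"uuid"`
	Children []blockDevice `json:"children"`
}

// collectUUIDs walks the device tree depth-first and records UUID → device path.
func collectUUIDs(devs []blockDevice, out map[string]string) {
	for _, d := range devs {
		if d.UUID != "" {
			p := d.Path
			if p == "" {
				p = "/dev/" + d.Name // fallback when PATH was not requested
			}
			out[d.UUID] = p
		}
		collectUUIDs(d.Children, out)
	}
}

// parseLsblkUUIDs decodes lsblk JSON output into a UUID → device path map.
func parseLsblkUUIDs(raw []byte) (map[string]string, error) {
	var result struct {
		BlockDevices []blockDevice `json:"blockdevices"`
	}
	if err := json.Unmarshal(raw, &result); err != nil {
		return nil, fmt.Errorf("parse lsblk output: %w", err)
	}
	out := make(map[string]string)
	collectUUIDs(result.BlockDevices, out)
	return out, nil
}

func main() {
	// Hand-written sample in lsblk's JSON shape; the UUIDs are made up.
	sample := []byte(`{"blockdevices":[
	  {"name":"sda","path":"/dev/sda","uuid":null,"children":[
	    {"name":"sda1","path":"/dev/sda1","uuid":"1111-aaaa","children":[
	      {"name":"vg0-data","path":"/dev/mapper/vg0-data","uuid":"2222-bbbb"}]}]}]}`)
	devices, err := parseLsblkUUIDs(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(devices["2222-bbbb"], len(devices))
}
```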

2D: Controller — orchestrate the fresh-deployment flow

File: controller/cmd/controller/main.go

In the startup sequence, after isFreshDeployment detection:

if isFreshDeployment {
    logger.Println("[INFO] Fresh deployment detected — checking Hub for infra backup")

    var infraBackup *report.InfraBackup
    var restoreSource string

    // Try Hub first (primary path)
    if cfg.Hub.Enabled && cfg.Hub.URL != "" {
        ib, err := report.PullInfraBackup(cfg.Hub.URL, cfg.Hub.APIKey, cfg.Customer.ID)
        if err != nil {
            logger.Printf("[WARN] Could not reach Hub: %v", err)
        } else if ib != nil {
            infraBackup = ib
            restoreSource = "hub"
            logger.Printf("[INFO] Found infra backup on Hub: %s (%s), %d stacks, synced %s",
                ib.Domain, ib.CustomerID, len(ib.DeployedStacks), ib.Timestamp)
        } else {
            logger.Println("[INFO] No infra backup found on Hub for this customer")
        }
    }

    if infraBackup != nil {
        // Restore restic passwords from Hub backup
        restorePasswordsFromHub(infraBackup, cfg, sett, logger)

        // Restore settings.json from Hub backup
        restoreSettingsFromHub(infraBackup, cfg, logger)

        // Mount drives using stored disk layout
        ctx := context.Background()
        mountedPaths, err := backup.MountDrivesFromLayout(ctx, infraBackup.DiskLayout, logger)
        if err != nil {
            logger.Printf("[WARN] Drive mounting error: %v", err)
        } else {
            logger.Printf("[INFO] Mounted %d drives from Hub disk layout", len(mountedPaths))
        }

        // Now scan mounted drives for local backup data
        mountPoints := discoverMountPoints() // re-scan after mounting
        restoreDrives = backup.DetectBackupsOnDrives(mountPoints, logger)

        // Auto-restore stack configs
        if len(restoreDrives) > 0 {
            restored, err := backup.RestoreStackConfigs(restoreDrives, cfg.Paths.StacksDir, logger)
            if err != nil {
                logger.Printf("[WARN] Stack config restore: %v", err)
            } else {
                logger.Printf("[INFO] Restored %d stack configs from local backup", restored)
            }
        } else {
            // Fallback: restore stack configs from Hub data
            // (Hub has the deployed_stacks list but not full compose files)
            logger.Println("[WARN] No local backups found — stack configs must be synced from git catalog")
        }

        // Re-scan stacks
        stackMgr.ScanStacks()

        // Build restore plan (uses local backup data for rsync/restic info)
        restorePlan = backup.BuildRestorePlan(restoreDrives, logger)
        restoreMode = true

    } else {
        // Fallback: try local-only detection (drives might be pre-mounted)
        mountPoints := discoverMountPoints()
        restoreDrives = backup.DetectBackupsOnDrives(mountPoints, logger)
        if len(restoreDrives) > 0 {
            // Same local-only flow as before
            // ...
        }
    }
}

Helper functions:

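func restorePasswordsFromHub(ib *report.InfraBackup, cfg *config.Config,
    sett *settings.Settings, logger *log.Logger) {

    // Failures are logged rather than silently dropped: a missing password
    // only surfaces much later, when a restic restore fails.
    if ib.ResticPassword != "" {
        decoded, err := base64.StdEncoding.DecodeString(ib.ResticPassword)
        if err != nil {
            logger.Printf("[WARN] Primary restic password from Hub is not valid base64: %v", err)
        } else if err := os.MkdirAll(filepath.Dir(cfg.Backup.ResticPasswordFile), 0700); err != nil {
            logger.Printf("[WARN] Could not create restic password directory: %v", err)
        } else if err := os.WriteFile(cfg.Backup.ResticPasswordFile, decoded, 0600); err != nil {
            logger.Printf("[WARN] Could not write primary restic password: %v", err)
        } else {
            logger.Println("[INFO] Primary restic password restored from Hub")
        }
    }

    if ib.CrossDrivePassword != "" {
        if err := sett.SetCrossDriveResticPassword(ib.CrossDrivePassword); err != nil {
            logger.Printf("[WARN] Could not restore cross-drive restic password: %v", err)
        } else {
            logger.Println("[INFO] Cross-drive restic password restored from Hub")
        }
    }
}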

func restoreSettingsFromHub(ib *report.InfraBackup, cfg *config.Config, logger *log.Logger) {
    if ib.SettingsJSONB64 == "" {
        return
    }
    decoded, err := base64.StdEncoding.DecodeString(ib.SettingsJSONB64)
    if err != nil {
        logger.Printf("[WARN] Settings from Hub are not valid base64: %v", err)
        return
    }
    settingsPath := filepath.Join(cfg.Paths.DataDir, "settings.json")
    if err := os.WriteFile(settingsPath, decoded, 0600); err != nil {
        logger.Printf("[WARN] Could not write %s: %v", settingsPath, err)
        return
    }
    logger.Println("[INFO] Settings restored from Hub backup")
}

Phase 3: Restore UI + app data restoration

3A: Restore page + API handlers

Same as the previous TASK2 design — the restore UI, API endpoints, and sequential restoration logic. Now that drives are mounted by Phase 2, the local backup data is accessible.

Files (NEW):

controller/internal/web/handler_restore.go — page handler + API
controller/internal/web/templates/restore.html — restore wizard UI
controller/internal/backup/restore_rsync.go — restore from rsync backups

Files (MODIFIED):

controller/internal/web/server.go — route registration + restore state
controller/internal/web/templates/dashboard.html — restore banner
controller/internal/web/templates/layout.html — sidebar restore link

The implementation is the same as described in the sections below. Refer to the "Phase 3 detail" section at the end of this document.

3B: Restore-from-rsync function

Same as previously designed. The rsync backups are plain files — no password needed. The function rsyncs _config/, _db/, and user data directories back to their original locations.

Strategy: rsync first, restic fallback (sequential per app).

3C: Restore flow integration

After the user clicks "Restore All" on the restore page:

For each app in the restore plan (sequentially):
  a. Check for rsync backup → use RestoreFromRsync() if available
  b. Else check for restic backup → use existing RestoreApp() with latest snapshot
  c. If DB dump exists → restore to the app's dump directory
  d. Pull Docker images (docker compose pull)
  e. Start the app (docker compose up -d)
  f. Update status in UI (via polling API)
When all done, clear restoreMode flag
Dashboard returns to normal
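The flow above can be sketched as a loop over the plan that reports whether every app finished, which the caller could use to decide when to clear restoreMode. This is an illustration under assumptions, not the controller's code: appPlan stands in for backup.RestorableApp, the restoreSteps callbacks stand in for the rsync, restic and Docker operations, and, unlike the handler shown later, a failed start is counted as a failed restore.

```go
package main

import (
	"errors"
	"fmt"
)

// appPlan is a hypothetical stand-in for backup.RestorableApp.
type appPlan struct {
	Name      string
	HasRsync  bool
	HasRestic bool
	Status    string // "", "restoring", "done", "failed"
	Err       string
}

// restoreSteps bundles the side-effecting operations so the loop can be
// exercised without Docker or real backups.
type restoreSteps struct {
	FromRsync  func(name string) error
	FromRestic func(name string) error
	Start      func(name string) error // pull images + compose up
}

// restoreOne prefers the rsync backup and uses restic only when no rsync
// backup exists, then starts the app.
func restoreOne(app *appPlan, steps restoreSteps) {
	app.Status = "restoring"
	var err error
	switch {
	case app.HasRsync:
		err = steps.FromRsync(app.Name)
	case app.HasRestic:
		err = steps.FromRestic(app.Name)
	default:
		err = errors.New("no backup source available")
	}
	if err == nil {
		err = steps.Start(app.Name)
	}
	if err != nil {
		app.Status, app.Err = "failed", err.Error()
		return
	}
	app.Status = "done"
}

// restoreAll processes apps strictly one after another, skipping apps that
// already finished or failed, and reports whether every app ended up "done".
func restoreAll(plan []appPlan, steps restoreSteps) bool {
	allDone := true
	for i := range plan {
		if plan[i].Status != "done" && plan[i].Status != "failed" {
			restoreOne(&plan[i], steps)
		}
		if plan[i].Status != "done" {
			allDone = false
		}
	}
	return allDone
}

func main() {
	ok := func(string) error { return nil }
	plan := []appPlan{{Name: "nextcloud", HasRsync: true}, {Name: "gitea", HasRestic: true}}
	fmt.Println(restoreAll(plan, restoreSteps{FromRsync: ok, FromRestic: ok, Start: ok}))
}
```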

Phase 4: docker-setup.sh integration

4A: Minimal controller.yaml for fresh deployment

The setup script's wizard collects just enough for the controller to start and contact the Hub:

Required for Hub contact:

customer.id — identifies which backup to pull
customer.domain — for Traefik labels
hub.enabled: true
hub.api_key — hardcoded, same for everyone
hub.url — hardcoded

Everything else can be restored from the Hub backup (git credentials, monitoring UUIDs, CF tokens, etc.). The wizard should still ask for these as before (they might be a genuinely new customer with no Hub backup), but the restore flow overwrites them if a Hub backup is found.
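As an illustration of the minimum listed above, a startup check could report which Hub-related keys are still missing before a pull is attempted. The sketch below is an assumption about how such a check might look; the hubBootstrap type and missingHubFields function are hypothetical and do not mirror the controller's real config structs.

```go
package main

import "fmt"

// hubBootstrap holds only the values the wizard must collect for Hub contact.
// Hypothetical type for illustration.
type hubBootstrap struct {
	CustomerID     string
	CustomerDomain string
	HubEnabled     bool
	HubAPIKey      string
	HubURL         string
}

// missingHubFields returns the controller.yaml keys that are still unset.
func missingHubFields(c hubBootstrap) []string {
	var missing []string
	if c.CustomerID == "" {
		missing = append(missing, "customer.id")
	}
	if c.CustomerDomain == "" {
		missing = append(missing, "customer.domain")
	}
	if !c.HubEnabled {
		missing = append(missing, "hub.enabled")
	}
	if c.HubAPIKey == "" {
		missing = append(missing, "hub.api_key")
	}
	if c.HubURL == "" {
		missing = append(missing, "hub.url")
	}
	return missing
}

func main() {
	cfg := hubBootstrap{CustomerID: "demo-felhom", HubEnabled: true}
	fmt.Println(missingHubFields(cfg))
}
```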

4B: Post-deploy message

After deploying the controller, the script prints:

If this is a reinstallation, the controller will automatically:
  1. Contact the Hub for your previous configuration
  2. Mount your existing storage drives
  3. Detect and restore your applications

Open https://felhom.<DOMAIN> to monitor the restore process.

Phase 3 detail: Restore UI and data restoration

handler_restore.go

File: NEW controller/internal/web/handler_restore.go

package web

import (
    "context"
    "fmt"
    "net/http"

    "gitea.dooplex.hu/admin/felhom-controller/internal/backup"
)

func (s *Server) restorePageHandler(w http.ResponseWriter, r *http.Request) {
    if !s.restoreMode {
        http.Redirect(w, r, "/", http.StatusFound)
        return
    }

    data := s.baseData("restore", "Visszaállítás")
    data["RestorePlan"] = s.restorePlan
    data["Drives"] = s.restoreDrives

    // Summary from first available drive manifest
    for _, d := range s.restoreDrives {
        if d.Manifest != nil {
            data["Domain"] = d.Manifest.Domain
            data["CustomerID"] = d.Manifest.CustomerID
            data["LastSync"] = d.Manifest.LastSync
            data["StackCount"] = len(d.Manifest.DeployedStacks)
            break
        }
    }
    s.render(w, "restore", data)
}

func (s *Server) apiRestoreApp(w http.ResponseWriter, r *http.Request, stackName string) {
    if !s.restoreMode {
        writeJSON(w, http.StatusBadRequest, apiResponse{OK: false, Error: "not in restore mode"})
        return
    }

    var app *backup.RestorableApp
    for i := range s.restorePlan {
        if s.restorePlan[i].Name == stackName {
            app = &s.restorePlan[i]
            break
        }
    }
    if app == nil {
        writeJSON(w, http.StatusNotFound, apiResponse{OK: false, Error: "app not in restore plan"})
        return
    }

    go s.executeAppRestore(app)
    writeJSON(w, http.StatusOK, apiResponse{OK: true, Message: "Visszaállítás elindítva"})
}

func (s *Server) apiRestoreAll(w http.ResponseWriter, r *http.Request) {
    if !s.restoreMode {
        writeJSON(w, http.StatusBadRequest, apiResponse{OK: false, Error: "not in restore mode"})
        return
    }
    go s.executeAllRestores()
    writeJSON(w, http.StatusOK, apiResponse{OK: true, Message: "Visszaállítás elindítva"})
}

func (s *Server) apiRestoreStatus(w http.ResponseWriter, r *http.Request) {
    writeJSON(w, http.StatusOK, apiResponse{OK: true, Data: s.restorePlan})
}

func (s *Server) executeAppRestore(app *backup.RestorableApp) {
    ctx := context.Background()
    app.RestoreStatus = "restoring"

    var restoreErr error

    if app.HasRsync {
        restoreErr = backup.RestoreFromRsync(ctx, *app, s.cfg.Paths.StacksDir, s.logger)
    } else if app.HasRestic {
        restoreErr = s.restoreFromResticBackup(ctx, app)
    } else {
        restoreErr = fmt.Errorf("no backup source available")
    }

    if restoreErr != nil {
        app.RestoreStatus = "failed"
        app.RestoreError = restoreErr.Error()
        s.logger.Printf("[ERROR] Restore failed for %s: %v", app.Name, restoreErr)
        return
    }

    // Pull images and start
    if err := s.stackMgr.PullAndStart(app.Name); err != nil {
        s.logger.Printf("[WARN] Could not start %s after restore: %v", app.Name, err)
    }

    app.RestoreStatus = "done"
    s.logger.Printf("[INFO] Restore completed for %s", app.Name)
}

func (s *Server) executeAllRestores() {
    for i := range s.restorePlan {
        app := &s.restorePlan[i]
        if app.RestoreStatus == "done" || app.RestoreStatus == "failed" {
            continue
        }
        s.executeAppRestore(app)
    }
    s.logger.Println("[INFO] All app restores completed")
}
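Note that the handlers above update RestoreStatus from background goroutines while apiRestoreStatus reads the same slice, and nothing prevents a second POST from starting the same app twice. One possible way to guard this is a small mutex-protected tracker, sketched below. The restoreTracker type and its methods are hypothetical; the real implementation may already handle this differently.

```go
package main

import (
	"fmt"
	"sync"
)

// appStatus is a hypothetical, trimmed-down view of a restore plan entry.
type appStatus struct {
	Name   string
	Status string // "", "restoring", "done", "failed"
	Error  string
}

// restoreTracker serializes all access to the per-app restore state.
type restoreTracker struct {
	mu   sync.Mutex
	apps []appStatus
}

// tryStart marks the app as restoring and returns true. It returns false if
// the app is unknown, already restoring, or already done.
func (t *restoreTracker) tryStart(name string) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	for i := range t.apps {
		if t.apps[i].Name != name {
			continue
		}
		if t.apps[i].Status == "restoring" || t.apps[i].Status == "done" {
			return false
		}
		t.apps[i].Status, t.apps[i].Error = "restoring", ""
		return true
	}
	return false
}

// finish records the outcome of a restore started with tryStart.
func (t *restoreTracker) finish(name string, err error) {
	t.mu.Lock()
	defer t.mu.Unlock()
	for i := range t.apps {
		if t.apps[i].Name != name {
			continue
		}
		if err != nil {
			t.apps[i].Status, t.apps[i].Error = "failed", err.Error()
		} else {
			t.apps[i].Status = "done"
		}
		return
	}
}

// snapshot returns a copy that is safe to serialize in the status API.
func (t *restoreTracker) snapshot() []appStatus {
	t.mu.Lock()
	defer t.mu.Unlock()
	return append([]appStatus(nil), t.apps...)
}

func main() {
	tr := &restoreTracker{apps: []appStatus{{Name: "nextcloud"}}}
	fmt.Println(tr.tryStart("nextcloud"), tr.tryStart("nextcloud"))
	tr.finish("nextcloud", fmt.Errorf("rsync exited with an error"))
	fmt.Printf("%+v\n", tr.snapshot())
}
```

With such a tracker, apiRestoreApp would call tryStart before launching the goroutine and answer with an error when it returns false, and apiRestoreStatus would serve snapshot() instead of the live slice.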

restore.html template

File: NEW controller/internal/web/templates/restore.html

All text in Hungarian. The template renders:

Banner: "Korábbi telepítés észlelve" (Previous installation detected)
Summary: domain, customer, last sync timestamp, stack count
Table: app name, backup type (Rsync/Restic/None), DB dump (yes/no), status, action button
"Összes visszaállítása" (Restore all) button
"Kihagyás" (Skip) button → redirects to dashboard
JavaScript polling (3s interval) for status updates during restore
Auto-redirect to dashboard when all done

See the previous TASK2 version for the full template HTML — it remains the same.

restore_rsync.go

File: NEW controller/internal/backup/restore_rsync.go

Restores app data from cross-drive rsync backup:

_config/ → stack compose directory
_db/ → DB dump directory
User data directories → original mount paths

See the previous TASK2 version for the implementation — the function signature and logic remain the same.
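For orientation, the directory mapping described above could be planned as a list of rsync source and destination pairs before anything is executed. The sketch below is not the RestoreFromRsync() implementation, which lives in the previous TASK2 version. The names rsyncStep and planRsyncRestore, the userData map (backup subdirectory name to original mount path), and all example paths are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"path"
	"sort"
)

// rsyncStep is one source → destination copy. Hypothetical type.
type rsyncStep struct {
	Src, Dst string
}

// args builds rsync arguments. The trailing slash on the source copies the
// directory's contents rather than the directory itself.
func (s rsyncStep) args() []string {
	return []string{"-a", s.Src + "/", s.Dst + "/"}
}

// planRsyncRestore maps the backup layout onto restore destinations:
// _config/ → stack directory, _db/ → dump directory, then each user data
// directory (sorted by name for a stable order) → its original path.
func planRsyncRestore(backupDir, stackDir, dumpDir string, userData map[string]string) []rsyncStep {
	steps := []rsyncStep{
		{Src: path.Join(backupDir, "_config"), Dst: stackDir},
		{Src: path.Join(backupDir, "_db"), Dst: dumpDir},
	}
	names := make([]string, 0, len(userData))
	for name := range userData {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		steps = append(steps, rsyncStep{Src: path.Join(backupDir, name), Dst: userData[name]})
	}
	return steps
}

func main() {
	// All paths below are made-up examples.
	steps := planRsyncRestore("/mnt/hdd2/backups/nextcloud", "/opt/docker/stacks/nextcloud",
		"/opt/docker/stacks/nextcloud/dumps", map[string]string{"data": "/mnt/hdd1/nextcloud/data"})
	for _, s := range steps {
		fmt.Println("rsync", s.args())
	}
}
```

Each step's arguments could then be passed to exec.CommandContext(ctx, "rsync", ...) in the same style as the mount helpers.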

Route registration

File: controller/internal/web/server.go

Add to ServeHTTP():

case path == "/restore":
    s.restorePageHandler(w, r)
case path == "/api/restore/all" && r.Method == http.MethodPost:
    s.apiRestoreAll(w, r)
case path == "/api/restore/status":
    s.apiRestoreStatus(w, r)
case strings.HasPrefix(path, "/api/restore/") && r.Method == http.MethodPost:
    stackName := strings.TrimPrefix(path, "/api/restore/")
    s.apiRestoreApp(w, r, stackName)
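The order of these cases matters: the exact matches for /api/restore/all and /api/restore/status have to come before the prefix match, otherwise "all" and "status" would be treated as stack names. The sketch below isolates that matching logic in a pure function so the ordering can be checked on its own; matchRestoreRoute is a hypothetical helper, and the rejection of names containing path separators is an added assumption, not something the routes above do.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// matchRestoreRoute returns which restore action a request maps to
// ("page", "all", "status", "app", "invalid", or "" for no match) and,
// for "app", the stack name taken from the path.
func matchRestoreRoute(method, path string) (action, stack string) {
	switch {
	case path == "/restore":
		return "page", ""
	case path == "/api/restore/all" && method == http.MethodPost:
		return "all", ""
	case path == "/api/restore/status":
		return "status", ""
	case strings.HasPrefix(path, "/api/restore/") && method == http.MethodPost:
		name := strings.TrimPrefix(path, "/api/restore/")
		// Assumption: stack names are single path segments.
		if name == "" || name == "." || name == ".." || strings.ContainsAny(name, `/\`) {
			return "invalid", ""
		}
		return "app", name
	}
	return "", ""
}

func main() {
	fmt.Println(matchRestoreRoute(http.MethodPost, "/api/restore/all"))
	fmt.Println(matchRestoreRoute(http.MethodPost, "/api/restore/nextcloud"))
	fmt.Println(matchRestoreRoute(http.MethodGet, "/api/restore/nextcloud"))
}
```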

Dashboard banner + sidebar link

Same as previous TASK2 — add conditional restore banner to dashboard.html and restore nav link to layout.html.

Also: Continue backing up passwords to `_infra/` locally

The local _infra/ backup (on each drive) should ALSO include passwords, as a belt-and-suspenders approach. If the Hub is unreachable during DR, but drives happen to be pre-mounted (manual fstab or auto-detection), the local backup should be self-sufficient.

File: controller/internal/backup/crossdrive.go — modify syncInfraConfig()

After the existing controller.yaml copy (line 494), add:

// Copy primary restic password → _infra/restic-password
if data, err := os.ReadFile(r.primaryResticPasswordFile); err == nil {
    pwDest := filepath.Join(infraDir, "restic-password")
    os.WriteFile(pwDest, data, 0600)
}

// Copy cross-drive restic password → _infra/cross-drive-password
if cdPw := r.sett.GetCrossDriveResticPassword(); cdPw != "" {
    cdDest := filepath.Join(infraDir, "cross-drive-password")
    os.WriteFile(cdDest, []byte(cdPw), 0600)
}

// Write manifest.json
r.writeManifest(infraDir)

Add primaryResticPasswordFile field to CrossDriveRunner struct, pass from main.go, and add the writeManifest() helper (see Phase 1D for the manifest format — same InfraStack structure).
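The snippet above ignores the os.WriteFile errors, so a full or read-only backup drive would go unnoticed. A possible helper that reports failures and avoids leaving a half-written password file is sketched below. The names writeSecretFile and readSecretFile are hypothetical. The temp-file-plus-rename approach is assumed to be acceptable here because _infra/ is an ordinary directory on the backup drive.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeSecretFile writes data to dir/name via a temp file in the same
// directory followed by a rename, so readers never see a partial file.
// os.CreateTemp creates the file with mode 0600.
func writeSecretFile(dir, name string, data []byte) error {
	if err := os.MkdirAll(dir, 0700); err != nil {
		return fmt.Errorf("create %s: %w", dir, err)
	}
	tmp, err := os.CreateTemp(dir, name+".tmp-*")
	if err != nil {
		return fmt.Errorf("create temp file: %w", err)
	}
	tmpName := tmp.Name()
	defer os.Remove(tmpName) // harmless once the rename has succeeded
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return fmt.Errorf("write %s: %w", tmpName, err)
	}
	if err := tmp.Close(); err != nil {
		return fmt.Errorf("close %s: %w", tmpName, err)
	}
	return os.Rename(tmpName, filepath.Join(dir, name))
}

// readSecretFile is the restore-side counterpart.
func readSecretFile(dir, name string) ([]byte, error) {
	return os.ReadFile(filepath.Join(dir, name))
}

func main() {
	dir, err := os.MkdirTemp("", "infra-demo-")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	defer os.RemoveAll(dir)

	if err := writeSecretFile(dir, "restic-password", []byte("example-password\n")); err != nil {
		fmt.Println("error:", err)
		return
	}
	got, err := readSecretFile(dir, "restic-password")
	fmt.Printf("%q %v\n", got, err)
}
```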

Summary of all files

Hub (`e:/git/felhom.eu/hub/`)

File	Change	Phase
`internal/store/store.go`	New table `infra_backups` + `SaveInfraBackup`, `GetInfraBackup`	1
`internal/api/handler.go`	New endpoints: POST + GET `/api/v1/infra-backup`	1
`internal/web/templates/customer.html`	Infra backup status section	1
`internal/web/server.go`	Pass infra backup data to customer template	1

Controller (`e:/git/deploy-felhom-compose/controller/`)

File	Change	Phase
`internal/report/infra_backup.go`	NEW — `InfraBackup` type, `BuildInfraBackup()`, `collectDiskLayout()`	1
`internal/report/infra_pull.go`	NEW — `PullInfraBackup()`	2
`internal/report/pusher.go`	Add `PushInfraBackup()` method	1
`internal/backup/restore_drives.go`	NEW — `MountDrivesFromLayout()`, lsblk parsing, fstab updates	2
`internal/backup/restore_infra.go`	NEW — `DetectBackupsOnDrives()`, `BuildRestorePlan()`, `RestoreStackConfigs()`, `RestoreResticPasswords()`	2
`internal/backup/restore_rsync.go`	NEW — `RestoreFromRsync()`	3
`internal/backup/crossdrive.go`	Add password backup + manifest to `syncInfraConfig()`	1
`internal/backup/paths.go`	New path helpers for `_infra/` files	1
`internal/settings/settings.go`	Add `GetCrossDriveResticPassword()`, `SetCrossDriveResticPassword()`	1
`cmd/controller/main.go`	Fresh-deployment detection, Hub pull, drive mount, restore orchestration	2
`internal/web/server.go`	Restore routes, `SetRestoreState()`	3
`internal/web/handler_restore.go`	NEW — restore page + API handlers	3
`internal/web/templates/restore.html`	NEW — restore wizard UI (Hungarian)	3
`internal/web/templates/dashboard.html`	Restore banner	3
`internal/web/templates/layout.html`	Sidebar restore link	3

Script

File	Change	Phase
`scripts/docker-setup.sh`	Hub-aware restore detection in wizard	4

Total: ~1400 lines across 20 files (7 new, 13 modified)

Build & deploy order

Phase 1 (Hub + controller push):

# 1. Build and deploy Hub
cd e:/git/felhom.eu/hub
# ... implement changes ...
make VERSION=0.2.0 docker docker-push
kubectl set image -n felhom-system deploy/hub hub=gitea.dooplex.hu/admin/felhom-hub:v0.2.0

# 2. Build and deploy controller
cd e:/git/deploy-felhom-compose/controller
# ... implement changes ...
# Build, push, deploy as usual (see MEMORY.md workflow)

Phase 2 (controller pull + auto-mount):

# Controller-only changes. Build and deploy.

Phase 3 (restore UI):

# Controller-only changes. Build and deploy.
# Test: stop controller, clear stacks dir, restart → should enter restore mode

Phase 4 (docker-setup.sh):

# Script changes only. Copy to demo node and test.

Testing

Phase 1 verification

# After deploying updated controller, trigger a backup:
curl -X POST https://felhom.demo-felhom.eu/api/backup/run

# Check Hub for the infra backup:
curl -H "Authorization: Bearer 094091de545ce28795c47ac2158fc30750db5c24a621c49329b001ee8db57fb8" \
    https://hub.felhom.eu/api/v1/infra-backup/demo-felhom | jq .

Phase 2-3 simulation (on demo node)

# WARNING: This simulates a DR scenario on the demo node.
# It temporarily clears the stacks dir to trigger restore mode.
SSH=/c/Windows/System32/OpenSSH/ssh.exe

# 1. Backup current state
$SSH kisfenyo@192.168.0.162 "sudo cp -r /opt/docker/stacks /tmp/stacks-backup"

# 2. Stop controller, clear stacks to simulate fresh install
$SSH kisfenyo@192.168.0.162 "cd /opt/docker/felhom-controller && sudo docker compose down"
$SSH kisfenyo@192.168.0.162 "sudo docker volume rm felhom-controller_controller-data"
$SSH kisfenyo@192.168.0.162 "sudo rm -rf /opt/docker/stacks/*"

# 3. Start controller — should detect fresh deployment + pull from Hub
$SSH kisfenyo@192.168.0.162 "cd /opt/docker/felhom-controller && sudo docker compose up -d"
$SSH kisfenyo@192.168.0.162 "sleep 15 && docker logs felhom-controller --tail 30 2>&1"

# 4. Open dashboard — should show restore wizard

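# 5. After testing, restore original state if needed.
#    Note: step 1 backs up only the stacks dir; the controller-data volume removed in step 2 is not restored by this copy.
$SSH kisfenyo@192.168.0.162 "sudo cp -r /tmp/stacks-backup/* /opt/docker/stacks/"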
