6713df2186
Complete DR implementation (TASK2.md Phases 1-4): - Hub infra-backup push/pull endpoints (controller.yaml, disk layout, stacks) - Fresh-deployment detection pulls config from Hub, auto-mounts drives by UUID - Full-page restore UI with drive status, app table, sequential restore - docker-setup.sh shows DR instructions when customer_id is configured New files: disk_layout.go, restore_scan.go, restore_app_linux.go, restore_drives_linux.go, infra_backup.go, infra_pull.go, handler_restore.go, restore.html Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
# TASK2: Disaster Recovery — Hub-Based Infrastructure Restore
|
|
|
|
## Overview
|
|
|
|
Add the ability to fully restore a Felhom deployment after a system drive failure.
The controller pushes an **infrastructure snapshot** to the central Hub during
each backup cycle. When a fresh controller is deployed on a replacement system,
it pulls the snapshot from the Hub, auto-mounts surviving drives using stored
disk UUIDs, and restores all applications and their data.
|
|
|
|
**This is a phased implementation:**
|
|
|
|
| Phase | Scope | Where | Status |
|-------|-------|-------|--------|
| **Phase 1** | Hub infra-backup endpoints + controller push | Hub + Controller | **DONE** |
| **Phase 2** | New-deployment detection + Hub pull + auto-mount | Controller | **DONE** |
| **Phase 3** | Restore UI + app data restoration | Controller | **DONE** |
| **Phase 4** | docker-setup.sh integration | Script | **DONE** |
|
|
|
|
Phases 1-2 can be deployed independently. Phase 3 depends on Phase 2.
Phase 4 depends on Phase 1 (needs Hub endpoints).
|
|
|
|
### Phase 1 — What was deployed
|
|
|
|
**Hub changes** (`e:/git/felhom.eu/hub/`):
|
|
- `internal/store/store.go` — new `infra_backups` table (CREATE TABLE in migrate()), `SaveInfraBackup()`, `GetInfraBackup()`, `GetInfraBackupMeta()` + `InfraBackupMeta` struct
|
|
- `internal/api/handler.go` — `POST /api/v1/infra-backup` (push) + `GET /api/v1/infra-backup/{customer_id}` (pull), both with Bearer auth
|
|
- `internal/web/server.go` — `handleCustomerDetail()` loads `InfraBackupMeta` and passes to template
|
|
- `internal/web/templates/customer.html` — "Infra Backup" card showing last-updated age, stack count, disk count
|
|
|
|
**Controller changes** (`controller/`):
|
|
- `internal/settings/settings.go` — new `GetCrossDriveResticPassword()` read-only getter
|
|
- `internal/report/infra_backup.go` — `InfraBackup`, `DiskLayout`, `DiskMount`, `InfraStack` types + `BuildInfraBackup()` builder
|
|
- `internal/report/infra_backup_linux.go` — `collectDiskLayout()` parses /host-fstab + blkid/lsblk for disk topology
|
|
- `internal/report/infra_backup_other.go` — no-op stub for non-Linux compilation
|
|
- `internal/report/pusher.go` — `PushInfraBackup()` method (3 retries, 5s backoff)
|
|
- `cmd/controller/main.go` — `pushInfraBackup()` helper; called after nightly backup cycle and on startup; `hubPusher` declaration moved earlier for closure access
|
|
|
|
### Phase 2 — What was deployed
|
|
|
|
**Controller changes** (`controller/`):
|
|
- `internal/backup/disk_layout.go` — **NEW** — `DiskLayout` and `DiskMount` types (moved from report to avoid circular import: report→backup, backup→report)
|
|
- `internal/report/infra_backup.go` — updated `DiskLayout` field to use `backup.DiskLayout`
|
|
- `internal/report/infra_backup_linux.go` — updated to return `backup.DiskLayout`
|
|
- `internal/report/infra_backup_other.go` — updated to return `backup.DiskLayout`
|
|
- `internal/report/infra_pull.go` — **NEW** — `PullInfraBackup(hubURL, apiKey, customerID)` HTTP GET from Hub, returns `*InfraBackup` or nil/nil for 404
|
|
- `internal/backup/restore_drives_linux.go` — **NEW** — `MountDrivesFromLayout(ctx, layout, logger)` scans block devices by UUID, mounts using two-layer pattern (raw+bind), updates /host-fstab; includes `scanBlockDeviceUUIDs()` (lsblk+blkid), `mountDirect()`, `mountRawAndBind()`, `addDRFstabEntries()`, `isMountedPath()`, `hostDevPath()`
|
|
- `internal/backup/restore_drives_other.go` — **NEW** — no-op stub for non-Linux compilation
|
|
- `internal/settings/settings.go` — added `SetCrossDriveResticPassword(password)` setter (RWMutex + atomic save)
|
|
- `cmd/controller/main.go` — added fresh-deployment detection (`!fileExists(settings.json)`), Hub pull, password restoration, settings restoration, drive mounting (with 2min timeout), settings re-load after restore; helper functions: `fileExists()`, `restorePasswordsFromHub()`, `restoreSettingsFromHub()`
|
|
|
|
### Phase 3 — Implementation plan
|
|
|
|
**Context:** After Phase 2, drives are mounted and local backup data is accessible.
The Hub infra backup holds the `deployed_stacks` manifest, and cross-drive backup data
lives at `<drive>/backups/secondary/<app>/rsync/` with `_config/` and `_db/` subdirs.

**Key insight:** In the common DR scenario (system drive died, HDDs survived), app data
is already on the HDD. The main thing to restore is the stack configs (compose files +
app.yaml with deployed flag + env vars). Cross-drive rsync backups include `_config/`,
which contains the full stack directory.
|
|
|
|
**Files (NEW):**
|
|
- `internal/backup/restore_scan.go` — `RestorePlan`, `RestorableApp` types + `ScanDrivesForBackups()` + `BuildRestorePlan()`
|
|
- `internal/backup/restore_app_linux.go` — `RestoreAppFromBackup()` (restore config + data + DB dump + docker compose up)
|
|
- `internal/backup/restore_app_other.go` — non-Linux stub
|
|
- `internal/web/handler_restore.go` — restore page handler + JSON API endpoints
|
|
- `internal/web/templates/restore.html` — full-page DR restore UI (standalone, no sidebar)
|
|
|
|
**Files (MODIFIED):**
|
|
- `internal/web/server.go` — `restoreMode` + `restorePlan` state; `SetRestoreState()`; route interception (redirect all to /restore)
|
|
- `cmd/controller/main.go` — after Phase 2 drive mount, scan for backups + build restore plan + pass to web server
|
|
|
|
**Restore page behavior:**
|
|
- When `restoreMode` is active, ALL web routes redirect to `/restore` (except `/static/*`, `/api/health`, `/api/restore/*`, `/login`, `/logout`)
|
|
- Page shows: domain/customer info, drive status, per-app table (config found, data found, DB dump found), restore all / skip buttons
|
|
- POST `/api/restore/all` starts sequential restore of all apps
|
|
- POST `/api/restore/skip` exits restore mode → normal dashboard
|
|
- GET `/api/restore/status` returns current plan with per-app status for JS polling
|
|
- All text in Hungarian
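
The route interception described above could be sketched as a small wrapper around the existing handler. This is an illustrative sketch only: `allowedInRestoreMode` and `restoreGate` are hypothetical names, and the exact-match treatment of `/restore`, `/login`, `/logout` and `/api/health` is an assumption. The shipped code does the check inside `ServeHTTP()`.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// allowedInRestoreMode reports whether a path stays reachable while the
// controller is in restore mode. The list mirrors the exceptions above.
func allowedInRestoreMode(path string) bool {
	switch path {
	case "/restore", "/api/health", "/login", "/logout":
		return true
	}
	return strings.HasPrefix(path, "/static/") || strings.HasPrefix(path, "/api/restore/")
}

// restoreGate (hypothetical helper) redirects every blocked path to /restore
// for as long as inRestore() returns true.
func restoreGate(inRestore func() bool, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if inRestore() && !allowedInRestoreMode(r.URL.Path) {
			http.Redirect(w, r, "/restore", http.StatusSeeOther)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	ok := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	h := restoreGate(func() bool { return true }, ok)

	// Print the status code each path receives while restore mode is active.
	for _, p := range []string{"/", "/api/restore/status", "/static/app.css"} {
		rec := httptest.NewRecorder()
		h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, p, nil))
		fmt.Println(p, rec.Code)
	}
}
```

Keeping the allow-list in one pure function makes it easy to unit-test separately from the HTTP plumbing.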
|
|
|
|
**Per-app restore sequence:**
|
|
1. Restore stack config from `_config/` → `/opt/docker/stacks/<app>/`
|
|
2. Verify app data exists on HDD (it should if HDD survived)
|
|
3. If app data missing but rsync backup exists → rsync data back
|
|
4. If DB dumps in `_db/` → copy to primary dump dir
|
|
5. `docker compose pull` (pull images)
|
|
6. `docker compose up -d` (start app)
|
|
7. Update status → next app
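
As a rough sketch, the sequence above can be modelled as an ordered list of steps that stops at the first failure, so that a failed `docker compose pull` never reaches `up -d`. The `step` type, `runSteps`, and the stop-on-first-error policy are assumptions made for illustration; they are not taken from `RestoreAppFromBackup()`.

```go
package main

import (
	"errors"
	"fmt"
)

// errDemo stands in for a real failure (for example an image pull error).
var errDemo = errors.New("registry unreachable")

// step is one stage of the per-app restore sequence.
type step struct {
	name string
	run  func() error
}

// runSteps executes steps in order and stops at the first failure.
// It returns the names of the completed steps and the first error, if any.
func runSteps(steps []step) ([]string, error) {
	var done []string
	for _, s := range steps {
		if err := s.run(); err != nil {
			return done, fmt.Errorf("%s: %w", s.name, err)
		}
		done = append(done, s.name)
	}
	return done, nil
}

func main() {
	ok := func() error { return nil }
	steps := []step{
		{"restore config", ok},
		{"verify data", ok},
		{"compose pull", func() error { return errDemo }},
		{"compose up", ok},
	}
	done, err := runSteps(steps)
	fmt.Println("completed:", done)
	fmt.Println("error:", err)
}
```

The list of completed step names is the kind of information a per-app status update could surface to the polling UI.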
|
|
|
|
**Post-restore:** re-scan stacks, clear restoreMode, normal dashboard operation
|
|
|
|
### Phase 3 — What was deployed
|
|
|
|
**Controller changes** (`controller/`):
|
|
- `internal/backup/restore_scan.go` — **NEW** — `RestorePlan`, `RestorableApp`, `DriveInfo`, `InfraStackInfo` types; `ScanDrivesForBackups()` scans mount paths for cross-drive backup dirs, correlates with Hub manifest; `Snapshot()` for thread-safe JSON serialization; `UpdateApp()` for progress tracking
|
|
- `internal/backup/restore_app_linux.go` — **NEW** — `RestoreAppFromBackup()` restores a single app: rsyncs `_config/` to stack dir, verifies/restores user data, copies DB dumps, runs `docker compose pull && up -d`
|
|
- `internal/backup/restore_app_other.go` — **NEW** — non-Linux stub
|
|
- `internal/web/handler_restore.go` — **NEW** — `restorePageHandler()` renders DR page; `apiRestoreStatus()` returns plan+app statuses as JSON; `apiRestoreAll()` triggers sequential restore in goroutine; `apiRestoreSkip()` exits restore mode; `executeAllRestores()` drives the restore loop with per-app timeout
|
|
- `internal/web/templates/restore.html` — **NEW** — standalone full-page DR UI (no sidebar); shows customer info, drive status cards, app table with config/data/DB columns, progress bar, restore all / skip buttons; JS polling every 2s during restore
|
|
- `internal/web/server.go` — added `restorePlan *backup.RestorePlan` + `restoreMu`; `SetRestoreState()` and `InRestoreMode()` methods; route interception in `ServeHTTP()` redirects all non-static/non-restore routes to `/restore` when in restore mode
|
|
- `internal/web/funcmap.go` — added `statusText` template function (Hungarian labels for restore status codes)
|
|
- `cmd/controller/main.go` — after Phase 2 drive mount, builds `[]InfraStackInfo` from Hub data, calls `ScanDrivesForBackups()`, sets `restorePlan` metadata, calls `webServer.SetRestoreState()`
|
|
|
|
### Phase 4 — What was deployed
|
|
|
|
**Script changes:**
|
|
- `scripts/docker-setup.sh` — `print_summary()` now shows a "Disaster Recovery" block when `$CUSTOMER_ID` is set, informing the operator that the controller will automatically contact the Hub, mount drives, and offer restore
|
|
|
|
**README updates:**
|
|
- `controller/README.md` — version bump to v0.15.5; repo layout updated with new DR files (restore_scan.go, restore_app_linux.go, restore_drives_linux.go, infra_pull.go, handler_restore.go); roadmap marks DR as completed
|
|
- Hub README (`felhom.eu/hub/README.md`) — already had complete DR documentation, no changes needed
|
|
|
|
---
|
|
|
|
## Architecture
|
|
|
|
### The problem (catch-22)
|
|
|
|
When the system drive dies, the backup data lives on surviving HDDs. But a freshly
installed OS doesn't know about those drives: they aren't in `/etc/fstab`, aren't
mounted, and the controller can't scan them. Even if we stored mount info in the
local backup, we can't read the local backup without mounting the drives first.
|
|
|
|
### The solution: Hub as infra backup store
|
|
|
|
The Hub (`hub.felhom.eu`) is always reachable. During normal operation, the
|
|
controller pushes its infrastructure state to the Hub. On a fresh deployment:
|
|
|
|
```
[1] docker-setup.sh deploys controller with Hub details (customer_id + API key)
[2] Controller starts → detects empty data dir → "I'm a fresh deployment"
[3] Controller calls Hub: GET /api/v1/infra-backup/{customer_id}
[4] Hub responds with: disk layout, controller.yaml, manifest, restic passwords
[5] Controller scans /dev/ for disks matching stored UUIDs
[6] Controller mounts surviving drives (using its existing disk management)
[7] Local backups on mounted drives are now accessible
[8] Controller auto-restores stack configs → apps appear in dashboard
[9] User opens dashboard → "Restore from backup" wizard
[10] User confirms → controller restores data + starts apps
```
|
|
|
|
### Fallback: local-only detection
|
|
|
|
If the Hub is unreachable (no internet, Hub down), the controller falls back to
|
|
scanning already-mounted drives for `_infra/manifest.json` — the existing local
|
|
backup path. This is less automated (drives must be manually mounted first) but
|
|
still works.
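
A minimal sketch of that fallback scan, assuming the manifest sits at `_infra/manifest.json` relative to each mount path that is checked (the document does not say where under a drive the `_infra/` directory lives, so the relative path is a parameter). `findLocalManifests` and `makeDemoDrive` are hypothetical names.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// manifestRel is the relative manifest path; its location below a mount
// point is an assumption of this sketch.
const manifestRel = "_infra/manifest.json"

// findLocalManifests returns every <mount>/<rel> that exists as a regular file.
func findLocalManifests(mounts []string, rel string) []string {
	var found []string
	for _, m := range mounts {
		p := filepath.Join(m, rel)
		if info, err := os.Stat(p); err == nil && info.Mode().IsRegular() {
			found = append(found, p)
		}
	}
	return found
}

// makeDemoDrive creates a temp dir containing rel, standing in for a
// manually mounted drive.
func makeDemoDrive(rel string) (string, error) {
	root, err := os.MkdirTemp("", "felhom-drive-")
	if err != nil {
		return "", err
	}
	p := filepath.Join(root, rel)
	if err := os.MkdirAll(filepath.Dir(p), 0o755); err != nil {
		return "", err
	}
	return root, os.WriteFile(p, []byte("{}"), 0o644)
}

func main() {
	drive, err := makeDemoDrive(manifestRel)
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	defer os.RemoveAll(drive)

	// A path that is not mounted is simply skipped.
	mounts := []string{"/mnt/not-mounted-example", drive}
	fmt.Println(findLocalManifests(mounts, manifestRel))
}
```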
|
|
|
|
---
|
|
|
|
## Data stored on Hub per customer
|
|
|
|
The infra-backup payload is a single JSON blob (~20-50KB per customer):
|
|
|
|
```json
|
|
{
|
|
"customer_id": "demo-felhom",
|
|
"domain": "demo-felhom.eu",
|
|
"controller_version": "v0.15.5",
|
|
"timestamp": "2026-02-19T03:05:00Z",
|
|
|
|
"controller_config_b64": "<base64-encoded controller.yaml>",
|
|
"settings_json_b64": "<base64-encoded settings.json>",
|
|
|
|
"disk_layout": {
|
|
"mounts": [
|
|
{
|
|
"uuid": "242ee4da-d9f8-40ce-b3fa-8e4860204790",
|
|
"label": "userdate",
|
|
"mount_point": "/mnt/sys_drive",
|
|
"fs_type": "ext4",
|
|
"size_bytes": 350073856000,
|
|
"fstab_options": "defaults,noatime",
|
|
"role": "system_data",
|
|
"bind_subdir": "",
|
|
"raw_mount": ""
|
|
},
|
|
{
|
|
"uuid": "277a2179-a764-4758-b840-9ea741517914",
|
|
"label": "hdd_1",
|
|
"mount_point": "/mnt/hdd_1",
|
|
"fs_type": "ext4",
|
|
"size_bytes": 1000204886016,
|
|
"fstab_options": "defaults,nofail,noatime",
|
|
"role": "hdd_storage",
|
|
"bind_subdir": "felhom_data",
|
|
"raw_mount": "/mnt/.felhom-raw/hdd_1"
|
|
}
|
|
]
|
|
},
|
|
|
|
"deployed_stacks": [
|
|
{
|
|
"name": "immich",
|
|
"display_name": "Immich",
|
|
"hdd_path": "/mnt/hdd_1",
|
|
"needs_hdd": true
|
|
},
|
|
{
|
|
"name": "docmost",
|
|
"display_name": "Docmost",
|
|
"hdd_path": "",
|
|
"needs_hdd": false
|
|
}
|
|
],
|
|
|
|
"restic_password": "base64-encoded-primary-restic-password",
|
|
"cross_drive_password": "hex-encoded-cross-drive-password"
|
|
}
|
|
```
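
To illustrate how a consumer might unpack the two `*_b64` fields of this payload, here is a small sketch. `infraBlobs` and `decodeBlobs` are hypothetical names; only the JSON field names come from the example above.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// infraBlobs picks out only the base64-encoded file fields of the payload.
type infraBlobs struct {
	ControllerConfigB64 string `json:"controller_config_b64"`
	SettingsJSONB64     string `json:"settings_json_b64"`
}

// decodeBlobs parses the payload and returns the decoded controller.yaml
// and settings.json contents. An absent field decodes to an empty slice.
func decodeBlobs(payload []byte) (controllerYAML, settingsJSON []byte, err error) {
	var b infraBlobs
	if err := json.Unmarshal(payload, &b); err != nil {
		return nil, nil, fmt.Errorf("parsing payload: %w", err)
	}
	controllerYAML, err = base64.StdEncoding.DecodeString(b.ControllerConfigB64)
	if err != nil {
		return nil, nil, fmt.Errorf("controller_config_b64: %w", err)
	}
	settingsJSON, err = base64.StdEncoding.DecodeString(b.SettingsJSONB64)
	if err != nil {
		return nil, nil, fmt.Errorf("settings_json_b64: %w", err)
	}
	return controllerYAML, settingsJSON, nil
}

func main() {
	// "aGVsbG8=" is base64 for "hello"; "e30=" is base64 for "{}".
	payload := []byte(`{"controller_config_b64":"aGVsbG8=","settings_json_b64":"e30="}`)
	yaml, settings, err := decodeBlobs(payload)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Printf("controller.yaml=%q settings.json=%q\n", yaml, settings)
}
```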
|
|
|
|
**Security:** The Hub is operator-managed infrastructure, reached over HTTPS
with Bearer token auth. The infra backup contains sensitive data (CF tokens,
restic passwords), but the Hub already receives all system health data, and the
operator trusts the Hub with this data.
|
|
|
|
---
|
|
|
|
## Phase 1: Hub infra-backup storage + controller push
|
|
|
|
### 1A: Hub — new SQLite table
|
|
|
|
**File:** `hub/internal/store/store.go`
|
|
|
|
Add migration for a new table:
|
|
|
|
```sql
|
|
CREATE TABLE IF NOT EXISTS infra_backups (
|
|
customer_id TEXT PRIMARY KEY,
|
|
backup_json TEXT NOT NULL,
|
|
updated_at DATETIME NOT NULL DEFAULT (datetime('now'))
|
|
);
|
|
```
|
|
|
|
Add store methods:
|
|
|
|
```go
|
|
// SaveInfraBackup upserts the infra backup for a customer.
|
|
func (s *Store) SaveInfraBackup(customerID string, backupJSON []byte) error {
|
|
_, err := s.db.Exec(`
|
|
INSERT INTO infra_backups (customer_id, backup_json, updated_at)
|
|
VALUES (?, ?, datetime('now'))
|
|
ON CONFLICT(customer_id) DO UPDATE SET
|
|
backup_json = excluded.backup_json,
|
|
updated_at = datetime('now')
|
|
`, customerID, string(backupJSON))
|
|
return err
|
|
}
|
|
|
|
// GetInfraBackup returns the infra backup for a customer, or nil if not found.
|
|
func (s *Store) GetInfraBackup(customerID string) ([]byte, error) {
|
|
var data string
|
|
err := s.db.QueryRow(`
|
|
SELECT backup_json FROM infra_backups WHERE customer_id = ?
|
|
`, customerID).Scan(&data)
|
|
if err == sql.ErrNoRows {
|
|
return nil, nil
|
|
}
|
|
if err != nil {
|
|
return nil, err
|
|
}
|
|
return []byte(data), nil
|
|
}
|
|
```
|
|
|
|
### 1B: Hub — new API endpoints
|
|
|
|
**File:** `hub/internal/api/handler.go`
|
|
|
|
Add two endpoints to the existing router:
|
|
|
|
```go
|
|
// POST /api/v1/infra-backup
|
|
// Controller pushes its infrastructure snapshot to the Hub.
|
|
func (h *Handler) handleInfraBackupPush(w http.ResponseWriter, r *http.Request) {
|
|
// Read body (limit to 1MB)
|
|
body, err := io.ReadAll(io.LimitReader(r.Body, 1<<20))
|
|
if err != nil {
|
|
writeJSON(w, http.StatusBadRequest, map[string]string{"status": "error", "error": "read body: " + err.Error()})
|
|
return
|
|
}
|
|
|
|
// Validate JSON structure — extract customer_id
|
|
var payload struct {
|
|
CustomerID string `json:"customer_id"`
|
|
}
|
|
if err := json.Unmarshal(body, &payload); err != nil || payload.CustomerID == "" {
|
|
writeJSON(w, http.StatusBadRequest, map[string]string{"status": "error", "error": "invalid payload or missing customer_id"})
|
|
return
|
|
}
|
|
|
|
if err := h.store.SaveInfraBackup(payload.CustomerID, body); err != nil {
|
|
writeJSON(w, http.StatusInternalServerError, map[string]string{"status": "error", "error": err.Error()})
|
|
return
|
|
}
|
|
|
|
h.logger.Printf("[INFO] Infra backup saved for %s (%d bytes)", payload.CustomerID, len(body))
|
|
writeJSON(w, http.StatusOK, map[string]string{"status": "ok"})
|
|
}
|
|
|
|
// GET /api/v1/infra-backup/{customer_id}
|
|
// Fresh controller pulls the infra backup for its customer.
|
|
func (h *Handler) handleInfraBackupGet(w http.ResponseWriter, r *http.Request) {
|
|
customerID := strings.TrimPrefix(r.URL.Path, "/api/v1/infra-backup/")
|
|
if customerID == "" {
|
|
writeJSON(w, http.StatusBadRequest, map[string]string{"status": "error", "error": "missing customer_id"})
|
|
return
|
|
}
|
|
|
|
data, err := h.store.GetInfraBackup(customerID)
|
|
if err != nil {
|
|
writeJSON(w, http.StatusInternalServerError, map[string]string{"status": "error", "error": err.Error()})
|
|
return
|
|
}
|
|
if data == nil {
|
|
writeJSON(w, http.StatusNotFound, map[string]string{"status": "error", "error": "no infra backup found"})
|
|
return
|
|
}
|
|
|
|
w.Header().Set("Content-Type", "application/json")
|
|
w.Write(data)
|
|
}
|
|
```
|
|
|
|
Register routes in the existing `ServeHTTP()` or router setup:
|
|
|
|
```go
|
|
case r.Method == http.MethodPost && path == "/api/v1/infra-backup":
|
|
h.handleInfraBackupPush(w, r)
|
|
case r.Method == http.MethodGet && strings.HasPrefix(path, "/api/v1/infra-backup/"):
|
|
h.handleInfraBackupGet(w, r)
|
|
```
|
|
|
|
Both endpoints use the existing Bearer token auth (same `report_api_key`).
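
The Hub's existing auth code is not shown in this document. As a hedged illustration, a Bearer check of this kind usually boils down to something like the sketch below. `bearerTokenValid` is a hypothetical name, and rejecting requests when no key is configured is an assumption of this sketch, not confirmed Hub behavior.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"strings"
)

// bearerTokenValid reports whether an Authorization header value carries the
// expected key. The comparison is constant-time to avoid leaking key bytes
// through timing differences.
func bearerTokenValid(header, wantKey string) bool {
	const prefix = "Bearer "
	if wantKey == "" || !strings.HasPrefix(header, prefix) {
		return false
	}
	got := strings.TrimPrefix(header, prefix)
	return subtle.ConstantTimeCompare([]byte(got), []byte(wantKey)) == 1
}

func main() {
	fmt.Println(bearerTokenValid("Bearer s3cret", "s3cret"))
	fmt.Println(bearerTokenValid("Bearer wrong", "s3cret"))
	fmt.Println(bearerTokenValid("s3cret", "s3cret"))
}
```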
|
|
|
|
### 1C: Hub — add infra backup info to dashboard
|
|
|
|
**File:** `hub/internal/web/templates/customer.html`
|
|
|
|
Add a section to the customer detail page showing infra backup status:
|
|
|
|
```html
|
|
<!-- Infra Backup Status -->
|
|
<div class="card">
|
|
<h3>Infra Backup</h3>
|
|
{{if .InfraBackup}}
|
|
<p>Last updated: {{.InfraBackupAge}} ago</p>
|
|
<p>Deployed stacks: {{.InfraBackupStackCount}}</p>
|
|
<p>Disks: {{.InfraBackupDiskCount}}</p>
|
|
{{else}}
|
|
<p style="color: var(--warning)">No infra backup received yet</p>
|
|
{{end}}
|
|
</div>
|
|
```
|
|
|
|
Add store method and web handler logic to load infra backup metadata for the
|
|
customer detail page.
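
One way the stack and disk counts for the card could be derived from the stored JSON is sketched below. The `infraBackupMeta` type and `metaFromJSON` function are hypothetical; the real `InfraBackupMeta` struct and `GetInfraBackupMeta()` are not shown in this document and may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// infraBackupMeta holds the summary numbers shown on the dashboard card.
type infraBackupMeta struct {
	StackCount int
	DiskCount  int
}

// metaFromJSON counts deployed stacks and disk mounts without decoding the
// individual entries.
func metaFromJSON(backupJSON []byte) (infraBackupMeta, error) {
	var p struct {
		DiskLayout struct {
			Mounts []json.RawMessage `json:"mounts"`
		} `json:"disk_layout"`
		DeployedStacks []json.RawMessage `json:"deployed_stacks"`
	}
	if err := json.Unmarshal(backupJSON, &p); err != nil {
		return infraBackupMeta{}, err
	}
	return infraBackupMeta{
		StackCount: len(p.DeployedStacks),
		DiskCount:  len(p.DiskLayout.Mounts),
	}, nil
}

func main() {
	blob := []byte(`{
		"disk_layout": {"mounts": [{"label": "hdd_1"}, {"label": "hdd_2"}]},
		"deployed_stacks": [{"name": "immich"}]
	}`)
	meta, err := metaFromJSON(blob)
	fmt.Println(meta, err)
}
```

Using `json.RawMessage` keeps the count independent of the entry schema, so the Hub does not need to track controller-side struct changes.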
|
|
|
|
### 1D: Controller — push infra snapshot to Hub
|
|
|
|
**File:** `controller/internal/report/infra_backup.go` (NEW)
|
|
|
|
```go
|
|
package report
|
|
|
|
import (
|
|
"encoding/base64"
|
|
"encoding/json"
|
|
"os"
|
|
"time"
|
|
|
|
"gitea.dooplex.hu/admin/felhom-controller/internal/backup"
|
|
"gitea.dooplex.hu/admin/felhom-controller/internal/settings"
|
|
)
|
|
|
|
// InfraBackup is the payload pushed to the Hub for disaster recovery.
|
|
type InfraBackup struct {
|
|
CustomerID string `json:"customer_id"`
|
|
Domain string `json:"domain"`
|
|
ControllerVersion string `json:"controller_version"`
|
|
Timestamp string `json:"timestamp"`
|
|
|
|
ControllerConfigB64 string `json:"controller_config_b64"`
|
|
SettingsJSONB64 string `json:"settings_json_b64,omitempty"`
|
|
|
|
DiskLayout DiskLayout `json:"disk_layout"`
|
|
DeployedStacks []InfraStack `json:"deployed_stacks"`
|
|
|
|
ResticPassword string `json:"restic_password,omitempty"`
|
|
CrossDrivePassword string `json:"cross_drive_password,omitempty"`
|
|
}
|
|
|
|
type DiskLayout struct {
|
|
Mounts []DiskMount `json:"mounts"`
|
|
}
|
|
|
|
type DiskMount struct {
|
|
UUID string `json:"uuid"`
|
|
Label string `json:"label"`
|
|
MountPoint string `json:"mount_point"`
|
|
FSType string `json:"fs_type"`
|
|
SizeBytes int64 `json:"size_bytes"`
|
|
FstabOptions string `json:"fstab_options"`
|
|
Role string `json:"role"` // "system_data", "hdd_storage", "root"
|
|
BindSubdir string `json:"bind_subdir"` // e.g., "felhom_data" for HDD bind mounts
|
|
RawMount string `json:"raw_mount"` // e.g., "/mnt/.felhom-raw/hdd_1"
|
|
}
|
|
|
|
type InfraStack struct {
|
|
Name string `json:"name"`
|
|
DisplayName string `json:"display_name"`
|
|
HDDPath string `json:"hdd_path,omitempty"`
|
|
NeedsHDD bool `json:"needs_hdd"`
|
|
}
|
|
|
|
// BuildInfraBackup collects all infrastructure state for Hub backup.
|
|
func BuildInfraBackup(
|
|
customerID, domain, version string,
|
|
controllerYAMLPath string,
|
|
settingsPath string,
|
|
resticPasswordFile string,
|
|
sett *settings.Settings,
|
|
stackProvider backup.StackDataProvider,
|
|
) (*InfraBackup, error) {
|
|
ib := &InfraBackup{
|
|
CustomerID: customerID,
|
|
Domain: domain,
|
|
ControllerVersion: version,
|
|
Timestamp: time.Now().UTC().Format(time.RFC3339),
|
|
}
|
|
|
|
// Read and encode controller.yaml
|
|
if data, err := os.ReadFile(controllerYAMLPath); err == nil {
|
|
ib.ControllerConfigB64 = base64.StdEncoding.EncodeToString(data)
|
|
}
|
|
|
|
// Read and encode settings.json
|
|
if data, err := os.ReadFile(settingsPath); err == nil {
|
|
ib.SettingsJSONB64 = base64.StdEncoding.EncodeToString(data)
|
|
}
|
|
|
|
// Read restic password
|
|
if data, err := os.ReadFile(resticPasswordFile); err == nil {
|
|
ib.ResticPassword = base64.StdEncoding.EncodeToString(data)
|
|
}
|
|
|
|
// Read cross-drive password
|
|
if pw := sett.GetCrossDriveResticPassword(); pw != "" {
|
|
ib.CrossDrivePassword = pw
|
|
}
|
|
|
|
// Collect disk layout (see implementation note below)
|
|
ib.DiskLayout = collectDiskLayout()
|
|
|
|
// Collect deployed stacks
|
|
deployed := stackProvider.ListDeployedStacks()
|
|
for _, s := range deployed {
|
|
ib.DeployedStacks = append(ib.DeployedStacks, InfraStack{
|
|
Name: s.Name,
|
|
DisplayName: s.DisplayName,
|
|
HDDPath: stackProvider.GetStackHDDPath(s.Name),
|
|
NeedsHDD: s.NeedsHDD,
|
|
})
|
|
}
|
|
|
|
return ib, nil
|
|
}
|
|
|
|
// collectDiskLayout reads /etc/fstab and lsblk to build the disk layout.
|
|
// This runs inside the container which has /host-fstab mounted and access to
|
|
// /host-dev/ for block device info.
|
|
func collectDiskLayout() DiskLayout {
|
|
// Implementation: parse /host-fstab (mounted from host /etc/fstab)
|
|
// and correlate with lsblk -J output.
|
|
//
|
|
// The controller already has disk management code in internal/stacks/
|
|
// or similar — reuse the existing lsblk parsing.
|
|
//
|
|
// For each non-root, non-swap, non-boot mount in fstab:
|
|
// - Extract UUID, mount point, fs_type, options
|
|
// - Detect role: "system_data" if mount_point matches system_data_path,
|
|
// "hdd_storage" if it's under /mnt/.felhom-raw/ or /mnt/hdd_*
|
|
// - Detect bind mounts (type=none, options contain "bind")
|
|
// - Get size from lsblk
|
|
//
|
|
// Return the DiskLayout struct.
|
|
//
|
|
// See the detailed implementation note in the "Implementation details" section.
|
|
return DiskLayout{}
|
|
}
|
|
```
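
The fstab half of `collectDiskLayout()` is only outlined in comments above. The sketch below shows one way the filtering and field extraction could look. `fstabEntry`, `parseFstab`, and the sample content are hypothetical, and the shipped `infra_backup_linux.go` additionally uses blkid/lsblk, which this sketch does not cover.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// sampleFstab imitates a host /etc/fstab using the two-layer HDD pattern.
const sampleFstab = `# /etc/fstab
UUID=aaaa-root / ext4 defaults 0 1
UUID=bbbb-swap none swap sw 0 0
UUID=277a2179-a764-4758-b840-9ea741517914 /mnt/.felhom-raw/hdd_1 ext4 defaults,nofail,noatime 0 2
/mnt/.felhom-raw/hdd_1/felhom_data /mnt/hdd_1 none bind,nofail 0 0
`

// fstabEntry holds the first four fstab fields.
type fstabEntry struct {
	Source, MountPoint, FSType, Options string
}

// UUID returns the value of a UUID=<value> source, or "" for other sources.
func (e fstabEntry) UUID() string {
	if strings.HasPrefix(e.Source, "UUID=") {
		return strings.TrimPrefix(e.Source, "UUID=")
	}
	return ""
}

// IsBind reports whether the options list contains "bind".
func (e fstabEntry) IsBind() bool {
	for _, o := range strings.Split(e.Options, ",") {
		if o == "bind" {
			return true
		}
	}
	return false
}

// parseFstab returns all non-root, non-swap, non-boot entries.
func parseFstab(content string) []fstabEntry {
	var entries []fstabEntry
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		f := strings.Fields(line)
		if len(f) < 4 {
			continue // malformed line
		}
		if f[1] == "/" || f[2] == "swap" || strings.HasPrefix(f[1], "/boot") {
			continue
		}
		entries = append(entries, fstabEntry{Source: f[0], MountPoint: f[1], FSType: f[2], Options: f[3]})
	}
	return entries
}

func main() {
	for _, e := range parseFstab(sampleFstab) {
		fmt.Printf("%s uuid=%q bind=%v\n", e.MountPoint, e.UUID(), e.IsBind())
	}
}
```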
|
|
|
|
### 1E: Controller — push infra backup after each backup cycle
|
|
|
|
**File:** `controller/cmd/controller/main.go`
|
|
|
|
Add the infra backup push to the backup scheduler (after Tier1 + Tier2 complete):
|
|
|
|
```go
|
|
// In the "backup" daily scheduler:
|
|
sched.Daily("backup", cfg.Backup.ResticSchedule, func(ctx context.Context) error {
|
|
err := backupMgr.RunBackup(ctx)
|
|
crossDriveRunner.RunAllScheduled(ctx, "daily")
|
|
if time.Now().Weekday() == time.Sunday {
|
|
crossDriveRunner.RunAllScheduled(ctx, "weekly")
|
|
}
|
|
|
|
// NEW: Push infra backup to Hub
|
|
if hubPusher != nil && cfg.Hub.Enabled {
|
|
go pushInfraBackup(cfg, sett, stackProv, hubPusher, logger)
|
|
}
|
|
|
|
return err
|
|
})
|
|
```
|
|
|
|
```go
|
|
func pushInfraBackup(cfg *config.Config, sett *settings.Settings,
|
|
stackProv backup.StackDataProvider, pusher *report.Pusher, logger *log.Logger) {
|
|
|
|
ib, err := report.BuildInfraBackup(
|
|
cfg.Customer.ID, cfg.Customer.Domain, Version,
|
|
"/opt/docker/felhom-controller/controller.yaml",
|
|
filepath.Join(cfg.Paths.DataDir, "settings.json"),
|
|
cfg.Backup.ResticPasswordFile,
|
|
sett, stackProv,
|
|
)
|
|
if err != nil {
|
|
logger.Printf("[WARN] Failed to build infra backup: %v", err)
|
|
return
|
|
}
|
|
|
|
data, err := json.Marshal(ib)
|
|
if err != nil {
|
|
logger.Printf("[WARN] Failed to marshal infra backup: %v", err)
|
|
return
|
|
}
|
|
|
|
if err := pusher.PushInfraBackup(data); err != nil {
|
|
logger.Printf("[WARN] Failed to push infra backup to Hub: %v", err)
|
|
} else {
|
|
logger.Printf("[INFO] Infra backup pushed to Hub (%d bytes)", len(data))
|
|
}
|
|
}
|
|
```
|
|
|
|
### 1F: Controller — add `PushInfraBackup` to Pusher
|
|
|
|
**File:** `controller/internal/report/pusher.go`
|
|
|
|
Add a new method alongside the existing `Push()`:
|
|
|
|
```go
|
|
// PushInfraBackup sends the infrastructure backup to the Hub.
|
|
func (p *Pusher) PushInfraBackup(data []byte) error {
|
|
if !p.enabled {
|
|
return nil
|
|
}
|
|
|
|
url := p.hubURL + "/api/v1/infra-backup"
|
|
|
|
var lastErr error
|
|
for attempt := 0; attempt < 3; attempt++ {
|
|
if attempt > 0 {
|
|
time.Sleep(5 * time.Second)
|
|
}
|
|
|
|
req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(data))
|
|
if err != nil {
|
|
lastErr = err
|
|
continue
|
|
}
|
|
req.Header.Set("Content-Type", "application/json")
|
|
if p.apiKey != "" {
|
|
req.Header.Set("Authorization", "Bearer "+p.apiKey)
|
|
}
|
|
|
|
resp, err := p.httpClient.Do(req)
|
|
if err != nil {
|
|
lastErr = err
|
|
continue
|
|
}
|
|
io.Copy(io.Discard, resp.Body)
|
|
resp.Body.Close()
|
|
|
|
if resp.StatusCode >= 200 && resp.StatusCode < 300 {
|
|
return nil
|
|
}
|
|
lastErr = fmt.Errorf("HTTP %d", resp.StatusCode)
|
|
}
|
|
|
|
return fmt.Errorf("infra backup push failed after 3 attempts: %w", lastErr)
|
|
}
|
|
```
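
`PushInfraBackup` above sleeps unconditionally between attempts. If the push ever needs to stop promptly on shutdown, a context-aware variant of the same retry loop could look like the sketch below. `retry` and `demoRetry` are hypothetical helpers, not part of the existing `Pusher`.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// retry calls fn up to attempts times, waiting delay between tries.
// It returns early if ctx is cancelled while waiting.
func retry(ctx context.Context, attempts int, delay time.Duration, fn func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	var lastErr error
	for i := 0; i < attempts; i++ {
		if i > 0 {
			select {
			case <-ctx.Done():
				return fmt.Errorf("cancelled after %d attempts: %w", i, ctx.Err())
			case <-time.After(delay):
			}
		}
		if lastErr = fn(); lastErr == nil {
			return nil
		}
	}
	return fmt.Errorf("failed after %d attempts: %w", attempts, lastErr)
}

// demoRetry simulates a push that fails the first `failures` times.
// It uses 3 attempts and a 1ms delay so the demo finishes quickly.
func demoRetry(failures int) (calls int, err error) {
	err = retry(context.Background(), 3, time.Millisecond, func() error {
		calls++
		if calls <= failures {
			return errors.New("hub unavailable")
		}
		return nil
	})
	return calls, err
}

func main() {
	calls, err := demoRetry(2)
	fmt.Println(calls, err)
	calls, err = demoRetry(5)
	fmt.Println(calls, err)
}
```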
|
|
|
|
---
|
|
|
|
## Phase 2: New-deployment detection + Hub pull + auto-mount
|
|
|
|
### 2A: Controller — detect fresh deployment
|
|
|
|
**File:** `controller/cmd/controller/main.go`
|
|
|
|
The controller uses a Docker named volume (`controller-data`) at
|
|
`/opt/docker/felhom-controller/data`. On a fresh deployment, this volume is
|
|
empty — no `settings.json`, no `session_secret`, no `snapshot-history.json`.
|
|
|
|
Add detection after settings initialization:
|
|
|
|
```go
|
|
// Detect fresh deployment (empty data directory = new install)
|
|
isFreshDeployment := !fileExists(filepath.Join(cfg.Paths.DataDir, "settings.json"))
|
|
|
|
if isFreshDeployment {
|
|
logger.Println("[INFO] Fresh deployment detected — checking Hub for infra backup")
|
|
|
|
	// No explicit marker file is needed: Settings.save() creates settings.json
	// shortly after startup, so later restarts skip this path.
|
|
}
|
|
```
|
|
|
|
**Important:** The marker to distinguish "fresh" from "restarted" is the absence
|
|
of `settings.json`. Once the Settings package creates it (on first save), subsequent
|
|
restarts won't trigger the fresh-deployment path.
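
The body of `fileExists()` is not shown in this document. A minimal sketch of the check, under the assumption that a directory named `settings.json` should not count as an existing file, might look like this (`isFreshDeployment` is a hypothetical wrapper):

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// fileExists reports whether path exists and is not a directory.
func fileExists(path string) bool {
	info, err := os.Stat(path)
	return err == nil && !info.IsDir()
}

// isFreshDeployment treats a data dir without settings.json as a new install.
func isFreshDeployment(dataDir string) bool {
	return !fileExists(filepath.Join(dataDir, "settings.json"))
}

func main() {
	dir, err := os.MkdirTemp("", "controller-data-")
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	defer os.RemoveAll(dir)

	fmt.Println("empty data dir, fresh:", isFreshDeployment(dir))

	// Simulate the first Settings save.
	if err := os.WriteFile(filepath.Join(dir, "settings.json"), []byte("{}"), 0o600); err != nil {
		fmt.Println("write failed:", err)
		return
	}
	fmt.Println("after first save, fresh:", isFreshDeployment(dir))
}
```

Note that any `os.Stat` error, including a permission error, is treated as "file missing" here, which would route the controller into the fresh-deployment path.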
|
|
|
|
### 2B: Controller — pull infra backup from Hub
|
|
|
|
**File:** `controller/internal/report/infra_pull.go` (NEW)
|
|
|
|
```go
|
|
package report
|
|
|
|
import (
|
|
"encoding/json"
|
|
"fmt"
|
|
"io"
|
|
"net/http"
|
|
"time"
|
|
)
|
|
|
|
// PullInfraBackup fetches the infrastructure backup from the Hub.
|
|
// Returns nil, nil if no backup exists for this customer.
|
|
func PullInfraBackup(hubURL, apiKey, customerID string) (*InfraBackup, error) {
|
|
url := hubURL + "/api/v1/infra-backup/" + customerID
|
|
|
|
client := &http.Client{Timeout: 30 * time.Second}
|
|
|
|
req, err := http.NewRequest(http.MethodGet, url, nil)
|
|
if err != nil {
|
|
return nil, err
|
|
}
|
|
if apiKey != "" {
|
|
req.Header.Set("Authorization", "Bearer "+apiKey)
|
|
}
|
|
|
|
resp, err := client.Do(req)
|
|
if err != nil {
|
|
return nil, fmt.Errorf("hub request failed: %w", err)
|
|
}
|
|
defer resp.Body.Close()
|
|
|
|
if resp.StatusCode == http.StatusNotFound {
|
|
return nil, nil // no backup for this customer
|
|
}
|
|
if resp.StatusCode != http.StatusOK {
|
|
return nil, fmt.Errorf("hub returned HTTP %d", resp.StatusCode)
|
|
}
|
|
|
|
body, err := io.ReadAll(io.LimitReader(resp.Body, 5<<20)) // 5MB limit
|
|
if err != nil {
|
|
return nil, fmt.Errorf("reading response: %w", err)
|
|
}
|
|
|
|
var ib InfraBackup
|
|
if err := json.Unmarshal(body, &ib); err != nil {
|
|
return nil, fmt.Errorf("parsing infra backup: %w", err)
|
|
}
|
|
|
|
return &ib, nil
|
|
}
|
|
```
|
|
|
|
### 2C: Controller — auto-mount drives from Hub disk layout
|
|
|
|
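**File:** `controller/internal/backup/restore_drives.go` (NEW; shipped as `restore_drives_linux.go` plus a no-op `restore_drives_other.go` stub, see "Phase 2 — What was deployed")

Note: the listing below is the original plan. The shipped code uses `backup.DiskLayout` and `backup.DiskMount` instead of the `report` types, because importing `report` from `backup` would create a circular import.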
|
|
|
|
```go
|
|
package backup
|
|
|
|
import (
|
|
"context"
|
|
"encoding/json"
|
|
"fmt"
|
|
"log"
|
|
"os"
|
|
"os/exec"
|
|
"path/filepath"
|
|
"strings"
|
|
|
|
"gitea.dooplex.hu/admin/felhom-controller/internal/report"
|
|
)
|
|
|
|
// MountDrivesFromLayout scans block devices for disks matching the Hub's
|
|
// stored disk layout and mounts them. Uses the controller's existing
|
|
// two-layer mount pattern: raw mount → bind mount.
|
|
//
|
|
// The controller container has:
//   - /host-dev (host /dev, rw): block device access
//   - /host-fstab (host /etc/fstab): can update fstab
//   - privileged: true, so it can mount filesystems
|
|
//
|
|
// Returns the list of successfully mounted paths.
|
|
func MountDrivesFromLayout(ctx context.Context, layout report.DiskLayout, logger *log.Logger) ([]string, error) {
|
|
// 1. Get current block devices with UUIDs
|
|
lsblkDevices, err := getLsblkDevices(ctx)
|
|
if err != nil {
|
|
return nil, fmt.Errorf("scanning block devices: %w", err)
|
|
}
|
|
|
|
var mounted []string
|
|
|
|
for _, diskMount := range layout.Mounts {
|
|
if diskMount.UUID == "" {
|
|
continue
|
|
}
|
|
|
|
// Skip system partitions (root, boot, swap)
|
|
if diskMount.Role == "root" || diskMount.Role == "boot" || diskMount.Role == "swap" {
|
|
continue
|
|
}
|
|
|
|
// Find matching device by UUID
|
|
device := findDeviceByUUID(lsblkDevices, diskMount.UUID)
|
|
if device == "" {
|
|
logger.Printf("[WARN] Disk UUID %s (%s) not found — drive may be missing",
|
|
diskMount.UUID, diskMount.Label)
|
|
continue
|
|
}
|
|
|
|
// Check if already mounted
|
|
if isMounted(diskMount.MountPoint) || isMounted(diskMount.RawMount) {
|
|
logger.Printf("[INFO] %s already mounted", diskMount.MountPoint)
|
|
mounted = append(mounted, diskMount.MountPoint)
|
|
continue
|
|
}
|
|
|
|
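		// Log the full UUID: slicing with [:12] would panic on UUIDs shorter
		// than 12 characters (vfat UUIDs, for example, are 9 characters long).
		logger.Printf("[INFO] Found disk %s (UUID=%s, label=%s) — mounting to %s",
			device, diskMount.UUID, diskMount.Label, diskMount.MountPoint)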
|
|
|
|
// Mount using the felhom two-layer pattern:
|
|
// Layer 1: raw mount → /mnt/.felhom-raw/<label>
|
|
// Layer 2: bind mount → <raw>/<subdir> to /mnt/<label>
|
|
if diskMount.RawMount != "" && diskMount.BindSubdir != "" {
|
|
// Two-layer HDD mount
|
|
if err := mountRawAndBind(ctx, device, diskMount, logger); err != nil {
|
|
logger.Printf("[ERROR] Failed to mount %s: %v", diskMount.Label, err)
|
|
continue
|
|
}
|
|
} else {
|
|
// Simple direct mount (e.g., sys_drive)
|
|
if err := mountDirect(ctx, device, diskMount, logger); err != nil {
|
|
logger.Printf("[ERROR] Failed to mount %s: %v", diskMount.Label, err)
|
|
continue
|
|
}
|
|
}
|
|
|
|
// Update host fstab so mount persists across reboots
|
|
if err := addToFstab(diskMount, logger); err != nil {
|
|
logger.Printf("[WARN] Failed to update fstab for %s: %v", diskMount.Label, err)
|
|
// Non-fatal — mount works for now, fstab can be fixed later
|
|
}
|
|
|
|
mounted = append(mounted, diskMount.MountPoint)
|
|
logger.Printf("[INFO] Mounted %s at %s", diskMount.Label, diskMount.MountPoint)
|
|
}
|
|
|
|
return mounted, nil
|
|
}
|
|
|
|
// getLsblkDevices runs lsblk -J and returns a UUID → device path mapping.
|
|
func getLsblkDevices(ctx context.Context) (map[string]string, error) {
|
|
cmd := exec.CommandContext(ctx, "lsblk", "-J", "-o", "NAME,UUID,LABEL,FSTYPE,SIZE,MOUNTPOINT")
|
|
out, err := cmd.Output()
|
|
if err != nil {
|
|
return nil, err
|
|
}
|
|
|
|
var result struct {
|
|
BlockDevices []struct {
|
|
Name string `json:"name"`
|
|
UUID string `json:"uuid"`
|
|
Label string `json:"label"`
|
|
FSType string `json:"fstype"`
|
|
Size string `json:"size"`
|
|
Mount string `json:"mountpoint"`
|
|
Children []struct {
|
|
Name string `json:"name"`
|
|
UUID string `json:"uuid"`
|
|
Label string `json:"label"`
|
|
} `json:"children"`
|
|
} `json:"blockdevices"`
|
|
}
|
|
if err := json.Unmarshal(out, &result); err != nil {
|
|
return nil, err
|
|
}
|
|
|
|
devices := make(map[string]string) // UUID → /dev/path
|
|
for _, dev := range result.BlockDevices {
|
|
if dev.UUID != "" {
|
|
devices[dev.UUID] = "/dev/" + dev.Name
|
|
}
|
|
for _, child := range dev.Children {
|
|
if child.UUID != "" {
|
|
devices[child.UUID] = "/dev/" + child.Name
|
|
}
|
|
}
|
|
}
|
|
return devices, nil
|
|
}
|
|
|
|
func findDeviceByUUID(devices map[string]string, uuid string) string {
|
|
return devices[uuid]
|
|
}
|
|
|
|
func isMounted(path string) bool {
|
|
if path == "" {
|
|
return false
|
|
}
|
|
_, err := os.Stat(path)
|
|
if err != nil {
|
|
return false
|
|
}
|
|
// Check /proc/mounts for the path
|
|
data, err := os.ReadFile("/proc/mounts")
|
|
if err != nil {
|
|
return false
|
|
}
|
|
return strings.Contains(string(data), " "+path+" ")
|
|
}
|
|
|
|
func mountDirect(ctx context.Context, device string, dm report.DiskMount, logger *log.Logger) error {
|
|
if err := os.MkdirAll(dm.MountPoint, 0755); err != nil {
|
|
return err
|
|
}
|
|
cmd := exec.CommandContext(ctx, "mount", "-t", dm.FSType, device, dm.MountPoint)
|
|
if out, err := cmd.CombinedOutput(); err != nil {
|
|
return fmt.Errorf("%s: %w", strings.TrimSpace(string(out)), err)
|
|
}
|
|
return nil
|
|
}
|
|
|
|
func mountRawAndBind(ctx context.Context, device string, dm report.DiskMount, logger *log.Logger) error {
|
|
// Layer 1: raw mount
|
|
if err := os.MkdirAll(dm.RawMount, 0755); err != nil {
|
|
return err
|
|
}
|
|
cmd := exec.CommandContext(ctx, "mount", "-t", dm.FSType, "-o", "noatime", device, dm.RawMount)
|
|
if out, err := cmd.CombinedOutput(); err != nil {
|
|
return fmt.Errorf("raw mount: %s: %w", strings.TrimSpace(string(out)), err)
|
|
}
|
|
|
|
// Layer 2: bind mount (subdir → final mount point)
|
|
bindSrc := filepath.Join(dm.RawMount, dm.BindSubdir)
|
|
if err := os.MkdirAll(bindSrc, 0755); err != nil {
|
|
return err
|
|
}
|
|
if err := os.MkdirAll(dm.MountPoint, 0755); err != nil {
|
|
return err
|
|
}
|
|
cmd = exec.CommandContext(ctx, "mount", "--bind", bindSrc, dm.MountPoint)
|
|
if out, err := cmd.CombinedOutput(); err != nil {
|
|
return fmt.Errorf("bind mount: %s: %w", strings.TrimSpace(string(out)), err)
|
|
}
|
|
|
|
return nil
|
|
}
|
|
|
|
func addToFstab(dm report.DiskMount, logger *log.Logger) error {
|
|
const fstabPath = "/host-fstab" // mounted from host /etc/fstab
|
|
|
|
data, err := os.ReadFile(fstabPath)
|
|
if err != nil {
|
|
return err
|
|
}
|
|
|
|
content := string(data)
|
|
|
|
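	// Skip if this UUID already has an fstab entry.
	// Reject an empty UUID first: strings.Contains(content, "") is always true,
	// and slicing a short UUID (e.g. vfat "ABCD-1234") with [:12] would panic.
	if dm.UUID == "" {
		return fmt.Errorf("disk %s has no UUID", dm.Label)
	}
	if strings.Contains(content, dm.UUID) {
		logger.Printf("[INFO] UUID %s already in fstab", dm.UUID)
		return nil
	}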
|
|
|
|
// Append entries
|
|
var additions strings.Builder
|
|
additions.WriteString("\n# Restored by felhom-controller DR\n")
|
|
|
|
if dm.RawMount != "" {
|
|
// Raw mount entry
|
|
additions.WriteString(fmt.Sprintf("UUID=%s\t%s\t%s\t%s\t0 2\n",
|
|
dm.UUID, dm.RawMount, dm.FSType, dm.FstabOptions))
|
|
}
|
|
|
|
if dm.BindSubdir != "" && dm.RawMount != "" {
|
|
// Bind mount entry
|
|
additions.WriteString(fmt.Sprintf("%s/%s\t%s\tnone\tbind,nofail\t0 0\n",
|
|
dm.RawMount, dm.BindSubdir, dm.MountPoint))
|
|
} else if dm.RawMount == "" {
|
|
// Direct mount entry
|
|
additions.WriteString(fmt.Sprintf("UUID=%s\t%s\t%s\t%s\t0 2\n",
|
|
dm.UUID, dm.MountPoint, dm.FSType, dm.FstabOptions))
|
|
}
|
|
|
|
	// Write in place. A temp-file + rename is not possible here: /host-fstab is a
	// single-file bind mount, so rename(2) onto it fails (EXDEV or EBUSY).
	return os.WriteFile(fstabPath, []byte(content+additions.String()), 0644)
}
|
|
```
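
The `isMounted` helper above matches the path as a substring of `/proc/mounts`. A minimal sketch of a stricter variant follows, assuming the mounts text is passed in as a string; `mountPointIn` is a hypothetical name, not an existing controller function. It compares the mount-point field exactly and accounts for the `\040` escaping that `/proc/mounts` applies to spaces.

```go
package main

import (
	"fmt"
	"strings"
)

// mountPointIn reports whether path appears as the mount-point field
// (second column) of any line in /proc/mounts-formatted text.
// Hypothetical helper for illustration only.
func mountPointIn(mounts, path string) bool {
	if path == "" {
		return false
	}
	// /proc/mounts escapes spaces in paths as \040.
	want := strings.ReplaceAll(path, " ", `\040`)
	for _, line := range strings.Split(mounts, "\n") {
		fields := strings.Fields(line)
		if len(fields) >= 2 && fields[1] == want {
			return true
		}
	}
	return false
}

func main() {
	sample := "/dev/sdb1 /mnt/hdd_raw ext4 rw,noatime 0 0\n"
	fmt.Println(mountPointIn(sample, "/mnt/hdd_raw"))
	fmt.Println(mountPointIn(sample, "/mnt/hdd"))
}
```

Taking the file content as a parameter keeps the check testable without a real `/proc/mounts`.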
|
|
|
|
**Important implementation note:** The controller runs inside a Docker container
with `privileged: true`. The mount operations take effect in the host's mount
namespace because the container has `propagation: rshared` on the `/mnt` volume.
The lsblk command sees host block devices via `/host-dev`. Study the existing
disk management code in the controller before implementing; it may already
contain helpers for lsblk parsing and mount operations.
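
If no suitable helper exists, the lsblk parsing could look roughly like the sketch below. It is an illustration under assumptions: `lsblkNode` and `uuidToDevice` are hypothetical names, and the JSON is passed in as bytes so the function can be exercised without running lsblk. Unlike a fixed two-level struct, it walks `children` to any depth (for example a partition that carries an md device).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// lsblkNode mirrors one entry of `lsblk -J -o NAME,UUID` output.
// Hypothetical type for illustration.
type lsblkNode struct {
	Name     string      `json:"name"`
	UUID     string      `json:"uuid"` // JSON null leaves this as ""
	Children []lsblkNode `json:"children"`
}

// uuidToDevice parses lsblk JSON and returns a UUID → /dev/<name> map,
// walking the device tree to any depth.
// Caveat: device-mapper nodes live under /dev/mapper, so for those the
// lsblk PATH column would be a better source than "/dev/" + NAME.
func uuidToDevice(raw []byte) (map[string]string, error) {
	var doc struct {
		BlockDevices []lsblkNode `json:"blockdevices"`
	}
	if err := json.Unmarshal(raw, &doc); err != nil {
		return nil, fmt.Errorf("parse lsblk output: %w", err)
	}
	out := make(map[string]string)
	var walk func(nodes []lsblkNode)
	walk = func(nodes []lsblkNode) {
		for _, n := range nodes {
			if n.UUID != "" {
				out[n.UUID] = "/dev/" + n.Name
			}
			walk(n.Children)
		}
	}
	walk(doc.BlockDevices)
	return out, nil
}

func main() {
	sample := []byte(`{"blockdevices":[{"name":"sda","uuid":null,
		"children":[{"name":"sda1","uuid":"1111-aaaa"}]}]}`)
	devs, err := uuidToDevice(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(devs["1111-aaaa"])
}
```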
|
|
|
|
### 2D: Controller — orchestrate the fresh-deployment flow
|
|
|
|
**File:** `controller/cmd/controller/main.go`
|
|
|
|
In the startup sequence, after `isFreshDeployment` detection:
|
|
|
|
```go
|
|
if isFreshDeployment {
|
|
logger.Println("[INFO] Fresh deployment detected — checking Hub for infra backup")
|
|
|
|
var infraBackup *report.InfraBackup
|
|
var restoreSource string
|
|
|
|
// Try Hub first (primary path)
|
|
if cfg.Hub.Enabled && cfg.Hub.URL != "" {
|
|
ib, err := report.PullInfraBackup(cfg.Hub.URL, cfg.Hub.APIKey, cfg.Customer.ID)
|
|
if err != nil {
|
|
logger.Printf("[WARN] Could not reach Hub: %v", err)
|
|
} else if ib != nil {
|
|
infraBackup = ib
|
|
restoreSource = "hub"
|
|
logger.Printf("[INFO] Found infra backup on Hub: %s (%s), %d stacks, synced %s",
|
|
ib.Domain, ib.CustomerID, len(ib.DeployedStacks), ib.Timestamp)
|
|
} else {
|
|
logger.Println("[INFO] No infra backup found on Hub for this customer")
|
|
}
|
|
}
|
|
|
|
if infraBackup != nil {
|
|
// Restore restic passwords from Hub backup
|
|
restorePasswordsFromHub(infraBackup, cfg, sett, logger)
|
|
|
|
// Restore settings.json from Hub backup
|
|
restoreSettingsFromHub(infraBackup, cfg, logger)
|
|
|
|
// Mount drives using stored disk layout
|
|
ctx := context.Background()
|
|
mountedPaths, err := backup.MountDrivesFromLayout(ctx, infraBackup.DiskLayout, logger)
|
|
if err != nil {
|
|
logger.Printf("[WARN] Drive mounting error: %v", err)
|
|
} else {
|
|
logger.Printf("[INFO] Mounted %d drives from Hub disk layout", len(mountedPaths))
|
|
}
|
|
|
|
// Now scan mounted drives for local backup data
|
|
mountPoints := discoverMountPoints() // re-scan after mounting
|
|
restoreDrives = backup.DetectBackupsOnDrives(mountPoints, logger)
|
|
|
|
// Auto-restore stack configs
|
|
if len(restoreDrives) > 0 {
|
|
restored, err := backup.RestoreStackConfigs(restoreDrives, cfg.Paths.StacksDir, logger)
|
|
if err != nil {
|
|
logger.Printf("[WARN] Stack config restore: %v", err)
|
|
} else {
|
|
logger.Printf("[INFO] Restored %d stack configs from local backup", restored)
|
|
}
|
|
		} else {
			// No local backup data on the mounted drives. The Hub stores the
			// deployed_stacks list but not the full compose files.
			// (infraBackup is always non-nil in this branch, so no extra check is needed.)
			logger.Println("[WARN] No local backups found; stack configs must be synced from git catalog")
		}
|
|
|
|
// Re-scan stacks
|
|
stackMgr.ScanStacks()
|
|
|
|
// Build restore plan (uses local backup data for rsync/restic info)
|
|
restorePlan = backup.BuildRestorePlan(restoreDrives, logger)
|
|
restoreMode = true
|
|
|
|
} else {
|
|
// Fallback: try local-only detection (drives might be pre-mounted)
|
|
mountPoints := discoverMountPoints()
|
|
restoreDrives = backup.DetectBackupsOnDrives(mountPoints, logger)
|
|
if len(restoreDrives) > 0 {
|
|
// Same local-only flow as before
|
|
// ...
|
|
}
|
|
}
|
|
}
|
|
```
|
|
|
|
Helper functions:
|
|
|
|
```go
|
|
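func restorePasswordsFromHub(ib *report.InfraBackup, cfg *config.Config,
	sett *settings.Settings, logger *log.Logger) {

	// Primary restic password: base64 in the Hub payload, plain file on disk.
	if ib.ResticPassword != "" {
		pwFile := cfg.Backup.ResticPasswordFile
		decoded, err := base64.StdEncoding.DecodeString(ib.ResticPassword)
		if err != nil {
			logger.Printf("[WARN] Primary restic password from Hub is not valid base64: %v", err)
		} else if err := os.MkdirAll(filepath.Dir(pwFile), 0700); err != nil {
			logger.Printf("[WARN] Could not create directory for %s: %v", pwFile, err)
		} else if err := os.WriteFile(pwFile, decoded, 0600); err != nil {
			logger.Printf("[WARN] Could not write %s: %v", pwFile, err)
		} else {
			logger.Println("[INFO] Primary restic password restored from Hub")
		}
	}

	// Cross-drive restic password: stored via the settings package.
	if ib.CrossDrivePassword != "" {
		if err := sett.SetCrossDriveResticPassword(ib.CrossDrivePassword); err != nil {
			logger.Printf("[WARN] Could not restore cross-drive restic password: %v", err)
		} else {
			logger.Println("[INFO] Cross-drive restic password restored from Hub")
		}
	}
}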
|
|
|
|
func restoreSettingsFromHub(ib *report.InfraBackup, cfg *config.Config, logger *log.Logger) {
	if ib.SettingsJSONB64 == "" {
		return
	}
	decoded, err := base64.StdEncoding.DecodeString(ib.SettingsJSONB64)
	if err != nil {
		logger.Printf("[WARN] Settings from Hub are not valid base64: %v", err)
		return
	}
	settingsPath := filepath.Join(cfg.Paths.DataDir, "settings.json")
	if err := os.WriteFile(settingsPath, decoded, 0600); err != nil {
		logger.Printf("[WARN] Could not write %s: %v", settingsPath, err)
		return
	}
	logger.Println("[INFO] Settings restored from Hub backup")
}
|
|
```
|
|
|
|
---
|
|
|
|
## Phase 3: Restore UI + app data restoration
|
|
|
|
### 3A: Restore page + API handlers
|
|
|
|
Same as the previous TASK2 design — the restore UI, API endpoints, and
|
|
sequential restoration logic. Now that drives are mounted by Phase 2,
|
|
the local backup data is accessible.
|
|
|
|
**Files (NEW):**
|
|
- `controller/internal/web/handler_restore.go` — page handler + API
|
|
- `controller/internal/web/templates/restore.html` — restore wizard UI
|
|
- `controller/internal/backup/restore_rsync.go` — restore from rsync backups
|
|
|
|
**Files (MODIFIED):**
|
|
- `controller/internal/web/server.go` — route registration + restore state
|
|
- `controller/internal/web/templates/dashboard.html` — restore banner
|
|
- `controller/internal/web/templates/layout.html` — sidebar restore link
|
|
|
|
The implementation is described in the "Phase 3 detail" section later in this
document.
|
|
|
|
### 3B: Restore-from-rsync function
|
|
|
|
Same as previously designed. The rsync backups are plain files — no password
|
|
needed. The function rsyncs `_config/`, `_db/`, and user data directories
|
|
back to their original locations.
|
|
|
|
Strategy: **rsync first, restic fallback** (sequential per app).
|
|
|
|
### 3C: Restore flow integration
|
|
|
|
After the user clicks "Restore All" on the restore page:
|
|
|
|
1. For each app in the restore plan (sequentially):
   a. Check for rsync backup → use `RestoreFromRsync()` if available
   b. Else check for restic backup → use existing `RestoreApp()` with latest snapshot
   c. If DB dump exists → restore to the app's dump directory
   d. Pull Docker images (`docker compose pull`)
   e. Start the app (`docker compose up -d`)
   f. Update status in UI (via polling API)
2. When all done, clear the `restoreMode` flag
3. Dashboard returns to normal
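
The steps above can be sketched as a small sequential loop. This is a minimal illustration, not the controller's code: `appPlan`, `restorer`, `pickSource`, and `restoreFn` are hypothetical names, and `restoreFn` stands in for the rsync/restic restore plus the `docker compose` steps. Status updates go through a mutex because the polling API reads the plan while the restore goroutine writes it.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// appPlan is a hypothetical, reduced view of one entry in the restore plan.
type appPlan struct {
	Name      string
	HasRsync  bool
	HasRestic bool
	Status    string // "", "restoring", "done", "failed"
}

type restorer struct {
	mu          sync.Mutex
	plan        []appPlan
	restoreMode bool
	restoreFn   func(name, source string) error
}

// pickSource prefers rsync and falls back to restic.
func pickSource(a appPlan) (string, error) {
	switch {
	case a.HasRsync:
		return "rsync", nil
	case a.HasRestic:
		return "restic", nil
	}
	return "", errors.New("no backup source available")
}

func (r *restorer) setStatus(i int, s string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.plan[i].Status = s
}

// snapshot returns a copy of the plan that is safe to serialize for the status API.
func (r *restorer) snapshot() []appPlan {
	r.mu.Lock()
	defer r.mu.Unlock()
	return append([]appPlan(nil), r.plan...)
}

// runAll restores every app in order, then leaves restore mode.
func (r *restorer) runAll() {
	n := len(r.snapshot())
	for i := 0; i < n; i++ {
		app := r.snapshot()[i]
		r.setStatus(i, "restoring")
		src, err := pickSource(app)
		if err == nil {
			err = r.restoreFn(app.Name, src)
		}
		if err != nil {
			r.setStatus(i, "failed")
			continue
		}
		r.setStatus(i, "done")
	}
	r.mu.Lock()
	r.restoreMode = false
	r.mu.Unlock()
}

func main() {
	r := &restorer{
		restoreMode: true,
		plan:        []appPlan{{Name: "nextcloud", HasRsync: true}, {Name: "gitea"}},
		restoreFn:   func(name, source string) error { return nil },
	}
	r.runAll()
	fmt.Println(r.snapshot(), r.restoreMode)
}
```

A failed app does not stop the loop, so one broken backup does not block the remaining restores.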
|
|
|
|
---
|
|
|
|
## Phase 4: docker-setup.sh integration
|
|
|
|
### 4A: Minimal controller.yaml for fresh deployment
|
|
|
|
The setup script's wizard collects just enough for the controller to start
|
|
and contact the Hub:
|
|
|
|
**Required for Hub contact:**
|
|
- `customer.id` — identifies which backup to pull
|
|
- `customer.domain` — for Traefik labels
|
|
- `hub.enabled: true`
|
|
- `hub.api_key` — hardcoded, same for everyone
|
|
- `hub.url` — hardcoded
|
|
|
|
**Everything else** can be restored from the Hub backup (git credentials,
|
|
monitoring UUIDs, CF tokens, etc.). The wizard should still ask for these
|
|
as before (they might be a genuinely new customer with no Hub backup),
|
|
but the restore flow overwrites them if a Hub backup is found.
|
|
|
|
### 4B: Post-deploy message
|
|
|
|
After deploying the controller, the script prints:
|
|
|
|
```
|
|
If this is a reinstallation, the controller will automatically:
|
|
1. Contact the Hub for your previous configuration
|
|
2. Mount your existing storage drives
|
|
3. Detect and restore your applications
|
|
|
|
Open https://felhom.<DOMAIN> to monitor the restore process.
|
|
```
|
|
|
|
---
|
|
|
|
## Phase 3 detail: Restore UI and data restoration
|
|
|
|
### handler_restore.go
|
|
|
|
**File:** NEW `controller/internal/web/handler_restore.go`
|
|
|
|
```go
|
|
package web
|
|
|
|
import (
	"context"
	"fmt" // needed for fmt.Errorf in executeAppRestore
	"net/http"

	"gitea.dooplex.hu/admin/felhom-controller/internal/backup"
)
|
|
|
|
func (s *Server) restorePageHandler(w http.ResponseWriter, r *http.Request) {
|
|
if !s.restoreMode {
|
|
http.Redirect(w, r, "/", http.StatusFound)
|
|
return
|
|
}
|
|
|
|
data := s.baseData("restore", "Visszaállítás")
|
|
data["RestorePlan"] = s.restorePlan
|
|
data["Drives"] = s.restoreDrives
|
|
|
|
// Summary from first available drive manifest
|
|
for _, d := range s.restoreDrives {
|
|
if d.Manifest != nil {
|
|
data["Domain"] = d.Manifest.Domain
|
|
data["CustomerID"] = d.Manifest.CustomerID
|
|
data["LastSync"] = d.Manifest.LastSync
|
|
data["StackCount"] = len(d.Manifest.DeployedStacks)
|
|
break
|
|
}
|
|
}
|
|
s.render(w, "restore", data)
|
|
}
|
|
|
|
func (s *Server) apiRestoreApp(w http.ResponseWriter, r *http.Request, stackName string) {
|
|
if !s.restoreMode {
|
|
writeJSON(w, http.StatusBadRequest, apiResponse{OK: false, Error: "not in restore mode"})
|
|
return
|
|
}
|
|
|
|
var app *backup.RestorableApp
|
|
for i := range s.restorePlan {
|
|
if s.restorePlan[i].Name == stackName {
|
|
app = &s.restorePlan[i]
|
|
break
|
|
}
|
|
}
|
|
if app == nil {
|
|
writeJSON(w, http.StatusNotFound, apiResponse{OK: false, Error: "app not in restore plan"})
|
|
return
|
|
}
|
|
|
|
go s.executeAppRestore(app)
|
|
writeJSON(w, http.StatusOK, apiResponse{OK: true, Message: "Visszaállítás elindítva"})
|
|
}
|
|
|
|
func (s *Server) apiRestoreAll(w http.ResponseWriter, r *http.Request) {
|
|
if !s.restoreMode {
|
|
writeJSON(w, http.StatusBadRequest, apiResponse{OK: false, Error: "not in restore mode"})
|
|
return
|
|
}
|
|
go s.executeAllRestores()
|
|
writeJSON(w, http.StatusOK, apiResponse{OK: true, Message: "Visszaállítás elindítva"})
|
|
}
|
|
|
|
func (s *Server) apiRestoreStatus(w http.ResponseWriter, r *http.Request) {
|
|
writeJSON(w, http.StatusOK, apiResponse{OK: true, Data: s.restorePlan})
|
|
}
|
|
|
|
func (s *Server) executeAppRestore(app *backup.RestorableApp) {
|
|
ctx := context.Background()
|
|
app.RestoreStatus = "restoring"
|
|
|
|
var restoreErr error
|
|
|
|
if app.HasRsync {
|
|
restoreErr = backup.RestoreFromRsync(ctx, *app, s.cfg.Paths.StacksDir, s.logger)
|
|
} else if app.HasRestic {
|
|
restoreErr = s.restoreFromResticBackup(ctx, app)
|
|
} else {
|
|
restoreErr = fmt.Errorf("no backup source available")
|
|
}
|
|
|
|
if restoreErr != nil {
|
|
app.RestoreStatus = "failed"
|
|
app.RestoreError = restoreErr.Error()
|
|
s.logger.Printf("[ERROR] Restore failed for %s: %v", app.Name, restoreErr)
|
|
return
|
|
}
|
|
|
|
// Pull images and start
|
|
if err := s.stackMgr.PullAndStart(app.Name); err != nil {
|
|
s.logger.Printf("[WARN] Could not start %s after restore: %v", app.Name, err)
|
|
}
|
|
|
|
app.RestoreStatus = "done"
|
|
s.logger.Printf("[INFO] Restore completed for %s", app.Name)
|
|
}
|
|
|
|
func (s *Server) executeAllRestores() {
|
|
for i := range s.restorePlan {
|
|
app := &s.restorePlan[i]
|
|
if app.RestoreStatus == "done" || app.RestoreStatus == "failed" {
|
|
continue
|
|
}
|
|
s.executeAppRestore(app)
|
|
}
|
|
s.logger.Println("[INFO] All app restores completed")
|
|
}
|
|
```
|
|
|
|
### restore.html template
|
|
|
|
**File:** NEW `controller/internal/web/templates/restore.html`
|
|
|
|
All text in Hungarian. The template renders:
|
|
- Banner: "Korábbi telepítés észlelve" (Previous installation detected)
|
|
- Summary: domain, customer, last sync timestamp, stack count
|
|
- Table: app name, backup type (Rsync/Restic/None), DB dump (yes/no), status, action button
|
|
- "Összes visszaállítása" (Restore all) button
|
|
- "Kihagyás" (Skip) button → redirects to dashboard
|
|
- JavaScript polling (3s interval) for status updates during restore
|
|
- Auto-redirect to dashboard when all done
|
|
|
|
See the previous TASK2 version for the full template HTML — it remains the same.
|
|
|
|
### restore_rsync.go
|
|
|
|
**File:** NEW `controller/internal/backup/restore_rsync.go`
|
|
|
|
Restores app data from cross-drive rsync backup:
|
|
1. `_config/` → stack compose directory
|
|
2. `_db/` → DB dump directory
|
|
3. User data directories → original mount paths
|
|
|
|
See the previous TASK2 version for the implementation — the function signature
|
|
and logic remain the same.
|
|
|
|
### Route registration
|
|
|
|
**File:** `controller/internal/web/server.go`
|
|
|
|
Add to `ServeHTTP()`:
|
|
|
|
```go
|
|
case path == "/restore":
	s.restorePageHandler(w, r)
case path == "/api/restore/all" && r.Method == http.MethodPost:
	s.apiRestoreAll(w, r)
case path == "/api/restore/status":
	s.apiRestoreStatus(w, r)
case strings.HasPrefix(path, "/api/restore/") && r.Method == http.MethodPost:
	stackName := strings.TrimPrefix(path, "/api/restore/")
	s.apiRestoreApp(w, r, stackName)
|
|
```
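
The prefix route accepts anything after `/api/restore/`, including an empty name or a name with path separators. A minimal sketch of a stricter extraction follows, assuming stack names are single path segments; `stackNameFromPath` is a hypothetical helper, not part of the existing server.

```go
package main

import (
	"fmt"
	"strings"
)

// stackNameFromPath extracts the stack name from /api/restore/<name>.
// It rejects empty names, the reserved "all" and "status" segments,
// and anything that is not a single plain path segment.
// Hypothetical helper for illustration only.
func stackNameFromPath(path string) (string, bool) {
	const prefix = "/api/restore/"
	if !strings.HasPrefix(path, prefix) {
		return "", false
	}
	name := strings.TrimPrefix(path, prefix)
	switch name {
	case "", "all", "status", ".", "..":
		return "", false
	}
	if strings.ContainsAny(name, `/\`) {
		return "", false
	}
	return name, true
}

func main() {
	fmt.Println(stackNameFromPath("/api/restore/nextcloud"))
	fmt.Println(stackNameFromPath("/api/restore/../etc"))
}
```

The lookup against the restore plan in `apiRestoreApp` already rejects unknown names, so this check is only an additional guard.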
|
|
|
|
### Dashboard banner + sidebar link
|
|
|
|
Same as previous TASK2 — add conditional restore banner to `dashboard.html`
|
|
and restore nav link to `layout.html`.
|
|
|
|
---
|
|
|
|
## Also: Continue backing up passwords to `_infra/` locally
|
|
|
|
The local `_infra/` backup (on each drive) should ALSO include passwords,
|
|
as a belt-and-suspenders approach. If the Hub is unreachable during DR,
|
|
but drives happen to be pre-mounted (manual fstab or auto-detection),
|
|
the local backup should be self-sufficient.
|
|
|
|
**File:** `controller/internal/backup/crossdrive.go` — modify `syncInfraConfig()`
|
|
|
|
After the existing controller.yaml copy (line 494), add:
|
|
|
|
```go
|
|
// Copy primary restic password → _infra/restic-password
|
|
if data, err := os.ReadFile(r.primaryResticPasswordFile); err == nil {
|
|
pwDest := filepath.Join(infraDir, "restic-password")
|
|
os.WriteFile(pwDest, data, 0600)
|
|
}
|
|
|
|
// Copy cross-drive restic password → _infra/cross-drive-password
|
|
if cdPw := r.sett.GetCrossDriveResticPassword(); cdPw != "" {
|
|
cdDest := filepath.Join(infraDir, "cross-drive-password")
|
|
os.WriteFile(cdDest, []byte(cdPw), 0600)
|
|
}
|
|
|
|
// Write manifest.json
|
|
r.writeManifest(infraDir)
|
|
```
|
|
|
|
Add `primaryResticPasswordFile` field to `CrossDriveRunner` struct, pass from
|
|
`main.go`, and add the `writeManifest()` helper (see Phase 1D for the manifest
|
|
format — same `InfraStack` structure).
|
|
|
|
---
|
|
|
|
## Summary of all files
|
|
|
|
### Hub (`e:/git/felhom.eu/hub/`)
|
|
|
|
| File | Change | Phase |
|------|--------|-------|
| `internal/store/store.go` | New table `infra_backups` + `SaveInfraBackup`, `GetInfraBackup` | 1 |
| `internal/api/handler.go` | New endpoints: POST + GET `/api/v1/infra-backup` | 1 |
| `internal/web/templates/customer.html` | Infra backup status section | 1 |
| `internal/web/server.go` | Pass infra backup data to customer template | 1 |
|
|
|
|
### Controller (`e:/git/deploy-felhom-compose/controller/`)
|
|
|
|
| File | Change | Phase |
|------|--------|-------|
| `internal/report/infra_backup.go` | **NEW**: `InfraBackup` type, `BuildInfraBackup()`, `collectDiskLayout()` | 1 |
| `internal/report/infra_pull.go` | **NEW**: `PullInfraBackup()` | 2 |
| `internal/report/pusher.go` | Add `PushInfraBackup()` method | 1 |
| `internal/backup/restore_drives.go` | **NEW**: `MountDrivesFromLayout()`, lsblk parsing, fstab updates | 2 |
| `internal/backup/restore_infra.go` | **NEW**: `DetectBackupsOnDrives()`, `BuildRestorePlan()`, `RestoreStackConfigs()`, `RestoreResticPasswords()` | 2 |
| `internal/backup/restore_rsync.go` | **NEW**: `RestoreFromRsync()` | 3 |
| `internal/backup/crossdrive.go` | Add password backup + manifest to `syncInfraConfig()` | 1 |
| `internal/backup/paths.go` | New path helpers for `_infra/` files | 1 |
| `internal/settings/settings.go` | Add `GetCrossDriveResticPassword()`, `SetCrossDriveResticPassword()` | 1 |
| `cmd/controller/main.go` | Fresh-deployment detection, Hub pull, drive mount, restore orchestration | 2 |
| `internal/web/server.go` | Restore routes, `SetRestoreState()` | 3 |
| `internal/web/handler_restore.go` | **NEW**: restore page + API handlers | 3 |
| `internal/web/templates/restore.html` | **NEW**: restore wizard UI (Hungarian) | 3 |
| `internal/web/templates/dashboard.html` | Restore banner | 3 |
| `internal/web/templates/layout.html` | Sidebar restore link | 3 |
|
|
|
|
### Script
|
|
|
|
| File | Change | Phase |
|------|--------|-------|
| `scripts/docker-setup.sh` | Hub-aware restore detection in wizard | 4 |
|
|
|
|
### Total: ~1400 lines across 20 files (7 new, 13 modified)
|
|
|
|
---
|
|
|
|
## Build & deploy order
|
|
|
|
**Phase 1 (Hub + controller push):**
|
|
```bash
|
|
# 1. Build and deploy Hub
|
|
cd e:/git/felhom.eu/hub
|
|
# ... implement changes ...
|
|
make VERSION=0.2.0 docker docker-push
|
|
kubectl set image -n felhom-system deploy/hub hub=gitea.dooplex.hu/admin/felhom-hub:v0.2.0
|
|
|
|
# 2. Build and deploy controller
|
|
cd e:/git/deploy-felhom-compose/controller
|
|
# ... implement changes ...
|
|
# Build, push, deploy as usual (see MEMORY.md workflow)
|
|
```
|
|
|
|
**Phase 2 (controller pull + auto-mount):**
|
|
```bash
|
|
# Controller-only changes. Build and deploy.
|
|
```
|
|
|
|
**Phase 3 (restore UI):**
|
|
```bash
|
|
# Controller-only changes. Build and deploy.
|
|
# Test: stop controller, clear stacks dir, restart → should enter restore mode
|
|
```
|
|
|
|
**Phase 4 (docker-setup.sh):**
|
|
```bash
|
|
# Script changes only. Copy to demo node and test.
|
|
```
|
|
|
|
---
|
|
|
|
## Testing
|
|
|
|
### Phase 1 verification
|
|
```bash
|
|
# After deploying updated controller, trigger a backup:
|
|
curl -X POST https://felhom.demo-felhom.eu/api/backup/run
|
|
|
|
# Check Hub for the infra backup:
|
|
curl -H "Authorization: Bearer 094091de545ce28795c47ac2158fc30750db5c24a621c49329b001ee8db57fb8" \
|
|
https://hub.felhom.eu/api/v1/infra-backup/demo-felhom | jq .
|
|
```
|
|
|
|
### Phase 2-3 simulation (on demo node)
|
|
```bash
|
|
# WARNING: This simulates a DR scenario on the demo node.
|
|
# It temporarily clears the stacks dir to trigger restore mode.
|
|
SSH=/c/Windows/System32/OpenSSH/ssh.exe
|
|
|
|
# 1. Backup current state
|
|
$SSH kisfenyo@192.168.0.162 "sudo cp -a /opt/docker/stacks /tmp/stacks-backup"
|
|
|
|
# 2. Stop controller, clear stacks to simulate fresh install
|
|
$SSH kisfenyo@192.168.0.162 "cd /opt/docker/felhom-controller && sudo docker compose down"
|
|
$SSH kisfenyo@192.168.0.162 "sudo docker volume rm felhom-controller_controller-data"
|
|
$SSH kisfenyo@192.168.0.162 "sudo rm -rf /opt/docker/stacks/*"
|
|
|
|
# 3. Start controller — should detect fresh deployment + pull from Hub
|
|
$SSH kisfenyo@192.168.0.162 "cd /opt/docker/felhom-controller && sudo docker compose up -d"
|
|
$SSH kisfenyo@192.168.0.162 "sleep 15 && docker logs felhom-controller --tail 30 2>&1"
|
|
|
|
# 4. Open dashboard — should show restore wizard
|
|
|
|
# 5. After testing, restore original state if needed:
|
|
$SSH kisfenyo@192.168.0.162 "sudo cp -a /tmp/stacks-backup/* /opt/docker/stacks/"
|
|
```
|