# TASK: Major rewrite of `scripts/docker-setup.sh` (v5.0)

## Overview

Rewrite `docker-setup.sh` to bring it up to date with the current Felhom architecture.
The script should now be a complete end-to-end provisioning tool: install infrastructure,
run an interactive configuration wizard, generate `controller.yaml`, deploy FileBrowser
as a protected stack, and deploy felhom-controller, all in one run.

**Read the entire current `scripts/docker-setup.sh` before starting. This is a rewrite
of an existing ~1600-line script, not a new file.**

---

## Changes Required

### 1. Update banner and version

- Set `SCRIPT_VERSION="5.0.0"`
- Update `print_banner()`: no Portainer; the title should be:
  ```
  Felhom Infrastructure Setup v5.0.0
  ```
- Update the comment header block at the top of the file to match the new scope
  (Docker + Traefik + FileBrowser + Controller + configuration wizard).
- Update `print_help()` to reflect all removed/changed options.

### 2. Remove Portainer (confirm clean)

The current script has no Portainer code (already removed in a prior version).
Just make sure there are zero references to "portainer" or "Portainer" anywhere:
banner, comments, help text, variables. Search and confirm.

### 3. Remove `--cf-tunnel-token` CLI option

**Remove** the `--cf-tunnel-token` CLI flag and the `CF_TUNNEL_TOKEN` variable from
`parse_args()`. The Cloudflare tunnel token is now collected by the configuration wizard
and written into `controller.yaml` (see §7 below). The `install_cloudflare_tunnel()`
function stays but reads the token from the wizard variable instead of a CLI flag.

Also remove the `--hdd-path` CLI option and the `HDD_PATH` variable (deprecated).

Keep these CLI options (still useful for non-interactive/scripted runs):
- `--ip`, `--gateway`, `--dns`, `--interface` (network config)
- `--domain`, `--email`, `--cf-token` (TLS/domain; can pre-seed the wizard)
- `--customer` (customer ID; can pre-seed the wizard)
- `--traefik-password`, `--self-signed-cert`
- `--skip-filebrowser`
- `--dry-run`, `--debug`, `--help`, `--bootstrap`

### 4. Remove `--hdd-path` references

Remove the `HDD_PATH` variable, the `--hdd-path` argument parsing, and all references.
FileBrowser mounts are determined by the wizard (`system_data_path` and any existing
`/mnt/*` mounts).

### 5. FileBrowser deployment as protected stack

The current `install_filebrowser()` function needs to be rewritten:

**Location:** Deploy to `/opt/docker/stacks/filebrowser/` (already the current
`FILEBROWSER_DIR`; keep this).

**Compose file:** Generate a compose file matching the current production layout
on the demo node. Key differences from the current script template:

```yaml
services:
  filebrowser:
    image: gtstef/filebrowser:latest
    container_name: filebrowser
    restart: unless-stopped
    environment:
      - TZ=Europe/Budapest
    volumes:
      - filebrowser_data:/home/filebrowser/data
      # Mount discovered drives (populated by the wizard)
      # e.g. /mnt/hdd_1:/srv/hdd_1, /mnt/sys_drive:/srv/sys_drive
    networks:
      - traefik-public
    deploy:
      resources:
        limits:
          memory: 256M
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:80/"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 15s
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.filebrowser.rule=Host(`files.<DOMAIN>`)"
      - "traefik.http.routers.filebrowser.entrypoints=websecure"
      - "traefik.http.routers.filebrowser.tls=true"
      - "traefik.http.services.filebrowser.loadbalancer.server.port=80"
      - "traefik.docker.network=traefik-public"

# Top-level definitions Compose requires for the named volume and the
# external network referenced above
volumes:
  filebrowser_data:

networks:
  traefik-public:
    external: true
```

**Drive discovery for volumes:** The wizard (§7) collects `system_data_path`.
Additionally, scan `/mnt/` for existing mount points at install time. For each
discovered mount (e.g., `/mnt/hdd_1`, `/mnt/sys_drive`), add a volume mapping:
`/mnt/<name>:/srv/<name>`. If no mounts are found, mount only the `system_data_path`.
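
The discovery rule above could be sketched roughly as follows. The function names `discover_volume_mappings` and `is_mount_point` are hypothetical, the `/srv/<basename>` target for the fallback path is an assumption, and the mount test relies on the util-linux `mountpoint` command being available. The output is one `host:container` pair per line, ready to be written as `volumes:` entries.

```bash
#!/usr/bin/env bash
# Sketch only: helper names and the fallback target are assumptions.

# Default predicate: util-linux `mountpoint -q` exits 0 for a mount point.
is_mount_point() { mountpoint -q "$1"; }

# Print one "host:container" mapping per mounted directory under $1.
# Falls back to the wizard's system_data_path ($2) when nothing is mounted.
discover_volume_mappings() {
    local base_dir="${1:-/mnt}" fallback="${2:-}" dir name found=0
    for dir in "$base_dir"/*/; do
        [[ -d "$dir" ]] || continue      # the glob matched nothing
        dir="${dir%/}"                   # drop the trailing slash
        is_mount_point "$dir" || continue
        name="$(basename "$dir")"
        printf '%s:/srv/%s\n' "$dir" "$name"
        found=1
    done
    if (( found == 0 )) && [[ -n "$fallback" ]]; then
        printf '%s:/srv/%s\n' "$fallback" "$(basename "$fallback")"
    fi
}

# Example call (not executed here):
#   discover_volume_mappings /mnt "$SYSTEM_DATA_PATH"
```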

**Hardcode domain** in the Traefik host rule (no `${DOMAIN}` env var needed).
Use the wizard's domain value directly: ``Host(`files.ACTUAL-DOMAIN`)``.
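
One pitfall when generating this label from an unquoted heredoc is that bash treats backticks as command substitution, so they have to be escaped. A minimal sketch, assuming a hypothetical helper named `traefik_host_label` (the indentation of the emitted line is also just an illustration):

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the Traefik router rule label for one service.
traefik_host_label() {
    local router="$1" subdomain="$2" domain="$3"
    # The backticks are escaped so the unquoted heredoc does not run them
    # as command substitution; ${...} variables are still expanded.
    cat <<EOF
      - "traefik.http.routers.${router}.rule=Host(\`${subdomain}.${domain}\`)"
EOF
}

# Example call (not executed here):
#   traefik_host_label filebrowser files "$DOMAIN"
```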

**Also generate `.felhom.yml`** metadata file: keep the existing one from the
current script (Hungarian text, category: storage, etc.).

**No `.env` file needed** for filebrowser (domain is hardcoded in compose labels).

### 6. Controller deployment (NEW step)

Add a new step to deploy felhom-controller. This is currently missing from the
script; until now the user had to deploy it manually.

**Location:** `/opt/docker/felhom-controller/`

**docker-compose.yml:** generate it to match the current production layout:

```yaml
services:
  felhom-controller:
    image: gitea.dooplex.hu/admin/felhom-controller:latest
    container_name: felhom-controller
    restart: unless-stopped
    privileged: true
    ports:
      - "8080:8080"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /opt/docker/felhom-controller/controller.yaml:/opt/docker/felhom-controller/controller.yaml:ro
      - controller-data:/opt/docker/felhom-controller/data
      - /opt/docker/stacks:/opt/docker/stacks
      - /srv/backups:/srv/backups
      - type: bind
        source: /mnt
        target: /mnt
        bind:
          propagation: rshared
      - /sys:/host/sys:ro
      - /etc/os-release:/host/etc/os-release:ro
      - /etc/hostname:/host/etc/hostname:ro
      - /dev:/host-dev:rw
      - /etc/fstab:/host-fstab
      - /run/udev:/run/udev:ro
    environment:
      - TZ=Europe/Budapest
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.controller.rule=Host(`felhom.<DOMAIN>`)"
      - "traefik.http.routers.controller.entrypoints=websecure"
      - "traefik.http.routers.controller.tls=true"
      - "traefik.http.services.controller.loadbalancer.server.port=8080"
      - "traefik.docker.network=traefik-public"
      - "felhom.managed=true"
      - "felhom.component=controller"
    networks:
      - traefik-public
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/api/health"]
      interval: 30s
      timeout: 5s
      start_period: 10s
      retries: 3

volumes:
  controller-data:

networks:
  traefik-public:
    external: true
```

**Hardcode domain** in Traefik labels (like filebrowser).

**No `.env` file:** Skip it entirely. Compose doesn't need one because the domain
is hardcoded in the labels.

**Use `latest` tag** for the image. The controller has self-update capability,
so it will manage its own version after initial deployment.

**Pull and start** the controller, then verify health via the healthcheck endpoint.
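
The pull, start, and verify sequence might look like the sketch below. `wait_until` and `deploy_controller` are hypothetical names, the retry budget (12 attempts, 5 seconds apart) is an arbitrary assumption, and the example assumes the Compose v2 plugin (`docker compose`) plus `curl` on the host. The health URL is the one used by the compose healthcheck above.

```bash
#!/usr/bin/env bash
# Retry a probe command until it succeeds or the attempts run out.
# Usage: wait_until <attempts> <delay_seconds> <command...>
wait_until() {
    local attempts="$1" delay="$2" i
    shift 2
    for (( i = 1; i <= attempts; i++ )); do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        # Sleep between attempts, but not after the last one.
        if (( i < attempts )); then
            sleep "$delay"
        fi
    done
    return 1
}

# Assumed wrapper: pull the image, start the stack, then poll the
# health endpoint published on port 8080.
deploy_controller() {
    local dir="/opt/docker/felhom-controller"
    docker compose -f "$dir/docker-compose.yml" pull &&
    docker compose -f "$dir/docker-compose.yml" up -d &&
    wait_until 12 5 curl -fsS http://localhost:8080/api/health
}
```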

### 7. Configuration wizard for `controller.yaml`

Add an interactive wizard function `run_config_wizard()` that runs AFTER
infrastructure setup but BEFORE deploying the controller. It generates
`/opt/docker/felhom-controller/controller.yaml`.

**CLI pre-seeding:** If `--domain`, `--customer`, `--email`, `--cf-token` are
provided via CLI, use them as defaults in the wizard (user can still change).

**Wizard flow** (each question is a `read -p` prompt with a default shown in brackets):

```
===========================================================
Felhom Controller Configuration Wizard
===========================================================

--- Customer identity ---
Customer ID [demo-felhom]: _
Customer display name [Demo Ügyfél]: _
Domain [homeserver.local]: _
Customer email (optional) []: _

--- Infrastructure secrets ---
Cloudflare Tunnel token (optional, leave empty to skip) []: _
Cloudflare API token (for DNS-01 certs, optional) []: _

--- Paths ---
System data partition mount point
(if the system drive was partitioned for user data,
provide the mount point, e.g., /mnt/sys_drive)
System data path [/mnt/sys_drive]: _

--- Dashboard password ---
Set a password for the controller dashboard?
(leave empty for first-visit setup prompt)
Dashboard password []: _

--- Git sync ---
App catalog repository URL [https://gitea.dooplex.hu/admin/app-catalog-felhom.eu.git]: _
Git username []: _
Git token []: _

--- Healthcheck monitoring ---
Healthchecks.io ping UUIDs (leave empty to skip):
Heartbeat UUID []: _
System health UUID []: _
DB dump UUID []: _
Backup UUID []: _
Backup integrity UUID []: _

--- Ready ---
```
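
Each prompt in this flow follows the same pattern, so a small helper can carry it. The sketch below assumes a hypothetical function name, `prompt_default`; CLI pre-seeding then amounts to passing the CLI value as the default. Bash writes the `read -p` prompt to standard error, and only when input comes from a terminal, so the helper's standard output carries just the answer.

```bash
#!/usr/bin/env bash
# Ask one question, show the default in brackets, print the answer on stdout.
prompt_default() {
    local label="$1" default="${2:-}" input=""
    # -r keeps backslashes literal; "|| true" tolerates EOF on stdin.
    read -r -p "${label} [${default}]: " input || true
    # An empty answer (plain Enter) selects the default.
    printf '%s\n' "${input:-$default}"
}

# Example calls (not executed here); DOMAIN and CUSTOMER_ID may already
# hold values pre-seeded from --domain and --customer:
#   DOMAIN="$(prompt_default "Domain" "${DOMAIN:-homeserver.local}")"
#   CUSTOMER_ID="$(prompt_default "Customer ID" "${CUSTOMER_ID:-demo-felhom}")"
```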

**Password hashing:** If the user provides a dashboard password, hash it with bcrypt.
Use `htpasswd -bnBC 10 "" "PASSWORD" | tr -d ':\n'` or the `python3 -c` fallback.
Store the hash in `web.password_hash`.
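
A possible shape for the hashing step, with the fallback spelled out. The function name `hash_dashboard_password` is hypothetical, and the fallback assumes the third-party `bcrypt` Python package is installed, which the source does not confirm. Note that `htpasswd -b` takes the password as a command-line argument, so it is briefly visible in the process list.

```bash
#!/usr/bin/env bash
# Print a bcrypt hash (cost 10) of the given password, without a newline.
hash_dashboard_password() {
    local password="$1"
    if command -v htpasswd >/dev/null 2>&1; then
        # With an empty user name, htpasswd -n prints ":<hash>" followed by
        # blank lines; tr strips the colon and the newlines.
        htpasswd -bnBC 10 "" "$password" | tr -d ':\n'
    else
        # Assumed fallback: requires the `bcrypt` package for python3.
        python3 -c 'import sys, bcrypt
print(bcrypt.hashpw(sys.argv[1].encode(), bcrypt.gensalt(rounds=10)).decode(), end="")' "$password"
    fi
}

# Example call (not executed here):
#   PASSWORD_HASH="$(hash_dashboard_password "$DASHBOARD_PASSWORD")"
```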

**Session secret:** Auto-generate: `openssl rand -hex 32`

**Hub config:** Always enabled, with the hardcoded API key:
```yaml
hub:
  enabled: true
  url: "https://hub.felhom.eu"
  api_key: "094091de545ce28795c47ac2158fc30750db5c24a621c49329b001ee8db57fb8"
  push_interval: "15m"
```

**Backup:** Keep `enabled: true`; the user confirmed it should stay for
troubleshooting purposes.

**hdd_path:** Do NOT include in generated config. It's deprecated. Remove it
from the template entirely.

**Full template:** write this to `/opt/docker/felhom-controller/controller.yaml`:

```yaml
# Felhom Controller Configuration
# Generated by docker-setup.sh v5.0.0 on <DATE>

customer:
  id: "<CUSTOMER_ID>"
  name: "<CUSTOMER_NAME>"
  domain: "<DOMAIN>"
  email: "<EMAIL>"
  telegram_chat_id: ""

infrastructure:
  cf_tunnel_token: "<CF_TUNNEL_TOKEN>"
  cf_api_token: "<CF_API_TOKEN>"

paths:
  stacks_dir: "/opt/docker/stacks"
  data_dir: "/opt/docker/felhom-controller/data"
  system_data_path: "<SYSTEM_DATA_PATH>"

system:
  reserved_memory_mb: 384

web:
  listen: ":8080"
  password_hash: "<BCRYPT_HASH_OR_EMPTY>"
  session_secret: "<AUTO_GENERATED_HEX>"

git:
  repo_url: "<GIT_REPO_URL>"
  branch: "main"
  sync_interval: "15m"
  username: "<GIT_USERNAME>"
  token: "<GIT_TOKEN>"

stacks:
  protected:
    - "traefik"
    - "cloudflared"
    - "felhom-controller"
    - "filebrowser"
  update_window: "03:00-05:00"
  compose_command: ""

backup:
  enabled: true
  restic_password_file: "/opt/docker/felhom-controller/data/restic-password"
  db_dump_schedule: "02:30"
  restic_schedule: "03:00"
  retention:
    keep_daily: 7
    keep_weekly: 4
    keep_monthly: 6
  prune_schedule: "weekly"

monitoring:
  enabled: true
  healthchecks_base: "https://status.felhom.eu"
  ping_uuids:
    heartbeat: "<HEARTBEAT_UUID>"
    system_health: "<SYSTEM_HEALTH_UUID>"
    db_dump: "<DB_DUMP_UUID>"
    backup: "<BACKUP_UUID>"
    backup_integrity: "<BACKUP_INTEGRITY_UUID>"
  system_health_interval: "5m"
  health_check_schedule: "06:00"
  thresholds:
    disk_warn_percent: 80
    disk_crit_percent: 90
    backup_max_age_hours: 36
    cpu_warn_percent: 90
    memory_warn_percent: 85
    temperature_warn_celsius: 75

hub:
  enabled: true
  url: "https://hub.felhom.eu"
  api_key: "094091de545ce28795c47ac2158fc30750db5c24a621c49329b001ee8db57fb8"
  push_interval: "15m"

self_update:
  enabled: true
  check_interval: "6h"
  image: "gitea.dooplex.hu/admin/felhom-controller"
  auto_update: false
  health_timeout_seconds: 60

notifications:
  customer_events:
    - "disk_warning"
    - "backup_failed"
    - "update_available"
    - "security_update"
  operator_events:
    - "disk_critical"
    - "backup_failed"
    - "self_update_failed"
    - "container_unhealthy"

logging:
  level: "info"
  file: ""
  max_size_mb: 10
  max_files: 3

assets:
  source_url: "https://felhom.eu"
```

### 8. Update `controller.yaml.example`

Update `controller/configs/controller.yaml.example` to match the wizard template:
- **Remove** `hdd_path` line entirely
- **Set** `hub.enabled: true` (was `false`)
- **Set** `hub.api_key` to the real key: `094091de545ce28795c47ac2158fc30750db5c24a621c49329b001ee8db57fb8`
- **Improve** `system_data_path` comment to be clearer:
  ```yaml
  system_data_path: "/mnt/sys_drive"  # Mount point of user-data partition on system drive (e.g., /mnt/sys_drive)
  ```

### 9. Update `install_cloudflare_tunnel()`

The function currently reads from `CF_TUNNEL_TOKEN` (CLI arg). Change it to
read from the wizard variable (same variable name is fine, just populated by the
wizard instead of CLI). The function body stays the same: it creates the
docker-compose at `/opt/docker/cloudflared/` and starts it.

**Guard:** If the wizard left the CF tunnel token empty, skip this step (already
handled by the existing `if [[ -z "$CF_TUNNEL_TOKEN" ]]` check).

### 10. Update execution order in `main()`

New execution order:

```
1. Install base packages
2. Configure network (static IP, if requested)
3. Install Docker Engine + Compose
4. Install Traefik reverse proxy
5. Generate self-signed certificate (if requested)
6. Run configuration wizard → generates controller.yaml
7. Install Cloudflare Tunnel (if token provided in wizard)
8. Install FileBrowser (protected stack)
9. Deploy felhom-controller
10. Install helper tools
11. Print summary
```

Update step numbering and `get_total_steps()` accordingly.

### 11. Update `print_summary()`

Update the summary to reflect:
- Controller is deployed and accessible at `https://felhom.<DOMAIN>`
- FileBrowser at `https://files.<DOMAIN>`
- Remove manual "deploy felhom-controller" instructions (it's automated now)
- Show healthcheck UUID status (configured / not configured)
- Show hub status (enabled)
- Fix the `CUSTOMER_ID` display bug (the "Note: No --customer specified"
  message is inside the `if [[ -n "$CUSTOMER_ID" ]]` block, which is the wrong branch)
- Add DR/reinstallation note:
  ```
  If this is a reinstallation, the controller will automatically:
  1. Contact the Hub for your previous configuration
  2. Mount your existing storage drives
  3. Detect and restore your applications

  Open https://felhom.<DOMAIN> to monitor the restore process.
  ```
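
For the `CUSTOMER_ID` item in the list above, the corrected branching could look like this sketch. The function name `print_customer_status` and the "Customer ID:" label are assumptions; only the note text comes from the current script. Dropping the note altogether is also an option.

```bash
#!/usr/bin/env bash
# Print the customer line with the note in the else branch, where it belongs.
print_customer_status() {
    if [[ -n "${CUSTOMER_ID:-}" ]]; then
        echo "Customer ID: ${CUSTOMER_ID}"
    else
        # Reached only when no customer ID was provided.
        echo "Note: No --customer specified"
    fi
}
```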

### 12. Update `print_help()`

Update help text to reflect:
- Removed `--cf-tunnel-token` (now in wizard)
- Removed `--hdd-path` (deprecated)
- Mention the interactive wizard
- Updated "WHAT THIS SCRIPT INSTALLS" list:
  1. Base packages
  2. Docker Engine + Compose
  3. Traefik reverse proxy
  4. TLS certificates
  5. Felhom Controller (with interactive configuration)
  6. FileBrowser Quantum (web file manager)
  7. Cloudflare Tunnel (if configured)
  8. Helper tools

---

## Additional observations

### Bugs in current script

1. **`print_summary()` CUSTOMER_ID logic is inverted** (line ~1507):
   The "Note: No --customer specified" message is inside `if [[ -n "$CUSTOMER_ID" ]]`,
   which only triggers when a customer IS specified. It should be in an else branch
   or removed.

2. **Step numbering is fragile**: The `get_total_steps()` and hardcoded step
   numbers (e.g., `log_step "3/$(get_total_steps)"`) will desync if steps are
   added/removed. Consider using a counter variable incremented at each step.
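
The counter idea from item 2 might be sketched as below. `next_step`, `CURRENT_STEP`, and `TOTAL_STEPS` are hypothetical names, and `log_step` here is only a stand-in whose signature may differ from the real one. In the script, `TOTAL_STEPS` could still be filled from `get_total_steps()`. The helper must be called directly rather than inside `$(...)`, because a subshell would discard the increment.

```bash
#!/usr/bin/env bash
CURRENT_STEP=0
TOTAL_STEPS=11   # assumed: the eleven entries of the new execution order

# Stand-in for the script's existing log_step; the real one may differ.
log_step() { echo "[$1] $2"; }

# Increment the counter and log the step header in one place, so adding or
# removing a step never requires renumbering the others.
next_step() {
    CURRENT_STEP=$(( CURRENT_STEP + 1 ))
    log_step "${CURRENT_STEP}/${TOTAL_STEPS}" "$1"
}

# Example calls (not executed here):
#   next_step "Install base packages"
#   next_step "Configure network"
```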

### Things NOT to change

- `bootstrap_sudo()`: works fine, keep as-is
- Network configuration (step 2): keep all network manager detection logic
- Docker installation (step 3): keep as-is
- Traefik installation (step 4): keep as-is
- Self-signed cert generation: keep as-is
- Helper tools installation: keep as-is
- Error trap and diagnostics: keep as-is
- Color/logging functions: keep as-is

### Template completeness check

The controller.yaml template covers all sections from the current example.
Sections that use sensible defaults and don't need wizard prompts:
- `system.reserved_memory_mb` (384)
- `backup.*` (all defaults are fine)
- `stacks.protected` (hardcoded list)
- `stacks.update_window` ("03:00-05:00")
- `monitoring.thresholds.*` (all defaults)
- `self_update.*` (all defaults)
- `notifications.*` (all defaults)
- `logging.*` (all defaults)
- `assets.*` (hardcoded)

---

## Implementation notes

- The script is bash, so no external YAML parser is needed. Use `cat > file << EOF`
  with variable substitution for generating YAML.
- For bcrypt hashing, prefer `htpasswd -bnBC 10 "" "$password" | tr -d ':\n'`
  (apache2-utils is installed in step 1). Fallback: `python3 -c "import bcrypt; ..."`
- The wizard should show current/default values in brackets and accept Enter
  for defaults: `read -p "Domain [$default]: " input; value="${input:-$default}"`
- Dry-run mode should show what the wizard WOULD generate without writing files.
- All generated files should have appropriate permissions:
  - `controller.yaml`: `chmod 600` (contains secrets)
  - `docker-compose.yml` files: `chmod 644`
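
Taken together, the heredoc, dry-run, and permission notes above could be combined as in this sketch. The function names and the `DRY_RUN` variable are assumptions, and the template is cut down to two fields for brevity. Values are inserted verbatim, so an answer containing a double quote or a backslash would need escaping before it is valid inside a double-quoted YAML string.

```bash
#!/usr/bin/env bash
# Render a truncated controller.yaml from wizard variables (assumed names).
render_controller_yaml() {
    cat <<EOF
# Felhom Controller Configuration
customer:
  id: "${CUSTOMER_ID}"
  domain: "${DOMAIN}"
EOF
}

# Write the file with mode 600, or only print it when DRY_RUN is "true".
write_controller_yaml() {
    local target="$1"
    if [[ "${DRY_RUN:-false}" == "true" ]]; then
        echo "[dry-run] would write ${target}:"
        render_controller_yaml
        return 0
    fi
    # A restrictive umask in a subshell keeps the file from ever being
    # world-readable; chmod then sets the final mode explicitly.
    ( umask 077 && render_controller_yaml > "$target" )
    chmod 600 "$target"
}

# Example call (not executed here):
#   write_controller_yaml /opt/docker/felhom-controller/controller.yaml
```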

---

## Build & test

After implementing, test the script with `--dry-run` to verify:
```bash
sudo ./docker-setup.sh --domain test.local --customer test --dry-run
```

For a real deployment test on the demo node:
```bash
# Copy script to demo node
SSH=/c/Windows/System32/OpenSSH/ssh.exe
scp scripts/docker-setup.sh kisfenyo@192.168.0.162:/tmp/

# Run on demo node (it already has infrastructure, so most steps will skip)
$SSH kisfenyo@192.168.0.162 "sudo bash /tmp/docker-setup.sh --domain demo-felhom.eu --customer demo-felhom --email certs@felhom.eu --cf-token <token>"
```