# TASK: Major rewrite of `scripts/docker-setup.sh` (v5.0)

## Overview

Rewrite `docker-setup.sh` to bring it up to date with the current Felhom architecture. The script should now be a complete end-to-end provisioning tool: install infrastructure, run an interactive configuration wizard, generate `controller.yaml`, deploy FileBrowser as a protected stack, and deploy felhom-controller, all in one run.

**Read the entire current `scripts/docker-setup.sh` before starting. This is a rewrite of an existing ~1600-line script, not a new file.**

---

## Changes Required

### 1. Update banner and version

- Set `SCRIPT_VERSION="5.0.0"`
- Update `print_banner()`: no Portainer; the title should be:
  ```
  Felhom Infrastructure Setup v5.0.0
  ```
- Update the comment header block at the top of the file to match the new scope (Docker + Traefik + FileBrowser + Controller + configuration wizard).
- Update `print_help()` to reflect all removed/changed options.

### 2. Remove Portainer (confirm clean)

The current script has no Portainer code (already removed in a prior version). Just make sure there are zero references to "portainer" or "Portainer" anywhere: banner, comments, help text, variables. Search and confirm.

### 3. Remove `--cf-tunnel-token` CLI option

**Remove** the `--cf-tunnel-token` CLI flag and the `CF_TUNNEL_TOKEN` variable from `parse_args()`. The Cloudflare tunnel token is now collected by the configuration wizard and written into `controller.yaml` (see §7 below). The `install_cloudflare_tunnel()` function stays but reads the token from the wizard variable instead of a CLI flag.

Also remove the `--hdd-path` CLI option and the `HDD_PATH` variable (deprecated).
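One possible shape for the trimmed `parse_args()` is sketched below. It is an illustration only: the variable names, the return codes, and the choice to reject the removed flags with an explicit message (rather than letting them fall through to the generic "unknown option" branch) are assumptions, not taken from the existing script.

```bash
#!/usr/bin/env bash
# Sketch only: variable names and error handling are assumptions,
# not copied from the existing parse_args().
DOMAIN=""
CUSTOMER_ID=""

parse_args() {
    while [[ $# -gt 0 ]]; do
        case "$1" in
            --domain|--customer)
                # Both options take exactly one value.
                if [[ $# -lt 2 ]]; then
                    echo "Missing value for $1" >&2
                    return 2
                fi
                if [[ "$1" == "--domain" ]]; then
                    DOMAIN="$2"
                else
                    CUSTOMER_ID="$2"
                fi
                shift 2
                ;;
            --cf-tunnel-token|--hdd-path)
                # Removed in v5.0: fail loudly instead of silently ignoring.
                echo "Option $1 is no longer supported." >&2
                return 2
                ;;
            *)
                echo "Unknown option: $1" >&2
                return 2
                ;;
        esac
    done
}
```

Failing on the removed flags makes stale automation break visibly instead of running with a token or path that is no longer used.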
Keep these CLI options (still useful for non-interactive/scripted runs):

- `--ip`, `--gateway`, `--dns`, `--interface` (network config)
- `--domain`, `--email`, `--cf-token` (TLS/domain; can pre-seed the wizard)
- `--customer` (customer ID; can pre-seed the wizard)
- `--traefik-password`, `--self-signed-cert`
- `--skip-filebrowser`
- `--dry-run`, `--debug`, `--help`, `--bootstrap`

### 4. Remove `--hdd-path` references

Remove the `HDD_PATH` variable, the `--hdd-path` argument parsing, and all references. FileBrowser mounts are determined by the wizard (`system_data_path` and any existing `/mnt/*` mounts).

### 5. FileBrowser deployment as protected stack

The current `install_filebrowser()` function needs to be rewritten:

**Location:** Deploy to `/opt/docker/stacks/filebrowser/` (already the current `FILEBROWSER_DIR`; keep this).

**Compose file:** Generate a compose file matching the current production layout on the demo node. Key differences from the current script template:

```yaml
services:
  filebrowser:
    image: gtstef/filebrowser:latest
    container_name: filebrowser
    restart: unless-stopped
    environment:
      - TZ=Europe/Budapest
    volumes:
      - filebrowser_data:/home/filebrowser/data
      # Mount discovered drives (populated by wizard)
      # e.g. /mnt/hdd_1:/srv/hdd_1, /mnt/sys_drive:/srv/sys_drive
    networks:
      - traefik-public
    deploy:
      resources:
        limits:
          memory: 256M
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:80/"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 15s
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.filebrowser.rule=Host(`files.`)"
      - "traefik.http.routers.filebrowser.entrypoints=websecure"
      - "traefik.http.routers.filebrowser.tls=true"
      - "traefik.http.services.filebrowser.loadbalancer.server.port=80"
      - "traefik.docker.network=traefik-public"
```

**Drive discovery for volumes:** The wizard (§7) collects `system_data_path`. Additionally, scan `/mnt/` for existing mount points at install time. For each discovered mount (e.g., `/mnt/hdd_1`, `/mnt/sys_drive`), add a volume mapping: `/mnt/:/srv/`. If no mounts are found, only mount the `system_data_path`.

**Hardcode domain** in the Traefik host rule (no `${DOMAIN}` env var needed). Use the wizard's domain value directly: ``Host(`files.ACTUAL-DOMAIN`)``.

**Also generate `.felhom.yml`** metadata file; keep the existing one from the current script (Hungarian text, category: storage, etc.).

**No `.env` file needed** for filebrowser (domain is hardcoded in compose labels).

### 6. Controller deployment (NEW step)

Add a new step to deploy felhom-controller. This is currently missing from the script; the user had to deploy it manually.

**Location:** `/opt/docker/felhom-controller/`

**docker-compose.yml:** generate it to match the current production layout:

```yaml
services:
  felhom-controller:
    image: gitea.dooplex.hu/admin/felhom-controller:latest
    container_name: felhom-controller
    restart: unless-stopped
    privileged: true
    ports:
      - "8080:8080"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /opt/docker/felhom-controller/controller.yaml:/opt/docker/felhom-controller/controller.yaml:ro
      - controller-data:/opt/docker/felhom-controller/data
      - /opt/docker/stacks:/opt/docker/stacks
      - /srv/backups:/srv/backups
      - type: bind
        source: /mnt
        target: /mnt
        bind:
          propagation: rshared
      - /sys:/host/sys:ro
      - /etc/os-release:/host/etc/os-release:ro
      - /etc/hostname:/host/etc/hostname:ro
      - /dev:/host-dev:rw
      - /etc/fstab:/host-fstab
      - /run/udev:/run/udev:ro
    environment:
      - TZ=Europe/Budapest
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.controller.rule=Host(`felhom.`)"
      - "traefik.http.routers.controller.entrypoints=websecure"
      - "traefik.http.routers.controller.tls=true"
      - "traefik.http.services.controller.loadbalancer.server.port=8080"
      - "traefik.docker.network=traefik-public"
      - "felhom.managed=true"
      - "felhom.component=controller"
    networks:
      - traefik-public
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/api/health"]
      interval: 30s
      timeout: 5s
      start_period: 10s
      retries: 3

volumes:
  controller-data:

networks:
  traefik-public:
    external: true
```

**Hardcode domain** in Traefik labels (like filebrowser).

**No `.env` file:** because the domain is hardcoded in the compose labels, Compose does not need a `.env`. Skip it entirely.

**Use `latest` tag** for the image. The controller has self-update capability, so it will manage its own version after initial deployment.

**Pull and start** the controller, then verify health via the healthcheck endpoint.

### 7. Configuration wizard for `controller.yaml`

Add an interactive wizard function `run_config_wizard()` that runs AFTER infrastructure setup but BEFORE deploying the controller. It generates `/opt/docker/felhom-controller/controller.yaml`.

**CLI pre-seeding:** If `--domain`, `--customer`, `--email`, `--cf-token` are provided via CLI, use them as defaults in the wizard (the user can still change them).

**Wizard flow** (each question is a `read -p` prompt with a default shown in brackets):

```
===========================================================
  Felhom Controller Configuration Wizard
===========================================================

--- Customer identity ---
Customer ID [demo-felhom]: _
Customer display name [Demo Ügyfél]: _
Domain [homeserver.local]: _
Customer email (optional) []: _

--- Infrastructure secrets ---
Cloudflare Tunnel token (optional, leave empty to skip) []: _
Cloudflare API token (for DNS-01 certs, optional) []: _

--- Paths ---
System data partition mount point
(if the system drive was partitioned for user data, provide the mount point,
 e.g., /mnt/sys_drive)
System data path [/mnt/sys_drive]: _

--- Dashboard password ---
Set a password for the controller dashboard?
(leave empty for first-visit setup prompt)
Dashboard password []: _

--- Git sync ---
App catalog repository URL [https://gitea.dooplex.hu/admin/app-catalog-felhom.eu.git]: _
Git username []: _
Git token []: _

--- Healthcheck monitoring ---
Healthchecks.io ping UUIDs (leave empty to skip):
Heartbeat UUID []: _
System health UUID []: _
DB dump UUID []: _
Backup UUID []: _
Backup integrity UUID []: _

--- Ready ---
```

**Password hashing:** If the user provides a dashboard password, hash it with bcrypt. Use `htpasswd -bnBC 10 "" "PASSWORD" | tr -d ':\n'` or the `python3 -c` fallback. Store the hash in `web.password_hash`.

**Session secret:** Auto-generate: `openssl rand -hex 32`

**Hub config:** Always enabled, with the hardcoded API key:

```yaml
hub:
  enabled: true
  url: "https://hub.felhom.eu"
  api_key: "094091de545ce28795c47ac2158fc30750db5c24a621c49329b001ee8db57fb8"
  push_interval: "15m"
```

**Backup:** Keep `enabled: true`; the user confirmed it should stay for troubleshooting purposes.

**hdd_path:** Do NOT include in generated config. It's deprecated. Remove it from the template entirely.
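The prompt-with-default pattern and the password-hashing step described in this section could be sketched as below. The function names `ask` and `hash_password` are hypothetical, and the `python3` branch assumes the third-party `bcrypt` module is installed. Note that both branches pass the password as a command-line argument, which is briefly visible in the process list.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; names are illustrative, not from the existing script.

# ask LABEL DEFAULT: prompt once, print the answer (or DEFAULT on empty input).
ask() {
    local label="$1" default="$2" input
    read -r -p "$label [$default]: " input
    printf '%s\n' "${input:-$default}"
}

# hash_password PASSWORD: print a bcrypt hash (cost 10) without trailing newline.
hash_password() {
    local password="$1"
    if command -v htpasswd >/dev/null 2>&1; then
        # htpasswd -n prints ":<hash>" for an empty username; strip ':' and newlines.
        htpasswd -bnBC 10 "" "$password" | tr -d ':\n'
    elif python3 -c 'import bcrypt' >/dev/null 2>&1; then
        python3 -c 'import bcrypt, sys
print(bcrypt.hashpw(sys.argv[1].encode(), bcrypt.gensalt(rounds=10)).decode(), end="")' "$password"
    else
        echo "No bcrypt tool available (htpasswd or python3 bcrypt)" >&2
        return 1
    fi
}
```

A wizard step would then read, for example, `DOMAIN="$(ask "Domain" "${DOMAIN:-homeserver.local}")"`, which also covers CLI pre-seeding because the pre-seeded value becomes the default.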
**Full template:** write this to `/opt/docker/felhom-controller/controller.yaml`:

```yaml
# Felhom Controller Configuration
# Generated by docker-setup.sh v5.0.0 on

customer:
  id: ""
  name: ""
  domain: ""
  email: ""
  telegram_chat_id: ""

infrastructure:
  cf_tunnel_token: ""
  cf_api_token: ""

paths:
  stacks_dir: "/opt/docker/stacks"
  data_dir: "/opt/docker/felhom-controller/data"
  system_data_path: ""

system:
  reserved_memory_mb: 384

web:
  listen: ":8080"
  password_hash: ""
  session_secret: ""

git:
  repo_url: ""
  branch: "main"
  sync_interval: "15m"
  username: ""
  token: ""

stacks:
  protected:
    - "traefik"
    - "cloudflared"
    - "felhom-controller"
    - "filebrowser"
  update_window: "03:00-05:00"
  compose_command: ""

backup:
  enabled: true
  restic_password_file: "/opt/docker/felhom-controller/data/restic-password"
  db_dump_schedule: "02:30"
  restic_schedule: "03:00"
  retention:
    keep_daily: 7
    keep_weekly: 4
    keep_monthly: 6
  prune_schedule: "weekly"

monitoring:
  enabled: true
  healthchecks_base: "https://status.felhom.eu"
  ping_uuids:
    heartbeat: ""
    system_health: ""
    db_dump: ""
    backup: ""
    backup_integrity: ""
  system_health_interval: "5m"
  health_check_schedule: "06:00"
  thresholds:
    disk_warn_percent: 80
    disk_crit_percent: 90
    backup_max_age_hours: 36
    cpu_warn_percent: 90
    memory_warn_percent: 85
    temperature_warn_celsius: 75

hub:
  enabled: true
  url: "https://hub.felhom.eu"
  api_key: "094091de545ce28795c47ac2158fc30750db5c24a621c49329b001ee8db57fb8"
  push_interval: "15m"

self_update:
  enabled: true
  check_interval: "6h"
  image: "gitea.dooplex.hu/admin/felhom-controller"
  auto_update: false
  health_timeout_seconds: 60

notifications:
  customer_events:
    - "disk_warning"
    - "backup_failed"
    - "update_available"
    - "security_update"
  operator_events:
    - "disk_critical"
    - "backup_failed"
    - "self_update_failed"
    - "container_unhealthy"

logging:
  level: "info"
  file: ""
  max_size_mb: 10
  max_files: 3

assets:
  source_url: "https://felhom.eu"
```

### 8. Update `controller.yaml.example`

Update `controller/configs/controller.yaml.example` to match the wizard template:

- **Remove** the `hdd_path` line entirely
- **Set** `hub.enabled: true` (was `false`)
- **Set** `hub.api_key` to the real key: `094091de545ce28795c47ac2158fc30750db5c24a621c49329b001ee8db57fb8`
- **Improve** the `system_data_path` comment to be clearer:
  ```yaml
  system_data_path: "/mnt/sys_drive"  # Mount point of user-data partition on system drive (e.g., /mnt/sys_drive)
  ```

### 9. Update `install_cloudflare_tunnel()`

The function currently reads from `CF_TUNNEL_TOKEN` (CLI arg). Change it to read from the wizard variable (the same variable name is fine, just populated by the wizard instead of the CLI). The function body stays the same: it creates the docker-compose file at `/opt/docker/cloudflared/` and starts it.

**Guard:** If the wizard left the CF tunnel token empty, skip this step (already handled by the existing `if [[ -z "$CF_TUNNEL_TOKEN" ]]` check).

### 10. Update execution order in `main()`

New execution order:

```
1. Install base packages
2. Configure network (static IP, if requested)
3. Install Docker Engine + Compose
4. Install Traefik reverse proxy
5. Generate self-signed certificate (if requested)
6. Run configuration wizard → generates controller.yaml
7. Install Cloudflare Tunnel (if token provided in wizard)
8. Install FileBrowser (protected stack)
9. Deploy felhom-controller
10. Install helper tools
11. Print summary
```

Update step numbering and `get_total_steps()` accordingly.

### 11. Update `print_summary()`

Update the summary to reflect:

- Controller is deployed and accessible at `https://felhom.`
- FileBrowser at `https://files.`
- Remove manual "deploy felhom-controller" instructions (it's automated now)
- Show healthcheck UUID status (configured / not configured)
- Show hub status (enabled)
- Remove the `CUSTOMER_ID` display bug (the "Note: No --customer specified" message is inside the `if [[ -n "$CUSTOMER_ID" ]]` block, which is the wrong logic)

### 12. Update `print_help()`

Update help text to reflect:

- Removed `--cf-tunnel-token` (now in wizard)
- Removed `--hdd-path` (deprecated)
- Mention the interactive wizard
- Updated "WHAT THIS SCRIPT INSTALLS" list:
  1. Base packages
  2. Docker Engine + Compose
  3. Traefik reverse proxy
  4. TLS certificates
  5. Felhom Controller (with interactive configuration)
  6. FileBrowser Quantum (web file manager)
  7. Cloudflare Tunnel (if configured)
  8. Helper tools

---

## Additional observations

### Bugs in current script

1. **`print_summary()` CUSTOMER_ID logic is inverted** (line ~1507): The "Note: No --customer specified" message is inside `if [[ -n "$CUSTOMER_ID" ]]`, which only triggers when a customer IS specified. It should be in an else branch or removed.
2. **Step numbering is fragile**: The `get_total_steps()` function and hardcoded step numbers (e.g., `log_step "3/$(get_total_steps)"`) will desync if steps are added or removed. Consider using a counter variable incremented at each step.

### Things NOT to change

- `bootstrap_sudo()`: works fine, keep as-is
- Network configuration (step 2): keep all network manager detection logic
- Docker installation (step 3): keep as-is
- Traefik installation (step 4): keep as-is
- Self-signed cert generation: keep as-is
- Helper tools installation: keep as-is
- Error trap and diagnostics: keep as-is
- Color/logging functions: keep as-is

### Template completeness check

The `controller.yaml` template covers all sections from the current example. Sections that use sensible defaults and don't need wizard prompts:

- `system.reserved_memory_mb` (384)
- `backup.*` (all defaults are fine)
- `stacks.protected` (hardcoded list)
- `stacks.update_window` ("03:00-05:00")
- `monitoring.thresholds.*` (all defaults)
- `self_update.*` (all defaults)
- `notifications.*` (all defaults)
- `logging.*` (all defaults)
- `assets.*` (hardcoded)

---

## Implementation notes

- The script is bash, so no external YAML parser is needed. Use `cat > file << EOF` with variable substitution for generating YAML.
- For bcrypt hashing, prefer `htpasswd -bnBC 10 "" "$password" | tr -d ':\n'` (apache2-utils is installed in step 1). Fallback: `python3 -c "import bcrypt; ..."`
- The wizard should show current/default values in brackets and accept Enter for defaults: `read -p "Domain [$default]: " input; value="${input:-$default}"`
- Dry-run mode should show what the wizard WOULD generate without writing files.
- All generated files should have appropriate permissions:
  - `controller.yaml`: `chmod 600` (contains secrets)
  - `docker-compose.yml` files: `chmod 644`

---

## Build & test

After implementing, test the script with `--dry-run` to verify:

```bash
sudo ./docker-setup.sh --domain test.local --customer test --dry-run
```

For a real deployment test on the demo node:

```bash
# Copy script to demo node
SSH=/c/Windows/System32/OpenSSH/ssh.exe
scp scripts/docker-setup.sh kisfenyo@192.168.0.162:/tmp/

# Run on demo node (it already has infrastructure, so most steps will skip)
$SSH kisfenyo@192.168.0.162 "sudo bash /tmp/docker-setup.sh --domain demo-felhom.eu --customer demo-felhom --email certs@felhom.eu --cf-token "
```
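As one way to satisfy the dry-run and file-permission notes from the implementation notes, a single helper could own all file generation. The sketch below is an assumption about structure: `write_file` and the `DRY_RUN` variable name are hypothetical and may not match what the existing script uses for its `--dry-run` flag.

```bash
#!/usr/bin/env bash
# Hypothetical helper; DRY_RUN and write_file are illustrative names.
DRY_RUN="${DRY_RUN:-false}"

# write_file PATH MODE: reads the file content from stdin.
# In dry-run mode it prints what would be written and touches nothing.
write_file() {
    local path="$1" mode="$2" content
    content="$(cat)"
    if [[ "$DRY_RUN" == "true" ]]; then
        printf '[dry-run] would write %s (mode %s):\n%s\n' "$path" "$mode" "$content"
        return 0
    fi
    mkdir -p "$(dirname "$path")"
    # Create the file under a restrictive umask so secrets are never
    # briefly world-readable, then apply the requested mode.
    ( umask 077; printf '%s\n' "$content" > "$path" )
    chmod "$mode" "$path"
}

# Example call (commented out so that sourcing this sketch has no side effects):
# write_file /opt/docker/felhom-controller/controller.yaml 600 <<EOF
# customer:
#   id: "${CUSTOMER_ID}"
# EOF
```

Routing every generated file through one function keeps the `600`/`644` decision next to the call site and means `--dry-run` cannot be forgotten for an individual file.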