TASK: Major rewrite of scripts/docker-setup.sh (v5.0)

Overview

Rewrite docker-setup.sh to bring it up to date with the current Felhom architecture. The script should now be a complete end-to-end provisioning tool: install infrastructure, run an interactive configuration wizard, generate controller.yaml, deploy FileBrowser as a protected stack, and deploy felhom-controller — all in one run.

Read the entire current scripts/docker-setup.sh before starting. This is a rewrite of an existing ~1600-line script, not a new file.


Changes Required

1. Update banner and version

  • Set SCRIPT_VERSION="5.0.0"
  • Update print_banner() — no Portainer, the title should be:
    Felhom Infrastructure Setup v5.0.0
    
  • Update the comment header block at the top of the file to match the new scope (Docker + Traefik + FileBrowser + Controller + configuration wizard).
  • Update print_help() to reflect all removed/changed options.

2. Remove Portainer (confirm clean)

The current script has no Portainer code (already removed in a prior version). Just make sure there are zero references to "portainer" or "Portainer" anywhere — banner, comments, help text, variables. Search and confirm.
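
The search can be made repeatable with a small check. This is a minimal sketch; the helper name assert_no_portainer is hypothetical and not part of the current script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if the file has no "portainer" reference.
assert_no_portainer() {
  local file="$1"
  # -i: case-insensitive (covers "portainer" and "Portainer"); -n: print line numbers.
  if grep -in 'portainer' "$file"; then
    echo "ERROR: Portainer references remain in $file" >&2
    return 1
  fi
  echo "OK: no Portainer references in $file"
}
```

It could be run as assert_no_portainer scripts/docker-setup.sh once the rewrite is done.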

3. Remove --cf-tunnel-token CLI option

Remove the --cf-tunnel-token CLI flag and the CF_TUNNEL_TOKEN variable from parse_args(). The Cloudflare tunnel token is now collected by the configuration wizard and written into controller.yaml (see §7 below). The install_cloudflare_tunnel() function stays but reads the token from the wizard variable instead of a CLI flag.

Also remove the deprecated --hdd-path CLI option and the HDD_PATH variable; §4 lists the details.

Keep these CLI options (still useful for non-interactive/scripted runs):

  • --ip, --gateway, --dns, --interface (network config)
  • --domain, --email, --cf-token (TLS/domain — can pre-seed wizard)
  • --customer (customer ID — can pre-seed wizard)
  • --traefik-password, --self-signed-cert
  • --skip-filebrowser
  • --dry-run, --debug, --help, --bootstrap

4. Remove --hdd-path references

Remove HDD_PATH variable, --hdd-path argument parsing, and all references. FileBrowser mounts are determined by the wizard (system_data_path and any existing /mnt/* mounts).
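
A shortened sketch of how parse_args() might look after §3 and §4. Only a few of the kept options are shown, and rejecting the removed flags with an explicit error (rather than letting them fall through to the unknown-option branch) is an assumption, not a requirement of this task.

```shell
#!/usr/bin/env bash
# Hypothetical, shortened parse_args(): only some of the kept options are shown.
parse_args() {
  while [[ $# -gt 0 ]]; do
    case "$1" in
      --domain|--customer)
        [[ $# -ge 2 ]] || { echo "ERROR: $1 needs a value" >&2; return 1; }
        if [[ "$1" == "--domain" ]]; then DOMAIN="$2"; else CUSTOMER_ID="$2"; fi
        shift 2 ;;
      --dry-run)
        DRY_RUN=true; shift ;;
      --cf-tunnel-token|--hdd-path)
        # Assumption: fail loudly so older automation notices the removal.
        echo "ERROR: $1 was removed in v5.0.0" >&2
        return 1 ;;
      *)
        echo "ERROR: unknown option $1" >&2
        return 1 ;;
    esac
  done
}
```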

5. FileBrowser deployment as protected stack

The current install_filebrowser() function needs to be rewritten:

Location: Deploy to /opt/docker/stacks/filebrowser/ (already the current FILEBROWSER_DIR — keep this).

Compose file: Generate a compose file matching the current production layout on the demo node. Key differences from current script template:

services:
  filebrowser:
    image: gtstef/filebrowser:latest
    container_name: filebrowser
    restart: unless-stopped
    environment:
      - TZ=Europe/Budapest
    volumes:
      - filebrowser_data:/home/filebrowser/data
      # Mount discovered drives — populated by wizard
      # e.g. /mnt/hdd_1:/srv/hdd_1, /mnt/sys_drive:/srv/sys_drive
    networks:
      - traefik-public
    deploy:
      resources:
        limits:
          memory: 256M
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:80/"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 15s
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.filebrowser.rule=Host(`files.<DOMAIN>`)"
      - "traefik.http.routers.filebrowser.entrypoints=websecure"
      - "traefik.http.routers.filebrowser.tls=true"
      - "traefik.http.services.filebrowser.loadbalancer.server.port=80"
      - "traefik.docker.network=traefik-public"

Drive discovery for volumes: The wizard (§7) collects system_data_path. Additionally, scan /mnt/ for existing mount points at install time. For each discovered mount (e.g., /mnt/hdd_1, /mnt/sys_drive), add a volume mapping: /mnt/<name>:/srv/<name>. If no mounts found, only mount the system_data_path.
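
The discovery rule could be sketched as follows. The function name, the MOUNT_CHECK override (added only so the mount test can be swapped out), and the /srv/<basename> target used for the fallback are assumptions; the task only specifies the /mnt/<name>:/srv/<name> mapping.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints one compose volume entry per mount point under
# the base directory; falls back to the system data path if none is found.
discover_volume_lines() {
  local base="$1" system_data_path="$2"
  local dir name found=0
  for dir in "$base"/*/; do
    [ -d "$dir" ] || continue                           # glob matched nothing
    dir="${dir%/}"
    "${MOUNT_CHECK:-mountpoint}" -q "$dir" || continue  # skip plain directories
    name="${dir##*/}"
    printf '      - %s:/srv/%s\n' "$dir" "$name"
    found=1
  done
  if [ "$found" -eq 0 ]; then
    # Assumption: the fallback is mapped to /srv/<basename of the path>.
    printf '      - %s:/srv/%s\n' "$system_data_path" "${system_data_path##*/}"
  fi
}
```

Called as discover_volume_lines /mnt "$SYSTEM_DATA_PATH", the output can be pasted under the volumes: key of the generated compose file.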

Hardcode domain in the Traefik host rule (no ${DOMAIN} env var needed). Use the wizard's domain value directly: Host(`files.ACTUAL-DOMAIN`).
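
One detail to watch when the compose file is written with an unquoted heredoc: the backticks in the Host() rule must be escaped, otherwise bash treats them as command substitution. A minimal sketch, with a hypothetical function name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the Traefik router label with the domain baked in.
render_filebrowser_rule() {
  local domain="$1"
  # \` yields a literal backtick inside an unquoted heredoc; without the
  # backslash, bash would try to run `files.<domain>` as a command.
  cat <<EOF
      - "traefik.http.routers.filebrowser.rule=Host(\`files.${domain}\`)"
EOF
}
```

The same escaping applies to the felhom.<DOMAIN> rule in the controller compose file.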

Also generate .felhom.yml metadata file — keep the existing one from the current script (Hungarian text, category: storage, etc.).

No .env file needed for filebrowser (domain is hardcoded in compose labels).

6. Controller deployment (NEW step)

Add a new step to deploy felhom-controller. This is currently missing from the script — the user had to deploy it manually.

Location: /opt/docker/felhom-controller/

docker-compose.yml — generate matching the current production layout:

services:
  felhom-controller:
    image: gitea.dooplex.hu/admin/felhom-controller:latest
    container_name: felhom-controller
    restart: unless-stopped
    privileged: true
    ports:
      - "8080:8080"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /opt/docker/felhom-controller/controller.yaml:/opt/docker/felhom-controller/controller.yaml:ro
      - controller-data:/opt/docker/felhom-controller/data
      - /opt/docker/stacks:/opt/docker/stacks
      - /srv/backups:/srv/backups
      - type: bind
        source: /mnt
        target: /mnt
        bind:
          propagation: rshared
      - /sys:/host/sys:ro
      - /etc/os-release:/host/etc/os-release:ro
      - /etc/hostname:/host/etc/hostname:ro
      - /dev:/host-dev:rw
      - /etc/fstab:/host-fstab
      - /run/udev:/run/udev:ro
    environment:
      - TZ=Europe/Budapest
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.controller.rule=Host(`felhom.<DOMAIN>`)"
      - "traefik.http.routers.controller.entrypoints=websecure"
      - "traefik.http.routers.controller.tls=true"
      - "traefik.http.services.controller.loadbalancer.server.port=8080"
      - "traefik.docker.network=traefik-public"
      - "felhom.managed=true"
      - "felhom.component=controller"
    networks:
      - traefik-public
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/api/health"]
      interval: 30s
      timeout: 5s
      start_period: 10s
      retries: 3

volumes:
  controller-data:

networks:
  traefik-public:
    external: true

Hardcode domain in Traefik labels (like filebrowser).

Do not generate a .env file for the controller. The domain is hardcoded in the compose labels, so compose does not need one.

Use latest tag for the image. The controller has self-update capability so it will manage its own version after initial deployment.

Pull and start the controller, then verify health via the healthcheck endpoint.
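
The health verification could be a small retry loop around any probe command. This is a sketch under assumptions: the helper name, the attempt count, and the delay are illustrative, not taken from the current script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: retries a probe command until it succeeds.
# Usage: wait_for_healthy <attempts> <delay_seconds> <command...>
wait_for_healthy() {
  local attempts="$1" delay="$2" i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    if "$@" >/dev/null 2>&1; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
  done
  echo "not healthy after $attempts attempt(s)" >&2
  return 1
}
```

After the pull and start, the script might call wait_for_healthy 12 5 curl -fsS http://localhost:8080/api/health, which probes the same endpoint as the compose healthcheck through the published port 8080. This assumes curl is available on the host.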

7. Configuration wizard for controller.yaml

Add an interactive wizard function run_config_wizard() that runs AFTER infrastructure setup but BEFORE deploying the controller. It generates /opt/docker/felhom-controller/controller.yaml.

CLI pre-seeding: If --domain, --customer, --email, --cf-token are provided via CLI, use them as defaults in the wizard (user can still change).

Wizard flow (each question is a read -p prompt with a default shown in brackets):

===========================================================
  Felhom Controller Configuration Wizard
===========================================================

--- Customer identity ---
Customer ID [demo-felhom]: _
Customer display name [Demo Ügyfél]: _
Domain [homeserver.local]: _
Customer email (optional) []: _

--- Infrastructure secrets ---
Cloudflare Tunnel token (optional, leave empty to skip) []: _
Cloudflare API token (for DNS-01 certs, optional) []: _

--- Paths ---
System data partition mount point
  (if the system drive was partitioned for user data,
   provide the mount point, e.g., /mnt/sys_drive)
System data path [/mnt/sys_drive]: _

--- Dashboard password ---
Set a password for the controller dashboard?
  (leave empty for first-visit setup prompt)
Dashboard password []: _

--- Git sync ---
App catalog repository URL [https://gitea.dooplex.hu/admin/app-catalog-felhom.eu.git]: _
Git username []: _
Git token []: _

--- Healthcheck monitoring ---
Healthchecks.io ping UUIDs (leave empty to skip):
  Heartbeat UUID []: _
  System health UUID []: _
  DB dump UUID []: _
  Backup UUID []: _
  Backup integrity UUID []: _

--- Ready ---
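
Each prompt in the flow above follows the same pattern, so a single helper can cover all of them. A minimal sketch; the name ask and its argument order are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: asks one question and stores the answer in a variable.
# Usage: ask <variable_name> <label> <default>
ask() {
  local var_name="$1" label="$2" default="$3" input
  # read -p only displays the prompt when stdin is a terminal.
  read -r -p "$label [$default]: " input || true
  # Pressing Enter (empty input) keeps the default.
  printf -v "$var_name" '%s' "${input:-$default}"
}
```

CLI pre-seeding then falls out naturally: ask DOMAIN "Domain" "${DOMAIN:-homeserver.local}" shows the value passed via --domain as the default and still lets the user change it.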

Password hashing: If the user provides a dashboard password, hash it with bcrypt. Use htpasswd -bnBC 10 "" "PASSWORD" | tr -d ':\n' (htpasswd prints a leading colon and trailing newlines, both of which must be stripped) or the python3 -c fallback. Store the hash in web.password_hash.

Session secret: Auto-generate: openssl rand -hex 32
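
Both generated values can be wrapped in small functions. This is a sketch: the function names are hypothetical, and the python3 fallback is deliberately left out rather than guessed.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the two generated web.* values.

# Prints a bcrypt hash (cost 10) of the given password.
hash_dashboard_password() {
  local password="$1"
  if ! command -v htpasswd >/dev/null 2>&1; then
    echo "htpasswd not found (apache2-utils); fallback not shown here" >&2
    return 1
  fi
  # htpasswd prints ":<hash>" plus blank lines; strip the colon and newlines.
  htpasswd -bnBC 10 "" "$password" | tr -d ':\n'
}

# Prints 64 hex characters (32 random bytes) for web.session_secret.
generate_session_secret() {
  openssl rand -hex 32
}
```

If the user leaves the password empty, the wizard would skip hash_dashboard_password and write an empty password_hash, which triggers the first-visit setup prompt.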

Hub config: Always enabled, with the hardcoded API key:

hub:
  enabled: true
  url: "https://hub.felhom.eu"
  api_key: "094091de545ce28795c47ac2158fc30750db5c24a621c49329b001ee8db57fb8"
  push_interval: "15m"

Backup: Keep enabled: true — the user confirmed it should stay for troubleshooting purposes.

hdd_path: Do NOT include in generated config. It's deprecated. Remove it from the template entirely.

Full template — write this to /opt/docker/felhom-controller/controller.yaml:

# Felhom Controller Configuration
# Generated by docker-setup.sh v5.0.0 on <DATE>

customer:
  id: "<CUSTOMER_ID>"
  name: "<CUSTOMER_NAME>"
  domain: "<DOMAIN>"
  email: "<EMAIL>"
  telegram_chat_id: ""

infrastructure:
  cf_tunnel_token: "<CF_TUNNEL_TOKEN>"
  cf_api_token: "<CF_API_TOKEN>"

paths:
  stacks_dir: "/opt/docker/stacks"
  data_dir: "/opt/docker/felhom-controller/data"
  system_data_path: "<SYSTEM_DATA_PATH>"

system:
  reserved_memory_mb: 384

web:
  listen: ":8080"
  password_hash: "<BCRYPT_HASH_OR_EMPTY>"
  session_secret: "<AUTO_GENERATED_HEX>"

git:
  repo_url: "<GIT_REPO_URL>"
  branch: "main"
  sync_interval: "15m"
  username: "<GIT_USERNAME>"
  token: "<GIT_TOKEN>"

stacks:
  protected:
    - "traefik"
    - "cloudflared"
    - "felhom-controller"
    - "filebrowser"
  update_window: "03:00-05:00"
  compose_command: ""

backup:
  enabled: true
  restic_password_file: "/opt/docker/felhom-controller/data/restic-password"
  db_dump_schedule: "02:30"
  restic_schedule: "03:00"
  retention:
    keep_daily: 7
    keep_weekly: 4
    keep_monthly: 6
  prune_schedule: "weekly"

monitoring:
  enabled: true
  healthchecks_base: "https://status.felhom.eu"
  ping_uuids:
    heartbeat: "<HEARTBEAT_UUID>"
    system_health: "<SYSTEM_HEALTH_UUID>"
    db_dump: "<DB_DUMP_UUID>"
    backup: "<BACKUP_UUID>"
    backup_integrity: "<BACKUP_INTEGRITY_UUID>"
  system_health_interval: "5m"
  health_check_schedule: "06:00"
  thresholds:
    disk_warn_percent: 80
    disk_crit_percent: 90
    backup_max_age_hours: 36
    cpu_warn_percent: 90
    memory_warn_percent: 85
    temperature_warn_celsius: 75

hub:
  enabled: true
  url: "https://hub.felhom.eu"
  api_key: "094091de545ce28795c47ac2158fc30750db5c24a621c49329b001ee8db57fb8"
  push_interval: "15m"

self_update:
  enabled: true
  check_interval: "6h"
  image: "gitea.dooplex.hu/admin/felhom-controller"
  auto_update: false
  health_timeout_seconds: 60

notifications:
  customer_events:
    - "disk_warning"
    - "backup_failed"
    - "update_available"
    - "security_update"
  operator_events:
    - "disk_critical"
    - "backup_failed"
    - "self_update_failed"
    - "container_unhealthy"

logging:
  level: "info"
  file: ""
  max_size_mb: 10
  max_files: 3

assets:
  source_url: "https://felhom.eu"

8. Update controller.yaml.example

Update controller/configs/controller.yaml.example to match the wizard template:

  • Remove hdd_path line entirely
  • Set hub.enabled: true (was false)
  • Set hub.api_key to the real key: 094091de545ce28795c47ac2158fc30750db5c24a621c49329b001ee8db57fb8
  • Improve system_data_path comment to be clearer:
    system_data_path: "/mnt/sys_drive"   # Mount point of user-data partition on system drive (e.g., /mnt/sys_drive)
    

9. Update install_cloudflare_tunnel()

The function currently reads from CF_TUNNEL_TOKEN (CLI arg). Change it to read from the wizard variable (same variable name is fine, just populated by the wizard instead of CLI). The function body stays the same — it creates the docker-compose at /opt/docker/cloudflared/ and starts it.

Guard: If wizard left the CF tunnel token empty, skip this step (already handled by the existing if [[ -z "$CF_TUNNEL_TOKEN" ]] check).

10. Update execution order in main()

New execution order:

1. Install base packages
2. Configure network (static IP, if requested)
3. Install Docker Engine + Compose
4. Install Traefik reverse proxy
5. Generate self-signed certificate (if requested)
6. Run configuration wizard → generates controller.yaml
7. Install Cloudflare Tunnel (if token provided in wizard)
8. Install FileBrowser (protected stack)
9. Deploy felhom-controller
10. Install helper tools
11. Print summary

Update step numbering and get_total_steps() accordingly.

11. Update print_summary()

Update the summary to reflect:

  • Controller is deployed and accessible at https://felhom.<DOMAIN>
  • FileBrowser at https://files.<DOMAIN>
  • Remove manual "deploy felhom-controller" instructions (it's automated now)
  • Show healthcheck UUID status (configured / not configured)
  • Show hub status (enabled)
  • Fix the CUSTOMER_ID display bug (the "Note: No --customer specified" message is inside the if [[ -n "$CUSTOMER_ID" ]] block, so it only prints when a customer ID IS set)
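
The corrected CUSTOMER_ID branch might look like the sketch below. The function name and the exact "Customer ID:" wording are assumptions; only the note text comes from the current script.

```shell
#!/usr/bin/env bash
# Hypothetical excerpt of print_summary(): the note belongs in the else branch.
print_customer_summary() {
  if [[ -n "${CUSTOMER_ID:-}" ]]; then
    echo "Customer ID: $CUSTOMER_ID"
  else
    echo "Note: No --customer specified"
  fi
}
```

Since the wizard always yields a customer ID (it has a default), dropping the else branch entirely is also a reasonable outcome.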

12. Update print_help()

Update help text to reflect:

  • Removed --cf-tunnel-token (now in wizard)
  • Removed --hdd-path (deprecated)
  • Mention the interactive wizard
  • Updated "WHAT THIS SCRIPT INSTALLS" list:
    1. Base packages
    2. Docker Engine + Compose
    3. Traefik reverse proxy
    4. TLS certificates
    5. Felhom Controller (with interactive configuration)
    6. FileBrowser Quantum (web file manager)
    7. Cloudflare Tunnel (if configured)
    8. Helper tools

Additional observations

Bugs in current script

  1. print_summary() CUSTOMER_ID logic is inverted (line ~1507): The "Note: No --customer specified" message is inside if [[ -n "$CUSTOMER_ID" ]] which only triggers when a customer IS specified. Should be in an else branch or removed.

  2. Step numbering is fragile: The get_total_steps() and hardcoded step numbers (e.g., log_step "3/$(get_total_steps)") will desync if steps are added/removed. Consider using a counter variable incremented at each step.
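
The counter idea from item 2 could be sketched like this. log_step and get_total_steps below are stand-ins for the script's existing functions (the real ones may differ), and next_step is a hypothetical wrapper.

```shell
#!/usr/bin/env bash
# Stand-ins for the script's existing helpers; the real ones may differ.
log_step() { echo "[$1] $2"; }
get_total_steps() { echo 11; }

CURRENT_STEP=0

# Hypothetical wrapper: increments the counter, then logs "<n>/<total>".
# Call it directly, not inside $(...), or the increment is lost in a subshell.
next_step() {
  CURRENT_STEP=$((CURRENT_STEP + 1))
  log_step "$CURRENT_STEP/$(get_total_steps)" "$1"
}
```

Each install function then starts with next_step "Install base packages" and so on, so adding or removing a step only requires get_total_steps() to stay accurate.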

Things NOT to change

  • bootstrap_sudo() — works fine, keep as-is
  • Network configuration (step 2) — keep all network manager detection logic
  • Docker installation (step 3) — keep as-is
  • Traefik installation (step 4) — keep as-is
  • Self-signed cert generation — keep as-is
  • Helper tools installation — keep as-is
  • Error trap and diagnostics — keep as-is
  • Color/logging functions — keep as-is

Template completeness check

The controller.yaml template covers all sections from the current example. Sections that use sensible defaults and don't need wizard prompts:

  • system.reserved_memory_mb (384)
  • backup.* (all defaults are fine)
  • stacks.protected (hardcoded list)
  • stacks.update_window ("03:00-05:00")
  • monitoring.thresholds.* (all defaults)
  • self_update.* (all defaults)
  • notifications.* (all defaults)
  • logging.* (all defaults)
  • assets.* (hardcoded)

Implementation notes

  • The script is bash — no external YAML parser needed. Use cat > file << EOF with variable substitution for generating YAML.
  • For bcrypt hashing, prefer htpasswd -bnBC 10 "" "$password" | tr -d ':\n' (apache2-utils is installed in step 1). Fallback: python3 -c "import bcrypt; ..."
  • The wizard should show current/default values in brackets and accept Enter for defaults: read -p "Domain [$default]: " input; value="${input:-$default}"
  • Dry-run mode should show what the wizard WOULD generate without writing files.
  • All generated files should have appropriate permissions:
    • controller.yaml: chmod 600 (contains secrets)
    • docker-compose.yml files: chmod 644
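
The heredoc, dry-run, and permission notes above can be combined as in the following sketch. The template is shortened to a few keys, and the function names, the DRY_RUN variable, and the PASSWORD_HASH and SESSION_SECRET variable names are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: renders a shortened controller.yaml via heredoc.
render_controller_yaml() {
  # Expanded values are not re-scanned, so a bcrypt hash containing '$'
  # inside PASSWORD_HASH is written out unchanged.
  cat <<EOF
customer:
  id: "${CUSTOMER_ID}"
  domain: "${DOMAIN}"

web:
  listen: ":8080"
  password_hash: "${PASSWORD_HASH}"
  session_secret: "${SESSION_SECRET}"
EOF
}

# Writes the rendered config with mode 600, or only prints it in dry-run mode.
write_controller_yaml() {
  local target="$1"
  if [[ "${DRY_RUN:-false}" == "true" ]]; then
    echo "[dry-run] would write $target:"
    render_controller_yaml
    return 0
  fi
  mkdir -p "$(dirname "$target")"
  # The subshell umask makes the file private from the moment it is created.
  ( umask 077; render_controller_yaml > "$target" )
  chmod 600 "$target"
}
```

Values that contain double quotes or backslashes would still need escaping before they are placed inside the quoted YAML strings; this sketch does not handle that.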

Build & test

After implementing, test the script with --dry-run to verify:

sudo ./docker-setup.sh --domain test.local --customer test --dry-run

For a real deployment test on the demo node:

# Copy script to demo node
SSH=/c/Windows/System32/OpenSSH/ssh.exe
scp scripts/docker-setup.sh kisfenyo@192.168.0.162:/tmp/

# Run on demo node (it already has infrastructure, so most steps will skip)
$SSH kisfenyo@192.168.0.162 "sudo bash /tmp/docker-setup.sh --domain demo-felhom.eu --customer demo-felhom --email certs@felhom.eu --cf-token <token>"