# TASK: Major rewrite of `scripts/docker-setup.sh` (v5.0)
## Overview
Rewrite `docker-setup.sh` to bring it up to date with the current Felhom architecture.
The script should now be a complete end-to-end provisioning tool: install infrastructure,
run an interactive configuration wizard, generate `controller.yaml`, deploy FileBrowser
as a protected stack, and deploy felhom-controller, all in one run.

**Read the entire current `scripts/docker-setup.sh` before starting. This is a rewrite
of an existing ~1600-line script, not a new file.**

---
## Changes Required
### 1. Update banner and version
- Set `SCRIPT_VERSION="5.0.0"`
- Update `print_banner()` — no Portainer, the title should be:
```
Felhom Infrastructure Setup v5.0.0
```
- Update the comment header block at the top of the file to match the new scope
(Docker + Traefik + FileBrowser + Controller + configuration wizard).
- Update `print_help()` to reflect all removed/changed options.
### 2. Remove Portainer (confirm clean)
The current script has no Portainer code (already removed in a prior version).
Just make sure there are zero references to "portainer" or "Portainer" anywhere —
banner, comments, help text, variables. Search and confirm.
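
A small check can turn "search and confirm" into something that fails loudly. This is a sketch: `assert_absent` is a hypothetical helper, and the same call can also confirm the `--cf-tunnel-token`, `--hdd-path` and `HDD_PATH` removals from §3 and §4.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if none of the patterns occur in the file.
# Matching is case-insensitive (-i) and literal (-F); "--" lets patterns start with "-".
assert_absent() {
    local file="$1"; shift
    local pattern status=0
    for pattern in "$@"; do
        # grep -n prints every offending line together with its line number.
        if grep -n -i -F -- "$pattern" "$file"; then
            echo "error: found '$pattern' in $file" >&2
            status=1
        fi
    done
    return "$status"
}
```

Usage: `assert_absent scripts/docker-setup.sh portainer --cf-tunnel-token --hdd-path HDD_PATH`.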
### 3. Remove `--cf-tunnel-token` CLI option
**Remove** the `--cf-tunnel-token` CLI flag and its `CF_TUNNEL_TOKEN` handling from
`parse_args()`. The Cloudflare tunnel token is now collected by the configuration wizard
and written into `controller.yaml` (see §7 below). The `install_cloudflare_tunnel()`
function stays but reads the token from the wizard variable instead of a CLI flag (see §9).
Also remove the deprecated `--hdd-path` CLI option and the `HDD_PATH` variable (see §4).
Keep these CLI options (still useful for non-interactive/scripted runs):
- `--ip`, `--gateway`, `--dns`, `--interface` (network config)
- `--domain`, `--email`, `--cf-token` (TLS/domain — can pre-seed wizard)
- `--customer` (customer ID — can pre-seed wizard)
- `--traefik-password`, `--self-signed-cert`
- `--skip-filebrowser`
- `--dry-run`, `--debug`, `--help`, `--bootstrap`
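
A minimal sketch of a `parse_args()` that keeps only the options above. The variable names are hypothetical (reuse whatever the current script already has). The removed flags are simply not listed, so they fall through to the unknown-option branch and no reference to them remains in the script.

```bash
#!/usr/bin/env bash
# Sketch of parse_args() for v5.0. Variable names are hypothetical placeholders.
parse_args() {
    while [[ $# -gt 0 ]]; do
        case "$1" in
            # Value-taking options: fail cleanly when the value is missing
            # (a bare "shift 2" would fail and the loop would never advance).
            --ip|--gateway|--dns|--interface|--domain|--email|--cf-token|--customer|--traefik-password)
                if [[ $# -lt 2 ]]; then
                    echo "Error: $1 requires a value" >&2
                    return 1
                fi
                ;;
        esac
        case "$1" in
            --ip)               STATIC_IP="$2"; shift 2 ;;
            --gateway)          GATEWAY="$2"; shift 2 ;;
            --dns)              DNS_SERVERS="$2"; shift 2 ;;
            --interface)        INTERFACE="$2"; shift 2 ;;
            --domain)           DOMAIN="$2"; shift 2 ;;       # pre-seeds the wizard
            --email)            EMAIL="$2"; shift 2 ;;        # pre-seeds the wizard
            --cf-token)         CF_API_TOKEN="$2"; shift 2 ;; # pre-seeds the wizard
            --customer)         CUSTOMER_ID="$2"; shift 2 ;;  # pre-seeds the wizard
            --traefik-password) TRAEFIK_PASSWORD="$2"; shift 2 ;;
            --self-signed-cert) SELF_SIGNED_CERT=true; shift ;;
            --skip-filebrowser) SKIP_FILEBROWSER=true; shift ;;
            --dry-run)          DRY_RUN=true; shift ;;
            --debug)            DEBUG=true; shift ;;
            --bootstrap)        BOOTSTRAP=true; shift ;;
            --help)             print_help; exit 0 ;;
            *)                  echo "Error: unknown option: $1 (see --help)" >&2; return 1 ;;
        esac
    done
}
```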
### 4. Remove `--hdd-path` references
Remove `HDD_PATH` variable, `--hdd-path` argument parsing, and all references.
FileBrowser mounts are determined by the wizard (system_data_path and any existing
`/mnt/*` mounts).
### 5. FileBrowser deployment as protected stack
The current `install_filebrowser()` function needs to be rewritten:
**Location:** Deploy to `/opt/docker/stacks/filebrowser/` (already the current
`FILEBROWSER_DIR` — keep this).
**Compose file:** Generate a compose file matching the current production layout
on the demo node, which differs from the current script's template:
```yaml
services:
  filebrowser:
    image: gtstef/filebrowser:latest
    container_name: filebrowser
    restart: unless-stopped
    environment:
      - TZ=Europe/Budapest
    volumes:
      - filebrowser_data:/home/filebrowser/data
      # Mount discovered drives (populated by the wizard)
      # e.g. /mnt/hdd_1:/srv/hdd_1, /mnt/sys_drive:/srv/sys_drive
    networks:
      - traefik-public
    deploy:
      resources:
        limits:
          memory: 256M
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:80/"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 15s
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.filebrowser.rule=Host(`files.<DOMAIN>`)"
      - "traefik.http.routers.filebrowser.entrypoints=websecure"
      - "traefik.http.routers.filebrowser.tls=true"
      - "traefik.http.services.filebrowser.loadbalancer.server.port=80"
      - "traefik.docker.network=traefik-public"
volumes:
  filebrowser_data:
networks:
  traefik-public:
    external: true
```
**Drive discovery for volumes:** The wizard (§7) collects `system_data_path`.
Additionally, scan `/mnt/` for existing mount points at install time. For each
discovered mount (e.g., `/mnt/hdd_1`, `/mnt/sys_drive`), add a volume mapping:
`/mnt/<name>:/srv/<name>`. If no mounts found, only mount the `system_data_path`.
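
The discovery step could look roughly like this. It is a sketch: `discover_volume_mappings` and its arguments are hypothetical, and the mount-point check uses `mountpoint -q` from util-linux so that plain directories under `/mnt` are ignored. The root directory is a parameter only so the function can be exercised against a temporary directory.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print one "host:container" FileBrowser volume mapping per line.
# Args: MNT_ROOT (normally /mnt), SYSTEM_DATA_PATH, REQUIRE_MOUNTPOINT (true|false).
discover_volume_mappings() {
    local mnt_root="${1:-/mnt}" system_data_path="${2%/}" require_mount="${3:-true}"
    local dir name found=0
    for dir in "$mnt_root"/*/; do
        [[ -d "$dir" ]] || continue              # the glob matched nothing
        dir="${dir%/}"
        name="${dir##*/}"
        # Skip plain directories that are not actual mount points.
        if [[ "$require_mount" == true ]] && ! mountpoint -q "$dir" 2>/dev/null; then
            continue
        fi
        echo "${dir}:/srv/${name}"
        found=1
    done
    # Nothing mounted under MNT_ROOT: mount only the wizard's system_data_path.
    if [[ $found -eq 0 && -n "$system_data_path" ]]; then
        echo "${system_data_path}:/srv/${system_data_path##*/}"
    fi
}
```

For example, with `/mnt/hdd_1` and `/mnt/sys_drive` mounted, `discover_volume_mappings /mnt /mnt/sys_drive` prints `/mnt/hdd_1:/srv/hdd_1` and `/mnt/sys_drive:/srv/sys_drive`.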
**Hardcode domain** in the Traefik host rule (no `${DOMAIN}` env var needed).
Use the wizard's domain value directly: ``Host(`files.ACTUAL-DOMAIN`)``.
**Also generate `.felhom.yml`** metadata file — keep the existing one from the
current script (Hungarian text, category: storage, etc.).
**No `.env` file needed** for filebrowser (domain is hardcoded in compose labels).
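
Generating the compose file with an unquoted heredoc might look like the sketch below (`write_filebrowser_compose` is a hypothetical name; the `.felhom.yml` file is not shown). The main pitfall is that an unquoted heredoc treats backticks as command substitution, so the literal backticks in the Traefik rule must be escaped as `` \` ``.

```bash
#!/usr/bin/env bash
# Sketch: render the FileBrowser compose file with the domain hardcoded.
# Usage: write_filebrowser_compose DOMAIN OUT_FILE [HOST:CONTAINER ...]
write_filebrowser_compose() {
    local domain="$1" out="$2"; shift 2
    local nl=$'\n' volumes="" mapping
    for mapping in "$@"; do
        volumes+="${volumes:+$nl}      - ${mapping}"
    done
    # Unquoted heredoc: ${...} expands and \` yields a literal backtick.
    # With no mappings, ${volumes} leaves a blank line, which YAML ignores.
    cat > "$out" <<EOF
services:
  filebrowser:
    image: gtstef/filebrowser:latest
    container_name: filebrowser
    restart: unless-stopped
    environment:
      - TZ=Europe/Budapest
    volumes:
      - filebrowser_data:/home/filebrowser/data
${volumes}
    networks:
      - traefik-public
    deploy:
      resources:
        limits:
          memory: 256M
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:80/"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 15s
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.filebrowser.rule=Host(\`files.${domain}\`)"
      - "traefik.http.routers.filebrowser.entrypoints=websecure"
      - "traefik.http.routers.filebrowser.tls=true"
      - "traefik.http.services.filebrowser.loadbalancer.server.port=80"
      - "traefik.docker.network=traefik-public"
volumes:
  filebrowser_data:
networks:
  traefik-public:
    external: true
EOF
    chmod 644 "$out"
}
```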
### 6. Controller deployment (NEW step)
Add a new step to deploy felhom-controller. This is currently missing from the
script — the user had to deploy it manually.
**Location:** `/opt/docker/felhom-controller/`
**docker-compose.yml** — generate matching the current production layout:
```yaml
services:
  felhom-controller:
    image: gitea.dooplex.hu/admin/felhom-controller:latest
    container_name: felhom-controller
    restart: unless-stopped
    privileged: true
    ports:
      - "8080:8080"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /opt/docker/felhom-controller/controller.yaml:/opt/docker/felhom-controller/controller.yaml:ro
      - controller-data:/opt/docker/felhom-controller/data
      - /opt/docker/stacks:/opt/docker/stacks
      - /srv/backups:/srv/backups
      - type: bind
        source: /mnt
        target: /mnt
        bind:
          propagation: rshared
      - /sys:/host/sys:ro
      - /etc/os-release:/host/etc/os-release:ro
      - /etc/hostname:/host/etc/hostname:ro
      - /dev:/host-dev:rw
      - /etc/fstab:/host-fstab
      - /run/udev:/run/udev:ro
    environment:
      - TZ=Europe/Budapest
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.controller.rule=Host(`felhom.<DOMAIN>`)"
      - "traefik.http.routers.controller.entrypoints=websecure"
      - "traefik.http.routers.controller.tls=true"
      - "traefik.http.services.controller.loadbalancer.server.port=8080"
      - "traefik.docker.network=traefik-public"
      - "felhom.managed=true"
      - "felhom.component=controller"
    networks:
      - traefik-public
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/api/health"]
      interval: 30s
      timeout: 5s
      start_period: 10s
      retries: 3
volumes:
  controller-data:
networks:
  traefik-public:
    external: true
```
**Hardcode domain** in the Traefik labels (as for FileBrowser).
**No `.env` file:** since the domain is hardcoded in the compose labels, Compose
does not need one.
**Use `latest` tag** for the image. The controller has self-update capability
so it will manage its own version after initial deployment.
**Pull and start** the controller, then verify health via the healthcheck endpoint.
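
One way to pull, start, and wait for the container's own healthcheck is sketched below. The function names are hypothetical. Docker runs the first health check roughly one `interval` (30s here) after start, so the timeout should comfortably exceed that. `docker inspect --format '{{.State.Health.Status}}'` reports `starting`, `healthy` or `unhealthy`.

```bash
#!/usr/bin/env bash
# Sketch: deploy felhom-controller and wait for its Docker healthcheck.
CONTROLLER_DIR="/opt/docker/felhom-controller"

# Health state of a container; "missing" if it does not exist or has no healthcheck.
get_health_status() {
    docker inspect --format '{{.State.Health.Status}}' "$1" 2>/dev/null || echo "missing"
}

# Poll until healthy; fail fast on "unhealthy"; give up after TIMEOUT seconds.
wait_for_healthy() {
    local container="$1" timeout="${2:-90}" interval="${3:-3}"
    local deadline=$((SECONDS + timeout)) status=""
    while (( SECONDS < deadline )); do
        status="$(get_health_status "$container")"
        case "$status" in
            healthy)   return 0 ;;
            unhealthy) echo "$container reported unhealthy" >&2; return 1 ;;
        esac
        sleep "$interval"
    done
    echo "$container not healthy after ${timeout}s (last status: ${status})" >&2
    return 1
}

deploy_controller() {
    docker compose -f "$CONTROLLER_DIR/docker-compose.yml" pull &&
    docker compose -f "$CONTROLLER_DIR/docker-compose.yml" up -d &&
    wait_for_healthy felhom-controller 90
}
```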
### 7. Configuration wizard for `controller.yaml`
Add an interactive wizard function `run_config_wizard()` that runs AFTER
infrastructure setup but BEFORE deploying the controller. It generates
`/opt/docker/felhom-controller/controller.yaml`.
**CLI pre-seeding:** If `--domain`, `--customer`, `--email`, `--cf-token` are
provided via CLI, use them as defaults in the wizard (user can still change).
**Wizard flow** (each question is a `read -p` prompt with a default shown in brackets):
```
===========================================================
Felhom Controller Configuration Wizard
===========================================================
--- Customer identity ---
Customer ID [demo-felhom]: _
Customer display name [Demo Ügyfél]: _
Domain [homeserver.local]: _
Customer email (optional) []: _
--- Infrastructure secrets ---
Cloudflare Tunnel token (optional, leave empty to skip) []: _
Cloudflare API token (for DNS-01 certs, optional) []: _
--- Paths ---
System data partition mount point
(if the system drive was partitioned for user data,
provide the mount point, e.g., /mnt/sys_drive)
System data path [/mnt/sys_drive]: _
--- Dashboard password ---
Set a password for the controller dashboard?
(leave empty for first-visit setup prompt)
Dashboard password []: _
--- Git sync ---
App catalog repository URL [https://gitea.dooplex.hu/admin/app-catalog-felhom.eu.git]: _
Git username []: _
Git token []: _
--- Healthcheck monitoring ---
Healthchecks.io ping UUIDs (leave empty to skip):
Heartbeat UUID []: _
System health UUID []: _
DB dump UUID []: _
Backup UUID []: _
Backup integrity UUID []: _
--- Ready ---
```
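
Each wizard question can share one prompt helper, which also implements CLI pre-seeding: the CLI value (if any) becomes the bracketed default. This is a sketch. `prompt_default` and `run_identity_questions` are hypothetical names. `read -p` writes its prompt to stderr, and only when stdin is a terminal, so `$(...)` captures just the answer.

```bash
#!/usr/bin/env bash
# Prompt with a bracketed default; Enter (or EOF on non-interactive stdin) keeps it.
prompt_default() {
    local label="$1" default="$2" input=""
    read -r -p "$label [$default]: " input || true   # "|| true": EOF is not an error
    printf '%s\n' "${input:-$default}"
}

# First wizard section; values set by parse_args() act as defaults.
run_identity_questions() {
    echo "--- Customer identity ---" >&2
    CUSTOMER_ID="$(prompt_default "Customer ID" "${CUSTOMER_ID:-demo-felhom}")"
    CUSTOMER_NAME="$(prompt_default "Customer display name" "${CUSTOMER_NAME:-Demo Ügyfél}")"
    DOMAIN="$(prompt_default "Domain" "${DOMAIN:-homeserver.local}")"
    EMAIL="$(prompt_default "Customer email (optional)" "${EMAIL:-}")"
}
```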
**Password hashing:** If the user provides a dashboard password, hash it with bcrypt
using `htpasswd -bnBC 10 "" "PASSWORD" | tr -d ':\n'`, or the `python3 -c` fallback.
Store the hash in `web.password_hash`.
**Session secret:** Auto-generate: `openssl rand -hex 32`
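
Password hashing with the fallback, plus the session secret, can be sketched as follows. The function names are hypothetical. The fallback assumes the `bcrypt` Python package is installed and passes the password on stdin rather than argv. An empty password prints nothing, which leaves `web.password_hash` empty.

```bash
#!/usr/bin/env bash
# Sketch: bcrypt-hash the dashboard password (cost 10).
hash_password() {
    local password="$1" hash=""
    [[ -n "$password" ]] || return 0
    if command -v htpasswd >/dev/null 2>&1; then
        # With an empty user name htpasswd prints ":<hash>"; strip ':' and newlines.
        hash="$(htpasswd -bnBC 10 "" "$password" | tr -d ':\n')"
    elif python3 -c 'import bcrypt' >/dev/null 2>&1; then
        hash="$(printf '%s' "$password" | python3 -c \
            'import sys, bcrypt; print(bcrypt.hashpw(sys.stdin.buffer.read(), bcrypt.gensalt(rounds=10)).decode())')"
    else
        echo "error: neither htpasswd nor python3 bcrypt is available" >&2
        return 1
    fi
    [[ -n "$hash" ]] || return 1
    printf '%s\n' "$hash"
}

# 32 random bytes, hex-encoded (64 characters).
generate_session_secret() {
    openssl rand -hex 32
}
```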
**Hub config:** Always enabled, with the hardcoded API key:
```yaml
hub:
  enabled: true
  url: "https://hub.felhom.eu"
  api_key: "094091de545ce28795c47ac2158fc30750db5c24a621c49329b001ee8db57fb8"
  push_interval: "15m"
```
**Backup:** Keep `enabled: true` — the user confirmed it should stay for
troubleshooting purposes.
**hdd_path:** Do NOT include in generated config. It's deprecated. Remove it
from the template entirely.
**Full template** — write this to `/opt/docker/felhom-controller/controller.yaml`:
```yaml
# Felhom Controller Configuration
# Generated by docker-setup.sh v5.0.0 on <DATE>
customer:
  id: "<CUSTOMER_ID>"
  name: "<CUSTOMER_NAME>"
  domain: "<DOMAIN>"
  email: "<EMAIL>"
  telegram_chat_id: ""
infrastructure:
  cf_tunnel_token: "<CF_TUNNEL_TOKEN>"
  cf_api_token: "<CF_API_TOKEN>"
paths:
  stacks_dir: "/opt/docker/stacks"
  data_dir: "/opt/docker/felhom-controller/data"
  system_data_path: "<SYSTEM_DATA_PATH>"
system:
  reserved_memory_mb: 384
web:
  listen: ":8080"
  password_hash: "<BCRYPT_HASH_OR_EMPTY>"
  session_secret: "<AUTO_GENERATED_HEX>"
git:
  repo_url: "<GIT_REPO_URL>"
  branch: "main"
  sync_interval: "15m"
  username: "<GIT_USERNAME>"
  token: "<GIT_TOKEN>"
stacks:
  protected:
    - "traefik"
    - "cloudflared"
    - "felhom-controller"
    - "filebrowser"
  update_window: "03:00-05:00"
  compose_command: ""
backup:
  enabled: true
  restic_password_file: "/opt/docker/felhom-controller/data/restic-password"
  db_dump_schedule: "02:30"
  restic_schedule: "03:00"
  retention:
    keep_daily: 7
    keep_weekly: 4
    keep_monthly: 6
  prune_schedule: "weekly"
monitoring:
  enabled: true
  healthchecks_base: "https://status.felhom.eu"
  ping_uuids:
    heartbeat: "<HEARTBEAT_UUID>"
    system_health: "<SYSTEM_HEALTH_UUID>"
    db_dump: "<DB_DUMP_UUID>"
    backup: "<BACKUP_UUID>"
    backup_integrity: "<BACKUP_INTEGRITY_UUID>"
  system_health_interval: "5m"
  health_check_schedule: "06:00"
  thresholds:
    disk_warn_percent: 80
    disk_crit_percent: 90
    backup_max_age_hours: 36
    cpu_warn_percent: 90
    memory_warn_percent: 85
    temperature_warn_celsius: 75
hub:
  enabled: true
  url: "https://hub.felhom.eu"
  api_key: "094091de545ce28795c47ac2158fc30750db5c24a621c49329b001ee8db57fb8"
  push_interval: "15m"
self_update:
  enabled: true
  check_interval: "6h"
  image: "gitea.dooplex.hu/admin/felhom-controller"
  auto_update: false
  health_timeout_seconds: 60
notifications:
  customer_events:
    - "disk_warning"
    - "backup_failed"
    - "update_available"
    - "security_update"
  operator_events:
    - "disk_critical"
    - "backup_failed"
    - "self_update_failed"
    - "container_unhealthy"
logging:
  level: "info"
  file: ""
  max_size_mb: 10
  max_files: 3
assets:
  source_url: "https://felhom.eu"
```
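
Filling the template from shell variables is safest when every user-supplied value goes through a quoting helper. This sketch renders only the `customer` and `web` sections; the rest follow the same pattern. `yaml_str` and `render_controller_yaml` are hypothetical names. Substituting the bcrypt hash through a variable also matters: its `$2y$...` text pasted literally into an unquoted heredoc would be expanded as `$2`. The helper does not handle embedded newlines or control characters.

```bash
#!/usr/bin/env bash
# Emit a YAML double-quoted scalar, escaping backslashes first, then double quotes.
yaml_str() {
    local s="$1"
    s=${s//\\/\\\\}
    s=${s//\"/\\\"}
    printf '"%s"' "$s"
}

# Partial controller.yaml; command-substitution output is not re-expanded by the heredoc.
render_controller_yaml() {
    cat <<EOF
# Felhom Controller Configuration
# Generated by docker-setup.sh v5.0.0 on $(date '+%Y-%m-%d %H:%M')
customer:
  id: $(yaml_str "$CUSTOMER_ID")
  name: $(yaml_str "$CUSTOMER_NAME")
  domain: $(yaml_str "$DOMAIN")
  email: $(yaml_str "$EMAIL")
  telegram_chat_id: ""
web:
  listen: ":8080"
  password_hash: $(yaml_str "$PASSWORD_HASH")
  session_secret: $(yaml_str "$SESSION_SECRET")
EOF
}
```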
### 8. Update `controller.yaml.example`
Update `controller/configs/controller.yaml.example` to match the wizard template:
- **Remove** `hdd_path` line entirely
- **Set** `hub.enabled: true` (was `false`)
- **Set** `hub.api_key` to the real key: `094091de545ce28795c47ac2158fc30750db5c24a621c49329b001ee8db57fb8`
- **Improve** `system_data_path` comment to be clearer:
```yaml
system_data_path: "/mnt/sys_drive" # Mount point of user-data partition on system drive (e.g., /mnt/sys_drive)
```
### 9. Update `install_cloudflare_tunnel()`
The function currently reads from `CF_TUNNEL_TOKEN` (CLI arg). Change it to
read from the wizard variable (same variable name is fine, just populated by the
wizard instead of CLI). The function body stays the same: it writes the
`docker-compose.yml` in `/opt/docker/cloudflared/` and starts it.
**Guard:** If wizard left the CF tunnel token empty, skip this step (already
handled by the existing `if [[ -z "$CF_TUNNEL_TOKEN" ]]` check).
### 10. Update execution order in `main()`
New execution order:
```
1. Install base packages
2. Configure network (static IP, if requested)
3. Install Docker Engine + Compose
4. Install Traefik reverse proxy
5. Generate self-signed certificate (if requested)
6. Run configuration wizard → generates controller.yaml
7. Install Cloudflare Tunnel (if token provided in wizard)
8. Install FileBrowser (protected stack)
9. Deploy felhom-controller
10. Install helper tools
11. Print summary
```
Update step numbering and `get_total_steps()` accordingly.
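
A table-driven `main()` keeps the numbering and `get_total_steps()` in sync automatically, which also addresses Bug 2 under "Additional observations". This is a sketch. `run_config_wizard`, `install_cloudflare_tunnel`, `install_filebrowser`, `print_summary`, `get_total_steps` and `log_step` come from this task. The other step function names are hypothetical placeholders for the existing ones. Optional steps (no tunnel token, `--skip-filebrowser`) should return early inside their own function, so step numbers never shift.

```bash
#!/usr/bin/env bash
# Minimal stand-in so the sketch runs on its own; the real script defines log_step.
declare -F log_step >/dev/null || log_step() { printf '==> [%s] %s\n' "$1" "$2"; }

# "function|title" pairs in execution order.
STEPS=(
    "install_base_packages|Installing base packages"
    "configure_network|Configuring network"
    "install_docker|Installing Docker Engine + Compose"
    "install_traefik|Installing Traefik reverse proxy"
    "generate_self_signed_cert|Generating self-signed certificate"
    "run_config_wizard|Running configuration wizard"
    "install_cloudflare_tunnel|Installing Cloudflare Tunnel"
    "install_filebrowser|Installing FileBrowser"
    "deploy_controller|Deploying felhom-controller"
    "install_helper_tools|Installing helper tools"
    "print_summary|Printing summary"
)

# The total is derived from the table, so it cannot drift from the real step count.
get_total_steps() { echo "${#STEPS[@]}"; }

main() {
    local i=0 entry func title
    for entry in "${STEPS[@]}"; do
        i=$((i + 1))
        func="${entry%%|*}"
        title="${entry#*|}"
        log_step "$i/$(get_total_steps)" "$title"
        "$func" || return 1
    done
}
```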
### 11. Update `print_summary()`
Update the summary to reflect:
- Controller is deployed and accessible at `https://felhom.<DOMAIN>`
- FileBrowser at `https://files.<DOMAIN>`
- Remove manual "deploy felhom-controller" instructions (it's automated now)
- Show healthcheck UUID status (configured / not configured)
- Show hub status (enabled)
- Fix the `CUSTOMER_ID` display bug: the "Note: No --customer specified" message
  is inside the `if [[ -n "$CUSTOMER_ID" ]]` block, so it only appears when a customer IS set
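
The new summary items could be printed roughly as follows. This is a sketch of part of `print_summary()`. The `HC_UUID_*` variable names are hypothetical, and the customer note now sits in the `else` branch where it belongs.

```bash
#!/usr/bin/env bash
status_label() {
    if [[ -n "$1" ]]; then echo "configured"; else echo "not configured"; fi
}

print_summary() {
    local name var
    echo "Felhom Controller:  https://felhom.${DOMAIN}"
    echo "FileBrowser:        https://files.${DOMAIN}"
    if [[ -n "${CUSTOMER_ID:-}" ]]; then
        echo "Customer ID:        ${CUSTOMER_ID}"
    else
        echo "Note: No --customer specified"
    fi
    echo "Hub:                enabled"
    echo "Healthchecks:"
    for name in heartbeat system_health db_dump backup backup_integrity; do
        var="HC_UUID_${name^^}"              # e.g. HC_UUID_SYSTEM_HEALTH
        printf '  %-18s %s\n' "$name" "$(status_label "${!var:-}")"
    done
}
```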
### 12. Update `print_help()`
Update help text to reflect:
- Removed `--cf-tunnel-token` (now in wizard)
- Removed `--hdd-path` (deprecated)
- Mention the interactive wizard
- Updated "WHAT THIS SCRIPT INSTALLS" list:
  1. Base packages
  2. Docker Engine + Compose
  3. Traefik reverse proxy
  4. TLS certificates
  5. Felhom Controller (with interactive configuration)
  6. FileBrowser Quantum (web file manager)
  7. Cloudflare Tunnel (if configured)
  8. Helper tools
---
## Additional observations
### Bugs in current script
1. **`print_summary()` CUSTOMER_ID logic is inverted** (line ~1507):
The "Note: No --customer specified" message is inside `if [[ -n "$CUSTOMER_ID" ]]`
which only triggers when a customer IS specified. Should be in an else branch
or removed.
2. **Step numbering is fragile**: The `get_total_steps()` and hardcoded step
numbers (e.g., `log_step "3/$(get_total_steps)"`) will desync if steps are
added/removed. Consider using a counter variable incremented at each step.
### Things NOT to change
- `bootstrap_sudo()` — works fine, keep as-is
- Network configuration (step 2): keep all network manager detection logic
- Docker installation (step 3) — keep as-is
- Traefik installation (step 4) — keep as-is
- Self-signed cert generation — keep as-is
- Helper tools installation — keep as-is
- Error trap and diagnostics — keep as-is
- Color/logging functions — keep as-is
### Template completeness check
The controller.yaml template covers all sections from the current example.
Sections that use sensible defaults and don't need wizard prompts:
- `system.reserved_memory_mb` (384)
- `backup.*` (all defaults are fine)
- `stacks.protected` (hardcoded list)
- `stacks.update_window` ("03:00-05:00")
- `monitoring.thresholds.*` (all defaults)
- `self_update.*` (all defaults)
- `notifications.*` (all defaults)
- `logging.*` (all defaults)
- `assets.*` (hardcoded)
---
## Implementation notes
- The script is bash — no external YAML parser needed. Use `cat > file << EOF`
with variable substitution for generating YAML.
- For bcrypt hashing, prefer `htpasswd -bnBC 10 "" "$password" | tr -d ':\n'`
(apache2-utils is installed in step 1). Fallback: `python3 -c "import bcrypt; ..."`
- The wizard should show current/default values in brackets and accept Enter
for defaults: `read -p "Domain [$default]: " input; value="${input:-$default}"`
- Dry-run mode should show what the wizard WOULD generate without writing files.
- All generated files should have appropriate permissions:
  - `controller.yaml`: `chmod 600` (contains secrets)
  - `docker-compose.yml` files: `chmod 644`
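
Dry-run and permissions can share a single write helper. This is a sketch: `write_file` is a hypothetical name, and its input is any command's output piped in, e.g. `some_generator | write_file PATH 600`. `mktemp` creates the temporary file with mode 0600, so secrets are not world-readable while it is being written, and the final `mv` within one directory replaces the file in a single step. Note that dry-run output prints tokens and hashes to the terminal.

```bash
#!/usr/bin/env bash
# Write stdin to PATH with MODE, or only print it when DRY_RUN=true.
write_file() {
    local path="$1" mode="$2" dir tmp
    if [[ "${DRY_RUN:-false}" == true ]]; then
        echo "--- [dry-run] would write ${path} (mode ${mode}) ---"
        cat
        return 0
    fi
    dir="$(dirname "$path")"
    mkdir -p "$dir"
    tmp="$(mktemp "${dir}/.write.XXXXXX")"   # created as 0600
    cat > "$tmp"
    chmod "$mode" "$tmp"
    mv -f "$tmp" "$path"
}
```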
---
## Build & test
After implementing, test the script with `--dry-run` to verify:
```bash
sudo ./docker-setup.sh --domain test.local --customer test --dry-run
```
For a real deployment test on the demo node:
```bash
# Copy script to demo node
SSH=/c/Windows/System32/OpenSSH/ssh.exe
scp scripts/docker-setup.sh kisfenyo@192.168.0.162:/tmp/
# Run on demo node (it already has infrastructure, so most steps will skip)
$SSH kisfenyo@192.168.0.162 "sudo bash /tmp/docker-setup.sh --domain demo-felhom.eu --customer demo-felhom --email certs@felhom.eu --cf-token <token>"
```