Commit Graph

132 Commits

Author SHA1 Message Date
admin 4bd0909f2b hub: restore-test "passed with warnings" visibility (v0.7.5)
Phase B (hub half) of the restore-test warning fix. The agent v0.7.0 now passes a
restore-test that emitted a benign start advisory (systemd-nesting) and carries the
warning text on the wire.

- hostRestoreTest gains warnings + warnings_recognized mirror fields (omitempty;
  absent recognized => false => louder unrecognized path)
- ingest logs [INFO] passed WITH WARNINGS (recognized), [WARN] for unrecognized;
  FAILED still [WARN]
- golden restore_tests[0] gains the keys, byte-identical with felhom-agent (sha256
  e6999d77...); bidirectional key-set contract test round-trips them
- no dashboard widget: no host-domain dashboard surface exists yet (log+persist only,
  as with pbs_snapshots) -- deferred to slice 10

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-09 19:41:21 +02:00
admin 5268411014 deploy(hub): bump manifests/hub.yaml image to v0.7.4 (ingest pbs_snapshots)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-09 17:20:16 +02:00
admin 5bc4c3d967 hub v0.7.4: ingest agent pbs_snapshots (slice 6 Phase B)
Accept + persist the now-populated host-report pbs_snapshots. hostPBSSnapshot mirror in
hostReportPayload (persisted via report_json, no schema change); a FAILED PBS verify is
logged prominently (loudest offsite-DR signal). Shared golden updated byte-identical with
felhom-agent; TestHostPBSSnapshot_GoldenContract added. Build/deploy deferred (backward-compatible).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-09 17:15:58 +02:00
admin 8db15bac16 deploy(hub): bump manifests/hub.yaml image to v0.7.3 (ingest backups + restore_tests)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-09 16:35:21 +02:00
admin 94a236328b docs(spike): phase 5 PBS mechanism findings (DooPlex server ← N100 client)
Empirical PBS validation before the slice-6 Phase B spec. Records: PBS install on
Debian-13 DooPlex (trixie key ships in proxmox-archive-keyring, no standalone .gpg),
datastore + cert fingerprint, the PBS privsep gotcha (grant role on user AND token),
the encrypted pbs storage + key location (/etc/pve/priv/storage/<id>.enc), the snapshot
volid format + native fields (→ PBSSnapshot shape), restore-from-PBS works unchanged,
the verify mechanism (server-side; agent drives it remotely via the PBS API, result read
from snapshot verification.state), no operator-token privilege gap, and zero-knowledge
confirmed (server can't decrypt without the client key). PBS+datastore+storage left up
for Phase B; no secrets committed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-09 16:26:57 +02:00
admin 41f2d2b5da hub v0.7.3: ingest agent backups + restore_tests (slice 6 Phase A)
Accept + persist the now-populated host-report backups/restore_tests. Mirror structs in
hostReportPayload; persisted via report_json (no schema change); a FAILED restore-test is
logged prominently (loudest DR signal). Shared golden updated byte-identical with
felhom-agent; bidirectional key-set tests added. Build/deploy deferred (backward-compatible).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-09 13:56:18 +02:00
admin 0c6ec27054 docs(CLAUDE): document the working kubectl sync trigger for felhom (argocd CLI not logged in)
The argocd CLI on 180 has no server session and --core breaks under sudo (env stripped);
the reliable scripted sync is annotate refresh + patch .operation on the Application CR.
Verified by deploying hub v0.7.2.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-09 10:26:33 +02:00
admin 7b5f860a70 deploy(hub): bump manifests/hub.yaml image to v0.7.2 (host-domain ingest + storage_targets)
Live hub was v0.6.3 (pre host-report endpoint); v0.7.0-v0.7.2 were changelogged but
the manifest was never bumped. This deploys the host-domain ingest (slice 3) +
storage_targets (slice 5 Phase A). Additive/idempotent migrate(); controller path untouched.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-09 10:23:08 +02:00
admin 9347fcd3a5 docs(CLAUDE): correct hub/manifests deploy to GitOps via the 'felhom' ArgoCD app
No separate hub app; manifests/ synced by app 'felhom' (auto-sync off). Deploy =
build+push pinned image -> bump manifests/hub.yaml tag + commit -> manual sync.
Never :latest (manifest is ArgoCD's truth). Replaces the stale kubectl apply/set image steps.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-09 10:19:23 +02:00
admin 6e05e0ff7c docs: REPORT — clarify hub v0.7.2 deploy deferred (live hub at v0.6.3)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-09 10:01:04 +02:00
admin aaff268fff hub v0.7.2: ingest agent storage_targets (slice 5 Phase A)
Accept + persist the now-populated host-report storage_targets. Minimal — the
authoritative storage manifest is hub-owned (slice 10); this mirrors what the agent
observes.

- hostReportPayload.StorageTargets: full mirror of the agent's hub.StorageTarget
  wire contract; persisted verbatim in report_json (no schema change); count +
  WARN on disconnected targets.
- shared host-report golden updated with two populated targets; byte-identical with
  felhom-agent's copy.
- TestHostStorageTarget_GoldenContract: hub half of the bidirectional key-set test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-09 09:59:27 +02:00
admin 2f8658981d docs: reflow CLAUDE.md; switch REPORT.md to overwrite-latest; add no-secrets rule
Unify the REPORT/CHANGELOG convention with the sibling repos (REPORT.md was
append/cumulative -> now overwrite-latest; CHANGELOG stays cumulative). Reflow
removes hard mid-paragraph line wraps; rendered output unchanged. CHANGELOG entry
in hub/CHANGELOG.md. No hub code change -> no version bump.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 20:54:32 +02:00
admin 7bc27c38de update 2026-06-08 20:06:11 +02:00
admin aab3e137c5 updated CLAUDE.md 2026-06-08 19:17:41 +02:00
admin 4be3bdf486 fix(hub): slice-3 follow-ups — /host-report 413 oversize + contract golden (v0.7.1)
- handleHostReport: read maxHostReportBytes+1 (4 MiB const) and reject oversize with
  413 instead of silent LimitReader truncation. Controller handleReport (1 MiB) is
  unchanged. Test asserts 413.
- contract: hub/internal/api/testdata/host-report.golden.json (byte-identical with
  felhom-agent's copy) + TestHostReport_GoldenContract drives the real handler and
  asserts 200 + denorm + both guests upserted.
- CHANGELOG v0.7.1.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 18:31:44 +02:00
admin 23611c20ef chore(hub): revert incidental gofmt-only reformatting outside slice-3 scope
Restores notify/templates.go, store/telemetry.go, web/configs.go to upstream —
those were alignment-only churn from a tree-wide gofmt, not part of slice 3. Keeps
the host-domain diff additions-only.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 16:38:18 +02:00
admin 7c0c75457f feat(hub): host-domain ingest — tables + /host-report + per-host auth + host dead-man's-switch (v0.7.0, slice 3)
Purely additive; the controller path (reports/customer_configs/checkAuthCustomer/
existing checkers) is untouched. Cutover remains slice 10.

- store: new hosts/guests/host_reports tables (full schema incl. columns INERT
  until slice 10, so no later ALTER); GetHostByAPIKey/GetHost/ListHosts/UpsertHost/
  SaveHostReport/UpsertGuestFromReport (preserves inert cols)/GetHostStaleness/
  GuestID; Prune also prunes host_reports.
- api: checkAuthHost (sibling of checkAuthCustomer); POST /host-report (per-host
  Bearer, 4MiB, denorm + guest upsert, control envelope); POST /admin/hosts
  (PROVISIONAL global-key host mint); host_* event types registered.
- monitor: HostStalenessChecker sibling over host_reports (host_stale/down/
  recovered), wired on the existing 60s ticker; controller checkers unchanged.
- tests (hermetic): store intent/inert-column preservation, auth, ingest
  (envelope+denorm, mismatch/unknown/blocked/oversize), admin mint round-trip,
  host staleness transitions.

CHANGELOG v0.7.0. Contract matches the agent host-report spec field-for-field.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 16:36:16 +02:00
admin 0d832def7b fix: update repo-name refs after deploy-felhom-compose -> felhom-controller rename
- hub/internal/web/templatefetcher.go: raw-template URL now points at the renamed
  repo (was relying on Gitea's post-rename redirect)
- documentation/ (moved here from the felhom-agent repo): fix controller-source path
  refs (deploy-felhom-compose -> felhom-controller) and the platform repo name
  (proxmox-controller -> felhom-agent)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 14:03:13 +02:00
admin cb1d964620 Merge pull request 'moved documentation to felhom.eu' (#7) from fix/filebrowser-config-args into main
Reviewed-on: #7
2026-06-08 11:54:53 +00:00
admin 3d6cde8080 Merge pull request 'docs: rework repo-name references for renames' (#6) from chore/rename-repo-refs into main
Reviewed-on: #6
2026-06-08 11:52:04 +00:00
admin 715f644bf0 moved documentation to felhom.eu 2026-06-08 13:50:14 +02:00
admin 0f12e17175 docs: rework repo-name references for renames
deploy-felhom-compose -> felhom-controller, proxmox-controller -> felhom-agent in
README.md and CLAUDE.md. Hub source (templatefetcher.go) intentionally left untouched
per scope; its raw-template URL is flagged separately for the operator.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 13:39:53 +02:00
admin 7b545c1ec7 Merge pull request 'fix: pass --config to filebrowser (v2.63.x changed default lookup path)' (#5) from fix/filebrowser-config-args into main 2026-06-06 12:22:05 +00:00
admin ea66afa960 manifests: pass --config to filebrowser so it reads our ConfigMap
The previous PR pinned filebrowser to v2.63.13 + runAsUser:0 which
solved the PVC permission issue, but the pod was still 0/1 Ready
because v2.63.x changed the default config-file lookup path:

  Old (v2-alpine): /.filebrowser.json (matched our existing mount)
  New (v2.63.13) : /config/settings.json (NOT mounted in this pod)

So the new image ran with its built-in defaults (port 80, in-memory
db), and the readiness probe on 8080/health timed out.

Fix: pass `args: ["-c", "/.filebrowser.json"]` so filebrowser uses the
ConfigMap we already mount there. No volumeMount changes needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-06 14:22:04 +02:00
admin 87b062e84a Merge pull request 'feat: umami 3.1.0 + filebrowser v2.63.13 (root)' (#4) from feat/umami-v3-filebrowser-root into main 2026-06-06 12:17:21 +00:00
admin bd0531e4a8 manifests: umami -> 3.1.0 (v3 line) + filebrowser v2.63.13 with runAsUser:0
umami:
  Switch from SHA-pinned v3.0.3 to the tagged v3.1.0 release (the v3
  line proper -- same schema lineage, normal Prisma minor-version
  migration). This is the documented forward path that the version-
  checker hint `postgresql-latest -> 3.1` indicated. The v1.x
  postgresql-vX.Y.Z line we briefly tried earlier today is a
  DIFFERENT image lineage with incompatible migrations -- avoid.

filebrowser:
  Re-pin to v2.63.13 (debian-based default) so Renovate can track
  future bumps. The non-root UID in that image can't write to the
  existing PVC contents (chowned to root by the previous v2-alpine
  image), so set pod-level securityContext runAsUser:0 + runAsGroup:0
  to keep using the same volume layout without a chown initContainer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-06 14:17:20 +02:00
admin dc64bb2d79 Merge pull request 'fix(URGENT): pin umami to exact SHA (v1.38.0 has schema lineage mismatch)' (#3) from fix/umami-sha-pin into main 2026-06-06 11:53:55 +00:00
admin 7e6ea9d66c manifests: pin umami to exact image SHA (schema mismatch with v1.38.0)
Previous PR pinned `ghcr.io/umami-software/umami:postgresql-v1.38.0`.
The new pod crashlooped on Prisma:

  ERROR: relation "event" does not exist
  Migration name: 02_add_event_data
  Database error code: 42P01

The 120-day-old working pod's actual image is:
  ghcr.io/umami-software/umami@sha256:28f263fe06f79ebffa5a6a6e9b...

It runs an older umami build whose schema doesn't have the `event`
table that the v1 migration `02_add_event_data` operates on. The DB
has migrations 10-14 applied (newer than 02 by name) but 02 isn't in
its applied set -- likely a schema fork between the line our 120d pod
runs and the postgresql-vX.Y.Z line that v1.38.0 advances toward.

Pin to the exact SHA that the working pod uses, so pod restarts +
ArgoCD syncs both keep producing pods on the same known-good image
(cached on the node, no registry pull needed). Renovate also stops
chasing the broken upgrade path.

Proper fix (deferred): plan a v3.x migration. The version-checker
dashboard hint `postgresql-latest → 3.1` suggests umami v3.x dropped
the `postgresql-` prefix and is what we'd want long-term. That needs
a real DB migration plan since the schema lineage is genuinely
different from this image.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-06 13:53:54 +02:00
admin a964dc20a4 Merge pull request 'fix: revert filebrowser to v2-alpine (PVC permission issue with v2.63.13)' (#2) from fix/filebrowser-revert into main 2026-06-06 11:45:19 +00:00
admin df2a1259d9 manifests: revert filebrowser v2.63.13 -> v2-alpine (PVC permission issue)
The previous PR pinned `filebrowser/filebrowser:v2-alpine` to v2.63.13
but it crashlooped on:

  Error: open /database/filebrowser.db: permission denied

The v2.63.13 image (debian-based default) runs as a non-root UID and
can't write to files on the PVC that were created by the v2-alpine
image (which ran as root). No `v2.63.13-alpine` tag exists upstream
(filebrowser stopped publishing per-version alpine variants), so we
can't trivially preserve the same runtime.

Quick recovery: revert to v2-alpine so filebrowser is usable again.
Proper fix (deferred): either an initContainer that `chown -R 1000:1000
/database /srv` or a `securityContext.fsGroup: 1000` on the pod spec
to let the non-root UID write to the existing PVC. Both require some
care since the chown is destructive if the UID is wrong.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-06 13:45:18 +02:00
admin e363c6594d Merge pull request 'manifests: re-pin moving tags (umami / filebrowser)' (#1) from fix/version-pins into main 2026-06-06 11:41:51 +00:00
admin ce80dce497 manifests: re-pin moving tags so Renovate can track them
- umami       postgresql-latest  -> postgresql-v1.38.0
  - filebrowser v2-alpine          -> v2.63.13

These two were "latest"-style moving tags that Renovate physically
cannot propose updates for. Pinning to current upstream versions so
future bumps go through the normal Renovate PR flow.

Note: Renovate operates from the homelab-manifests repo, not this one
yet — but felhom-system/* copies exist in homelab-manifests for
discoverability, and Renovate already tracks the pinned forms via a
new customManager for the umami `postgresql-vX.Y.Z` pattern (added in
homelab-manifests admin-system/renovate.yaml). For now, future bumps
will need to be applied to both repos until we consolidate the source
of truth.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-06 13:41:50 +02:00
admin 8aa4104586 6.3 2026-06-06 10:29:41 +02:00
admin 276ccda938 updated logo 2026-02-27 11:24:46 +01:00
admin d65dba63bf docs: update hub README for v0.6.3
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-27 09:23:18 +01:00
admin 5ebf0d5fe4 feat: add auto-refresh toggle on customer detail page
Replace the hardcoded 60s meta-refresh with a JavaScript-based timer
and a toggle switch in the page header. The preference persists across
page loads via localStorage (enabled by default).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-27 09:19:29 +01:00
admin ac43d0cbf5 deploy: hub v0.6.2
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 14:50:15 +01:00
admin f1212e6ba8 feat: infra backup GFS retention + version history
New infra_backup_versions table with GFS pruning (~14 versions per
customer). Recovery endpoint supports ?version=ID. New /versions API.
Dashboard shows collapsible backup history with app names and disk count.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 14:47:48 +01:00
admin f82fa9be2c favicon to svg 2026-02-26 13:21:55 +01:00
admin 1eccd4df58 added favicon png 2026-02-26 13:20:26 +01:00
admin 652d567864 updated favicon 2026-02-26 13:17:14 +01:00
admin c3d087bc0f fix: double-v in version display, reset error counts on issue deletion
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 16:08:45 +01:00
admin 2a83a4e96c deploy: hub v0.6.1
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 16:04:12 +01:00
admin 7860f96a56 Hub v0.6.1: delete issues from UI + fingerprint hardening
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 16:01:55 +01:00
admin 23cb487348 deploy: hub v0.6.0 2026-02-25 12:45:02 +01:00
admin 5e2012728f Hub v0.6.0: Geo-restriction display + disable button + UUID cleanup
- Add geo-restriction section to customer detail page (status, countries,
  per-app overrides, sync state, errors)
- Add "Összes geo-korlátozás eltávolítása" button that directly calls
  Cloudflare API to delete [felhom-geo] WAF rules (bypasses blocked tunnel)
- Background retry to notify controller to disable geo in settings
- New internal/cloudflare/unblock.go — minimal CF client for rule deletion
- Remove legacy Monitoring UUIDs from config form, buildConfigJSON,
  handlePullConfig, volatileKeys, and controller.yaml.default

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 12:43:00 +01:00
admin f50278e2b0 favicons 2026-02-25 12:29:12 +01:00
admin d94ac7b65d deploy: hub v0.5.1
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 12:00:48 +01:00
admin 906c143aea docs: update CF token permissions for geo-restriction
Config form now shows Zone WAF:Edit requirement alongside DNS:Edit.
Hub README updated with permission note.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 11:58:25 +01:00
admin 61ef1a3952 removed healthchecks page 2026-02-25 10:25:07 +01:00