# Recipe Importer

Docker container for importing recipes from Hungarian websites into [Mealie](https://mealie.io/) and [Tandoor Recipes](https://tandoor.dev/).

**Problem**: Mealie's and Tandoor's built-in URL import cannot parse ingredients and instructions from Hungarian recipe sites like mindmegette.hu.

**Solution**: This container provides a web UI that scrapes Hungarian recipe pages with site-specific parsers, lets you review and edit the extracted data, then pushes it to Mealie and/or Tandoor via their REST APIs.

## Architecture

```
┌──────────────────────────────────────────────────────┐
│  recipe-importer container (:8000)                   │
│                                                      │
│  Flask + Gunicorn                                    │
│  ├── /settings       → Configure Mealie & Tandoor    │
│  ├── /import         → Paste URL, scrape, review     │
│  ├── /scrape         → AJAX: parse recipe HTML       │
│  ├── /send           → AJAX: push to Mealie API      │
│  ├── /send-tandoor   → AJAX: push to Tandoor API     │
│  ├── /tags           → AJAX: list tags from both     │
│  └── /health         → Health check                  │
│                                                      │
│  Modules:                                            │
│  ├── app/config.py   → JSON config persistence       │
│  ├── app/scraper.py  → Site-specific parsers         │
│  ├── app/mealie.py   → Mealie REST API client        │
│  └── app/tandoor.py  → Tandoor REST API client       │
└───────────────────┬──────────────┬───────────────────┘
                    │ HTTP         │ HTTP
                    ▼              ▼
        ┌──────────────┐   ┌───────────────┐
        │    Mealie    │   │    Tandoor    │
        │ POST /api/.. │   │ POST /api/..  │
        │ PUT  /api/.. │   │ PUT  /api/..  │
        └──────────────┘   └───────────────┘
```

## Supported Sites

| Site | Ingredients | Instructions | Image | Tags |
|------|:-----------:|:------------:|:-----:|:----:|
| mindmegette.hu | Yes | Yes | Yes | Yes |
| *Other sites* | Fallback (schema.org JSON-LD) | Fallback (schema.org JSON-LD) | Yes (og:image) | Fallback (schema.org keywords) |

### Mindmegette.hu Parser

Extracts data from the Angular-rendered HTML:

- **Title**: `og:title` meta tag, with ` | Mindmegette.hu` suffix stripped
- **Description**: `og:description` meta tag
- **Image**: `og:image` meta tag
- **Ingredients**: `div.ingredients` → `div.ingredients-meta` rows, each containing `<strong>` (qty), `<span>` (unit), `<a class="ingredients-link">` (food), `<small>` (extra)
- **Ingredient groups**: Multiple `div.ingredients` containers; group title via `<strong class="ingredients-group">`
- **Instructions**: `mindmegette-wysiwyg-box` → `ol > li` elements
- **Tags**: `<a class="tag">` elements inside `div.desktop-wrapper`

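The ingredient-row selectors above can be illustrated with a minimal sketch. It uses the standard-library `html.parser` as a dependency-free stand-in for the BeautifulSoup + lxml stack the project actually uses, and the `IngredientRowParser` class, the `FIELD_BY_TAG` mapping, and the sample markup are all hypothetical, written only to match the element names listed above.

```python
from html.parser import HTMLParser

# Maps the tags named in the selector list to output keys (illustrative names).
FIELD_BY_TAG = {"strong": "qty", "span": "unit", "a": "food", "small": "note"}


class IngredientRowParser(HTMLParser):
    """Collects one dict per div.ingredients-meta row (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._depth = 0     # div nesting depth inside the current row; 0 = outside
        self._field = None  # output key currently receiving text

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div":
            if self._depth:
                self._depth += 1
            elif "ingredients-meta" in classes:
                self._depth = 1
                self.rows.append({"qty": "", "unit": "", "food": "", "note": ""})
        elif self._depth and tag in FIELD_BY_TAG:
            self._field = FIELD_BY_TAG[tag]

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1
        elif tag in FIELD_BY_TAG:
            self._field = None

    def handle_data(self, data):
        if self._depth and self._field:
            self.rows[-1][self._field] += data.strip()


# Invented sample markup that mirrors the structure described above.
SAMPLE = """
<div class="ingredients">
  <div class="ingredients-meta">
    <strong>50</strong> <span>dkg</span>
    <a class="ingredients-link" href="#">liszt</a> <small>(finomliszt)</small>
  </div>
  <div class="ingredients-meta">
    <strong>2</strong> <span>db</span>
    <a class="ingredients-link" href="#">tojás</a>
  </div>
</div>
"""

parser = IngredientRowParser()
parser.feed(SAMPLE)
rows = parser.rows
```

Each entry in `rows` carries the four columns that the review screen later exposes (quantity, unit, food, note); a row without a `<small>` element simply keeps an empty note.
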
### Generic Fallback Parser

For unsupported sites, attempts extraction via:

1. Schema.org JSON-LD `@type: Recipe` blocks (`recipeIngredient`, `recipeInstructions`, `keywords`)
2. OpenGraph meta tags for title, description, image

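As a rough illustration of the JSON-LD step, the sketch below pulls the first `Recipe` object out of a page using only the standard library. The names `JsonLdCollector`, `find_recipe`, and `keywords_of` are hypothetical and not taken from `app/scraper.py`; the handling of `@graph` wrappers and comma-separated `keywords` strings is an assumption about what real pages commonly contain.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._inside = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.blocks[-1] += data


def find_recipe(html):
    """Return the first JSON-LD object whose @type includes Recipe, else None."""
    collector = JsonLdCollector()
    collector.feed(html)
    for raw in collector.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue
        # A block may hold one object, a list, or an @graph wrapper.
        if isinstance(data, dict):
            candidates = data.get("@graph", [data])
        elif isinstance(data, list):
            candidates = data
        else:
            candidates = []
        for item in candidates:
            if not isinstance(item, dict):
                continue
            types = item.get("@type") or []
            if "Recipe" in ([types] if isinstance(types, str) else types):
                return item
    return None


def keywords_of(recipe):
    """Normalize schema.org keywords (string or list) to a list of tag names."""
    raw = recipe.get("keywords") or []
    if isinstance(raw, str):
        raw = raw.split(",")
    return [str(k).strip() for k in raw if str(k).strip()]


PAGE = """<html><head>
<script type="application/ld+json">
{"@type": "Recipe", "name": "Palacsinta",
 "recipeIngredient": ["2 db tojás", "20 dkg liszt"],
 "keywords": "desszert, gyors"}
</script></head><body></body></html>"""

recipe = find_recipe(PAGE)
tags = keywords_of(recipe)
```

When `find_recipe` returns `None`, only the OpenGraph values from step 2 are available, so ingredients and instructions have to be entered by hand in the review screen.
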
### Adding a New Site Parser

1. Create a parser function in `app/scraper.py` with the `@_register("hostname")` decorator
2. The function receives `(soup: BeautifulSoup, url: str)` and returns the standard recipe dict
3. The hostname substring is matched against the URL; first match wins, and unmatched URLs use the generic fallback

## Mealie API Integration

The importer uses the Mealie REST API:

1. **POST** `/api/recipes`: create a stub recipe (returns slug)
2. **PATCH** `/api/recipes/{slug}`: populate structured ingredients (with unit/food IDs), instructions, description, orgURL
3. **PUT** `/api/recipes/{slug}/image`: upload the recipe image

**Structured ingredients**: The client resolves unit and food names to Mealie database IDs. Missing units/foods are created automatically via the API. Ingredient groups are supported via the `title` field on the first ingredient of each group.

Authentication uses a long-lived API token (Bearer header), created in Mealie at *Profile → API Tokens*.

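A minimal sketch of how the PATCH body from step 2 might be assembled, with the group title placed on the first ingredient of each group. The function name `build_mealie_patch`, the input shape (`groups`, `items`, `qty`), and the pre-resolved `unit_ids` / `food_ids` lookups are hypothetical. The payload field names (`recipeIngredient`, `recipeInstructions`, `quantity`) reflect an assumption about Mealie's recipe schema and should be checked against the API documentation of the target instance.

```python
def build_mealie_patch(recipe, unit_ids, food_ids):
    """Build a PATCH payload; unit_ids/food_ids map names to Mealie IDs."""
    ingredients = []
    for group in recipe["groups"]:
        for index, item in enumerate(group["items"]):
            unit = item.get("unit")
            ingredients.append({
                "quantity": item.get("qty"),
                "unit": {"id": unit_ids[unit], "name": unit} if unit else None,
                "food": {"id": food_ids[item["food"]], "name": item["food"]},
                "note": item.get("note", ""),
                # Only the first ingredient of a group carries the group title.
                "title": group.get("title", "") if index == 0 else "",
            })
    return {
        "description": recipe.get("description", ""),
        "orgURL": recipe["url"],
        "recipeIngredient": ingredients,
        "recipeInstructions": [{"text": step} for step in recipe["instructions"]],
    }


# Invented sample data in the hypothetical input shape.
sample = {
    "url": "https://example.com/recept/palacsinta",
    "description": "Egyszerű palacsinta.",
    "instructions": ["Keverd össze.", "Süsd ki."],
    "groups": [
        {"title": "Tészta", "items": [
            {"qty": 20, "unit": "dkg", "food": "liszt"},
            {"qty": 2, "unit": "db", "food": "tojás"},
        ]},
        {"title": "Töltelék", "items": [
            {"qty": None, "unit": None, "food": "lekvár", "note": "ízlés szerint"},
        ]},
    ],
}
payload = build_mealie_patch(
    sample,
    unit_ids={"dkg": "u1", "db": "u2"},
    food_ids={"liszt": "f1", "tojás": "f2", "lekvár": "f3"},
)
titles = [entry["title"] for entry in payload["recipeIngredient"]]
```

Keeping payload construction separate from the HTTP calls makes the group-title rule easy to check without a running Mealie instance.
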
## Tandoor API Integration

The importer uses the Tandoor REST API:

1. **POST** `/api/recipe/`: create the full recipe in one call (name, description, source_url, steps with nested ingredients)
2. **PUT** `/api/recipe/{id}/image/`: upload the recipe image

**Step-based ingredients**: Tandoor nests ingredients inside steps. All ingredients are attached to the first step. Units and foods are auto-created by name (no separate resolution needed). Ingredient groups use `is_header: true` on a header entry.

**Duplicate detection**: Before import, searches Tandoor by title and checks the `source_url` field to detect already-imported recipes.

Authentication uses an API token (Bearer header), created in Tandoor at *Settings → API Browser → Auth Token*.

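The step-based layout can be sketched as a payload builder that attaches every ingredient to the first step and emits a header entry per named group. `build_tandoor_recipe` and its input shape are hypothetical. The top-level fields (`name`, `description`, `source_url`, `steps`, `keywords`) and `is_header` are named in this README; the ingredient-level fields (`amount`, `no_amount`, and using `note` for the header text) are assumptions about Tandoor's serializer and should be verified against the target instance.

```python
def build_tandoor_recipe(recipe):
    """Build a single POST payload with all ingredients on the first step."""
    ingredients = []
    for group in recipe["groups"]:
        if group.get("title"):
            # Header entry that introduces an ingredient group.
            ingredients.append({
                "is_header": True,
                "note": group["title"],
                "food": None,
                "unit": None,
                "amount": 0,
                "no_amount": True,
            })
        for item in group["items"]:
            ingredients.append({
                "is_header": False,
                "amount": item.get("qty") or 0,
                # Units and foods are referenced by name only.
                "unit": {"name": item["unit"]} if item.get("unit") else None,
                "food": {"name": item["food"]},
                "note": item.get("note", ""),
            })

    steps = [{"instruction": text, "ingredients": []}
             for text in recipe["instructions"]]
    if not steps:
        steps = [{"instruction": "", "ingredients": []}]
    steps[0]["ingredients"] = ingredients

    return {
        "name": recipe["title"],
        "description": recipe.get("description", ""),
        "source_url": recipe["url"],
        "steps": steps,
        "keywords": [{"name": tag} for tag in recipe.get("tags", [])],
    }


# Invented sample data in the hypothetical input shape.
sample = {
    "title": "Palacsinta",
    "url": "https://example.com/recept/palacsinta",
    "instructions": ["Keverd össze.", "Süsd ki."],
    "tags": ["desszert"],
    "groups": [
        {"title": "Tészta", "items": [
            {"qty": 20, "unit": "dkg", "food": "liszt"},
            {"qty": 2, "unit": "db", "food": "tojás"},
        ]},
        {"title": "", "items": [{"qty": None, "unit": None, "food": "lekvár"}]},
    ],
}
payload = build_tandoor_recipe(sample)
first_step = payload["steps"][0]["ingredients"]
```

A group without a title contributes no header entry, so a recipe with a single unnamed group produces a flat ingredient list.
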
## Tag Management

Tags are scraped from recipe pages and shown as editable chips in the UI. Users can:

- **Remove** scraped tags that are irrelevant
- **Search** existing tags from Mealie and Tandoor (fetched via the `GET /tags` endpoint)
- **Add** custom tags by typing and pressing Enter

Tags are sent to both services on import:

- **Mealie**: Tags are created via `POST /api/organizers/tags` if they don't exist, then attached to the recipe in the PATCH payload
- **Tandoor**: Keywords are auto-created by including `keywords: [{"name": "..."}]` in the recipe POST

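Two small helpers illustrate the tag flow described above: one merges tag names from both services into a single list for the search box, the other decides which Mealie tags already exist and which still need to be created. Both function names are hypothetical, and the case-insensitive matching is an assumption rather than documented behaviour of the importer.

```python
def merge_tag_names(*sources):
    """Union of tag names across services, de-duplicated case-insensitively."""
    seen = {}
    for names in sources:
        for name in names:
            key = name.strip().casefold()
            if key and key not in seen:
                seen[key] = name.strip()  # keep the first spelling encountered
    return sorted(seen.values(), key=str.casefold)


def plan_mealie_tags(wanted, existing):
    """Split wanted tag names into existing tag objects and names to create.

    `existing` is a list of dicts with at least a "name" key, as a tag
    listing might return them; the exact shape is an assumption.
    """
    by_name = {tag["name"].casefold(): tag for tag in existing}
    attach, create = [], []
    for name in wanted:
        tag = by_name.get(name.casefold())
        if tag:
            attach.append(tag)
        else:
            create.append(name)
    return attach, create


suggestions = merge_tag_names(["Desszert", "gyors"], ["desszert", "Leves"])
attach, create = plan_mealie_tags(
    ["Desszert", "vegán"],
    [{"id": "1", "name": "desszert"}],
)
```

Names returned in `create` would go through `POST /api/organizers/tags` first, and the resulting tag objects join `attach` in the recipe PATCH payload. Tandoor needs no such planning step because keywords are created by name.
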
## Configuration

All settings are persisted to `/data/config.json` (mounted as a Docker volume).

| Setting | Description |
|---------|-------------|
| `mealie_url` | Full URL to Mealie instance (e.g. `https://mealie.example.com`) |
| `mealie_api_key` | Mealie API token |
| `tandoor_url` | Full URL to Tandoor instance (e.g. `https://recipes.example.com`) |
| `tandoor_api_key` | Tandoor API token |

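JSON persistence of the four settings above might look like the sketch below. The function names, the `DEFAULTS` dict, and the write-to-temp-then-rename strategy are assumptions for illustration and not necessarily what `app/config.py` does; the demo at the end writes to a temporary directory instead of `/data`.

```python
import json
import os
import tempfile
from pathlib import Path

# Keys from the settings table; empty strings mean "not configured".
DEFAULTS = {
    "mealie_url": "",
    "mealie_api_key": "",
    "tandoor_url": "",
    "tandoor_api_key": "",
}


def config_path(data_dir=None):
    """Resolve config.json inside DATA_DIR (default /data)."""
    return Path(data_dir or os.environ.get("DATA_DIR", "/data")) / "config.json"


def load_config(data_dir=None):
    """Read stored settings, falling back to defaults for missing keys."""
    try:
        stored = json.loads(config_path(data_dir).read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        stored = {}
    if not isinstance(stored, dict):
        stored = {}
    return {**DEFAULTS, **stored}


def save_config(values, data_dir=None):
    """Write settings via a temp file so a crash cannot leave a partial file."""
    path = config_path(data_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(values, indent=2), encoding="utf-8")
    os.replace(tmp, path)


# Demo against a throwaway directory.
with tempfile.TemporaryDirectory() as demo_dir:
    before = load_config(demo_dir)
    save_config({"mealie_url": "https://mealie.example.com"}, demo_dir)
    reloaded = load_config(demo_dir)
```

Merging stored values over defaults means a config file written by an older version still loads after new settings are introduced.
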
## Deployment

### Docker Compose

```yaml
services:
  recipe-importer:
    image: gitea.dooplex.hu/admin/recipe-importer:0.2.0
    container_name: recipe-importer
    restart: unless-stopped
    ports:
      - "8011:8000"
    volumes:
      - recipe-data:/data
    environment:
      - SECRET_KEY=change-me-in-production
      - MEALIE_INTERNAL_URL=http://mealie:9000
      - TANDOOR_INTERNAL_URL=http://tandoor:8080

volumes:
  recipe-data:
```

### Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `SECRET_KEY` | `recipe-importer-dev-key` | Flask session secret |
| `DATA_DIR` | `/data` | Persistent storage path |
| `VERSION` | `dev` | Shown in the UI navbar |
| `MEALIE_INTERNAL_URL` | *(empty)* | Docker-internal Mealie URL (e.g. `http://mealie:9000`), so requests stay on the Docker network instead of hairpinning through Cloudflare |
| `TANDOOR_INTERNAL_URL` | *(empty)* | Docker-internal Tandoor URL (e.g. `http://tandoor:8080`), so requests stay on the Docker network instead of hairpinning through Cloudflare |

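One plausible way the internal-URL variables interact with the configured URLs is sketched below: when the environment variable is set, it takes precedence for API calls, otherwise the URL from the settings page is used. This precedence rule and the `effective_base_url` helper are assumptions, since the table above only states the purpose of the variables.

```python
import os


def effective_base_url(configured_url, env_var, environ=os.environ):
    """Prefer the Docker-internal URL from the environment when it is set."""
    internal = environ.get(env_var, "").strip()
    # Strip trailing slashes so API paths can be appended consistently.
    return (internal or configured_url).rstrip("/")


with_override = effective_base_url(
    "https://mealie.example.com/",
    "MEALIE_INTERNAL_URL",
    environ={"MEALIE_INTERNAL_URL": "http://mealie:9000"},
)
without_override = effective_base_url(
    "https://mealie.example.com/",
    "MEALIE_INTERNAL_URL",
    environ={},
)
```

Passing `environ` explicitly keeps the helper testable; in the container it would default to the real process environment.
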
## Building

On the build server (kisfenyo@192.168.0.180):

```bash
cd ~/build/recipe-importer
./build.sh X.X.X --push
```

## Web UI

The UI is in Hungarian and uses a dark theme. The workflow is:

1. **Settings** (`/settings`): Configure Mealie and/or Tandoor connection (URL + API key), test each connection
2. **Import** (`/import`): Paste a recipe URL, click "Beolvasás" (Scrape)
3. **Review**: Edit structured ingredients (4-column: quantity, unit, food, note), add/remove ingredient groups, edit instructions, manage tags (add/remove/search existing)
4. **Send**: Click "Importálás Mealie-be" and/or "Importálás Tandoor-ba" to push to your configured services

## Tech Stack

- **Runtime**: Python 3.12 (slim)
- **Web framework**: Flask 3.1 + Gunicorn
- **HTML parsing**: BeautifulSoup 4 + lxml
- **HTTP client**: requests
- **Container**: ~60 MB image