# Recipe Importer

Docker container for importing recipes from Hungarian websites into [Mealie](https://mealie.io/) and [Tandoor Recipes](https://tandoor.dev/).

**Problem**: Mealie's and Tandoor's built-in URL import cannot parse ingredients and instructions from Hungarian recipe sites like mindmegette.hu.

**Solution**: This container provides a web UI that scrapes Hungarian recipe pages with site-specific parsers, lets you review and edit the extracted data, then pushes it to Mealie and/or Tandoor via their REST APIs. Supports both single recipe import and bulk import of multiple URLs.

## Architecture

```
┌──────────────────────────────────────────────────────┐
│  recipe-importer container (:8000)                   │
│                                                      │
│  Flask + Gunicorn                                    │
│  ├── /settings      → Configure Mealie & Tandoor     │
│  ├── /import        → Single or bulk import          │
│  ├── /scrape        → AJAX: parse recipe HTML        │
│  ├── /send          → AJAX: push to Mealie API       │
│  ├── /send-tandoor  → AJAX: push to Tandoor API      │
│  ├── /tags          → AJAX: list tags from both      │
│  └── /health        → Health check                   │
│                                                      │
│  Modules:                                            │
│  ├── app/config.py  → JSON config persistence        │
│  ├── app/scraper.py → Site-specific parsers          │
│  ├── app/mealie.py  → Mealie REST API client         │
│  └── app/tandoor.py → Tandoor REST API client        │
└───────────────────┬──────────────┬───────────────────┘
                    │ HTTP         │ HTTP
                    ▼              ▼
           ┌──────────────┐ ┌───────────────┐
           │    Mealie    │ │    Tandoor    │
           │ POST /api/.. │ │ POST /api/..  │
           │ PUT  /api/.. │ │ PUT  /api/..  │
           └──────────────┘ └───────────────┘
```

## Supported Sites

| Site | Ingredients | Instructions | Image | Tags |
|------|:-----------:|:------------:|:-----:|:----:|
| mindmegette.hu | Yes | Yes | Yes | Yes |
| streetkitchen.hu | Yes (with groups) | Yes (ol/ul/paragraph) | Yes | Yes (from JSON-LD categories) |
| nosalty.hu | Yes (with groups) | Yes (with section headers) | Yes | Yes |
| sobors.hu | Yes (with groups) | Yes (with section headers) | Yes | Yes |
| *Other sites* | Fallback (schema.org JSON-LD) | Fallback (schema.org JSON-LD) | Yes (og:image) | Fallback (schema.org keywords) |

### Mindmegette.hu Parser

Extracts data from the Angular-rendered HTML:

- **Title**: `og:title` meta tag, with ` | Mindmegette.hu` suffix stripped
- **Description**: `og:description` meta tag
- **Image**: `og:image` meta tag
- **Ingredients**: `div.ingredients` → `div.ingredients-meta` rows, each containing `<strong>` (qty), `<span>` (unit), `<a class="ingredients-link">` (food), `<small>` (extra)
- **Ingredient groups**: Multiple `div.ingredients` containers; group title via `<strong class="ingredients-group">`
- **Instructions**: `mindmegette-wysiwyg-box` → `ol > li` elements
- **Tags**: `<a class="tag">` elements inside `div.desktop-wrapper`

### Streetkitchen.hu Parser

Extracts data from the Next.js-rendered HTML:

- **Title**: `og:title` meta tag, with ` | Street Kitchen` suffix stripped
- **Description**: `og:description` meta tag
- **Image**: `og:image` meta tag (CDN URL)
- **Ingredients**: `div.grid.grid-cols-1` container → `div.my-2.flex` rows; quantity and unit are merged in the first `<div>` (split via regex), food in `<div class="font-bold">`, optional extra in a parenthesised `<div>`
- **Ingredient groups**: `<h5>` headers inside section divs (e.g. "Az előfőzéshez", "A sütéshez")
- **Instructions**: Three formats are handled: `<ol>` ordered list, `<ul>` unordered list, or plain `<p>` paragraphs (with optional `<strong>` section headers)
- **Tags**: `recipeCategory` field from JSON-LD `@graph` → `Recipe` object (comma-separated)

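
The quantity/unit split mentioned for the streetkitchen.hu ingredient rows could look roughly like the sketch below. The pattern and the name `split_qty_unit` are illustrative assumptions, not the regex actually used in `app/scraper.py`; it accepts integers, decimals with `,` or `.`, simple fractions and ranges, and treats everything after the number as the unit.

```python
import re

# Hypothetical pattern (not taken from the project): a leading number,
# optionally a decimal, fraction ("1/2") or range ("2-3"), then the unit text.
_QTY_UNIT = re.compile(
    r"^\s*(?P<qty>\d+(?:[.,]\d+)?(?:\s*[-/]\s*\d+(?:[.,]\d+)?)?)\s*(?P<unit>.*?)\s*$"
)


def split_qty_unit(text: str) -> tuple[str, str]:
    """Split a merged 'quantity unit' string into (quantity, unit)."""
    match = _QTY_UNIT.match(text)
    if not match:
        # No leading number: keep the whole text as the unit part.
        return "", text.strip()
    # Normalise "1 / 2" or "2 - 3" to "1/2" and "2-3".
    return match.group("qty").replace(" ", ""), match.group("unit")
```

Text without a leading number (for example "ízlés szerint") is returned unchanged in the unit position, so the caller can decide whether to treat it as a note instead.
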
### Nosalty.hu Parser

Extracts data from the nosalty.hu recipe pages:

- **Title**: `og:title` meta tag
- **Description**: Story text from `div#recipe-story > p` (nosalty has no dedicated description field)
- **Image**: `og:image` meta tag
- **Ingredients**: Scoped to `div#ingredients` to avoid per-serving/nutrition duplicates; `ul.m-list__list > li.m-list__item` rows with `<span>` (qty+unit), `<a class="a-link">` (food), optional trailing `<span>` (extra notes in parentheses)
- **Ingredient groups**: `<h3 class="m-list__title">` headers between `<ul>` lists
- **Instructions**: `div#select` → `ol.m-list__list > li.m-list__item` steps; optional `<h4 class="m-list__title">` section headers
- **Tags**: `<a class="m-tags__tagItem">` inside `div.p-recipe__attributeList`

### Sobors.hu Parser

Extracts data from the sobors.hu recipe pages:

- **Title**: `h3.recept_nev`
- **Description**: `og:description` meta tag
- **Image**: `og:image` meta tag
- **Ingredients**: `div.hozzavalok-container` → `section` elements with `ul > li`, each containing `span.mennyiseg` (qty), `span.mertekegyseg` (unit), `span.hozzavalo` (food)
- **Ingredient groups**: `section > h4` headers (e.g. "A szószhoz:", "A húsgolyókhoz:")
- **Instructions**: `div.recept_leiras` → `<p>` tags, with `<h3><strong>` section headers
- **Tags**: `div.cikk-cimkek > ul.cikk-cimkek-list > li > a` (skips generic "Receptek" category)

### Generic Fallback Parser

For unsupported sites, attempts extraction via:

1. Schema.org JSON-LD `@type: Recipe` blocks (`recipeIngredient`, `recipeInstructions`, `keywords`)
2. OpenGraph meta tags for title, description, image

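
As an illustration of the JSON-LD step, the sketch below locates a schema.org `Recipe` object and pulls out the three fields named above. It uses only the standard library so it runs on its own; the project itself parses HTML with BeautifulSoup, and the names `find_recipe` and `extract_fields` as well as the returned dict keys are assumptions made for this example.

```python
import json
from html.parser import HTMLParser


class _JsonLdCollector(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._in_jsonld = False

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld:
            self.blocks[-1] += data


def _iter_nodes(data):
    """Yield every dict in a JSON-LD value, descending into lists and @graph."""
    if isinstance(data, list):
        for item in data:
            yield from _iter_nodes(item)
    elif isinstance(data, dict):
        yield data
        yield from _iter_nodes(data.get("@graph", []))


def find_recipe(html):
    """Return the first JSON-LD node whose @type includes "Recipe", or None."""
    collector = _JsonLdCollector()
    collector.feed(html)
    for raw in collector.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of failing the import
        for node in _iter_nodes(data):
            types = node.get("@type", [])
            if "Recipe" in (types if isinstance(types, list) else [types]):
                return node
    return None


def extract_fields(recipe):
    """Normalise ingredients, instructions and keywords to plain string lists."""
    steps = recipe.get("recipeInstructions", [])
    if isinstance(steps, str):
        steps = [steps]
    # Steps are either plain strings or objects (e.g. HowToStep) with "text".
    instructions = [s.get("text", "") if isinstance(s, dict) else s for s in steps]
    keywords = recipe.get("keywords", [])
    if isinstance(keywords, str):
        keywords = [k.strip() for k in keywords.split(",") if k.strip()]
    return {
        "ingredients": list(recipe.get("recipeIngredient", [])),
        "instructions": instructions,
        "tags": list(keywords),
    }
```
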
### Adding a New Site Parser

1. Create a parser function in `app/scraper.py` with the `@_register("hostname")` decorator
2. The function receives `(soup: BeautifulSoup, url: str)` and returns the standard recipe dict
3. The hostname substring is matched against the URL: the first match wins, and unmatched URLs use the generic fallback

## Bulk Import

The "Tömeges importálás" (Bulk Import) tab imports multiple recipes at once:

1. Paste one URL per line in the textarea
2. Choose a mode:
   - **Review mode**: edit each recipe before importing, with the option to switch to auto mode mid-way
   - **Auto mode**: scrape and import all recipes without manual review (with a tag option: import all tags or none)
3. Select the target: Mealie, Tandoor, or both
4. A progress table tracks per-recipe status (pending, scraping, importing, done, error, skipped, duplicate)

All processing is done client-side, calling the existing `/scrape` and `/send` / `/send-tandoor` endpoints sequentially.

## Mealie API Integration

The importer uses the Mealie REST API:

1. **POST** `/api/recipes`: create a stub recipe (returns slug)
2. **PATCH** `/api/recipes/{slug}`: populate structured ingredients (with unit/food IDs), instructions, description, orgURL
3. **PUT** `/api/recipes/{slug}/image`: upload the recipe image

**Structured ingredients**: The client resolves unit and food names to Mealie database IDs. Missing units/foods are created automatically via the API. Ingredient groups are supported via the `title` field on the first ingredient of each group.

Authentication uses a long-lived API token (Bearer header), created in Mealie at *Profile → API Tokens*.

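
To make the ingredient-group convention concrete, the sketch below flattens grouped ingredients into a flat list in which only the first ingredient of each group carries the group name in `title`. The input shape (`groups` with `title` and `items`), the lookup dicts `unit_ids` / `food_ids` and the function name are assumptions for illustration; the field names in the output follow the description above but should be checked against the Mealie API version in use.

```python
def to_mealie_ingredients(groups, unit_ids, food_ids):
    """Flatten ingredient groups into a list of ingredient dicts.

    groups:   [{"title": str, "items": [{"qty", "unit", "food", "note"?}]}]
    unit_ids: name -> ID mapping, assumed to be resolved (or created) earlier
    food_ids: name -> ID mapping, assumed to be resolved (or created) earlier
    """
    result = []
    for group in groups:
        for index, item in enumerate(group["items"]):
            result.append({
                # Only the first ingredient of a group carries the group title.
                "title": group["title"] if index == 0 else "",
                "quantity": item["qty"],
                # An ingredient may have no unit (e.g. "1 hagyma").
                "unit": {"id": unit_ids[item["unit"]]} if item["unit"] else None,
                "food": {"id": food_ids[item["food"]]},
                "note": item.get("note", ""),
            })
    return result
```
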
## Tandoor API Integration

The importer uses the Tandoor REST API:

1. **POST** `/api/recipe/`: create the full recipe in one call (name, description, source_url, steps with nested ingredients)
2. **PUT** `/api/recipe/{id}/image/`: upload the recipe image

**Step-based ingredients**: Tandoor nests ingredients inside steps. All ingredients are attached to the first step. Units and foods are auto-created by name (no separate resolution needed). Ingredient groups use `is_header: true` on a header entry.

**Duplicate detection**: Before import, the client searches Tandoor by title and checks the `source_url` field to detect already-imported recipes.

Authentication uses an API token (Bearer header), created in Tandoor at *Settings → API Browser → Auth Token*.

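
A payload for the single recipe POST might be assembled along these lines. This is a sketch under assumptions: the top-level fields (`name`, `description`, `source_url`, `steps`, `keywords`) and `is_header` come from this README, while the per-ingredient field names (`amount`, `note`, nested `food`/`unit` objects), the use of `note` for the header text, and the function name are guesses that should be verified against the Tandoor API schema.

```python
def build_tandoor_recipe(title, description, source_url, groups, instructions, tags=()):
    """Build a recipe payload with all ingredients attached to the first step."""
    ingredients = []
    for group in groups:
        if group["title"]:
            # Group header entry; assumed to carry its text in "note".
            ingredients.append({
                "is_header": True, "note": group["title"],
                "food": None, "unit": None, "amount": 0,
            })
        for item in group["items"]:
            ingredients.append({
                "is_header": False,
                "amount": item["qty"],
                # Units and foods are referenced by name and auto-created.
                "unit": {"name": item["unit"]} if item["unit"] else None,
                "food": {"name": item["food"]},
                "note": item.get("note", ""),
            })

    # Guarantee at least one step so the ingredients always have a home.
    texts = list(instructions) or [""]
    steps = [
        {"instruction": text, "ingredients": ingredients if index == 0 else []}
        for index, text in enumerate(texts)
    ]
    return {
        "name": title,
        "description": description,
        "source_url": source_url,
        "steps": steps,
        "keywords": [{"name": tag} for tag in tags],
    }
```
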
## Tag Management

Tags are scraped from recipe pages and shown as editable chips in the UI. Users can:

- **Remove** scraped tags that are irrelevant
- **Search** existing tags from Mealie and Tandoor (fetched via `GET /tags` endpoint)
- **Add** custom tags by typing and pressing Enter

Tags are sent to both services on import:

- **Mealie**: Tags are created via `POST /api/organizers/tags` if they don't exist, then attached to the recipe in the PATCH payload
- **Tandoor**: Keywords are auto-created by including `keywords: [{"name": "..."}]` in the recipe POST

## Configuration

All settings are persisted to `/data/config.json` (mounted as a Docker volume).

| Setting | Description |
|---------|-------------|
| `mealie_url` | Full URL to Mealie instance (e.g. `https://mealie.example.com`) |
| `mealie_api_key` | Mealie API token |
| `tandoor_url` | Full URL to Tandoor instance (e.g. `https://recipes.example.com`) |
| `tandoor_api_key` | Tandoor API token |

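
JSON persistence of these four settings can be sketched as below. The function names, the defaults of empty strings, and the write-to-temp-then-rename step are assumptions for illustration and not necessarily how `app/config.py` does it; the file name and the `DATA_DIR` variable are taken from this README.

```python
import json
import os
from pathlib import Path

# Keys from the settings table; empty-string defaults are an assumption.
DEFAULTS = {
    "mealie_url": "",
    "mealie_api_key": "",
    "tandoor_url": "",
    "tandoor_api_key": "",
}


def config_path(data_dir=None):
    """Resolve config.json inside DATA_DIR (default /data)."""
    return Path(data_dir or os.environ.get("DATA_DIR", "/data")) / "config.json"


def load_config(data_dir=None):
    """Read the stored settings, falling back to defaults for missing keys."""
    try:
        stored = json.loads(config_path(data_dir).read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        stored = {}
    return {**DEFAULTS, **stored}


def save_config(values, data_dir=None):
    """Write settings via a temp file so a crash cannot leave a partial file."""
    path = config_path(data_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps({**DEFAULTS, **values}, indent=2), encoding="utf-8")
    os.replace(tmp, path)
```
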
## Deployment

### Docker Compose

```yaml
services:
  recipe-importer:
    image: gitea.dooplex.hu/admin/recipe-importer:0.2.0
    container_name: recipe-importer
    restart: unless-stopped
    ports:
      - "8011:8000"
    volumes:
      - recipe-data:/data
    environment:
      - SECRET_KEY=change-me-in-production
      - MEALIE_INTERNAL_URL=http://mealie:9000
      - TANDOOR_INTERNAL_URL=http://tandoor:8080

volumes:
  recipe-data:
```

### Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `SECRET_KEY` | `recipe-importer-dev-key` | Flask session secret |
| `DATA_DIR` | `/data` | Persistent storage path |
| `VERSION` | `dev` | Shown in the UI navbar |
| `MEALIE_INTERNAL_URL` | *(empty)* | Docker-internal Mealie URL (e.g. `http://mealie:9000`) to avoid Cloudflare hairpin |
| `TANDOOR_INTERNAL_URL` | *(empty)* | Docker-internal Tandoor URL (e.g. `http://tandoor:8080`) to avoid Cloudflare hairpin |

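
One plausible reading of the `*_INTERNAL_URL` variables is that, when set, they replace the configured public URL for server-side API calls. The helper below sketches that precedence; the function name and the exact fallback behaviour are assumptions, not a description of the project's code.

```python
import os


def effective_base_url(configured_url, env_var, environ=os.environ):
    """Prefer the Docker-internal URL from the environment, if one is set.

    configured_url: the public URL saved on the settings page
    env_var:        e.g. "MEALIE_INTERNAL_URL" or "TANDOOR_INTERNAL_URL"
    """
    internal = environ.get(env_var, "").strip()
    # Strip the trailing slash so paths like "/api/recipes" can be appended.
    return (internal or configured_url).rstrip("/")
```

For example, with `MEALIE_INTERNAL_URL=http://mealie:9000` the client would talk to the container directly, while links shown to the user could still use the public `mealie_url`.
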
## Building

On the build server (kisfenyo@192.168.0.180):

```bash
cd ~/build/recipe-importer
./build.sh X.X.X --push
```

## Web UI

The UI is in Hungarian and uses a dark theme. The workflow is:

1. **Settings** (`/settings`): Configure the Mealie and/or Tandoor connection (URL + API key) and test each connection
2. **Import** (`/import`): Paste a recipe URL, click "Beolvasás" (Scrape)
3. **Review**: Edit structured ingredients (4-column: quantity, unit, food, note), add/remove ingredient groups, edit instructions, manage tags (add/remove/search existing)
4. **Send**: Click "Importálás Mealie-be" and/or "Importálás Tandoor-ba" to push to your configured services

## Tech Stack

- **Runtime**: Python 3.12 (slim)
- **Web framework**: Flask 3.1 + Gunicorn
- **HTML parsing**: BeautifulSoup 4 + lxml
- **HTTP client**: requests
- **Container**: ~60 MB image