Recipe Importer
Docker container for importing recipes from Hungarian websites into Mealie and Tandoor Recipes.
Problem: Mealie's and Tandoor's built-in URL import cannot parse ingredients and instructions from Hungarian recipe sites like mindmegette.hu.
Solution: This container provides a web UI that scrapes Hungarian recipe pages with site-specific parsers, lets you review and edit the extracted data, then pushes it to Mealie and/or Tandoor via their REST APIs. Supports both single recipe import and bulk import of multiple URLs.
Architecture
┌──────────────────────────────────────────────────────┐
│ recipe-importer container (:8000) │
│ │
│ Flask + Gunicorn │
│ ├── /settings → Configure Mealie & Tandoor │
│ ├── /import → Single or bulk import │
│ ├── /scrape → AJAX: parse recipe HTML │
│ ├── /send → AJAX: push to Mealie API │
│ ├── /send-tandoor → AJAX: push to Tandoor API │
│ ├── /tags → AJAX: list tags from both │
│ └── /health → Health check │
│ │
│ Modules: │
│ ├── app/config.py → JSON config persistence │
│ ├── app/scraper.py → Site-specific parsers │
│ ├── app/mealie.py → Mealie REST API client │
│ └── app/tandoor.py → Tandoor REST API client │
└───────────────────┬──────────────┬───────────────────┘
│ HTTP │ HTTP
▼ ▼
┌──────────────┐ ┌───────────────┐
│ Mealie │ │ Tandoor │
│ POST /api/.. │ │ POST /api/.. │
│ PUT /api/.. │ │ PUT /api/.. │
└──────────────┘ └───────────────┘
Supported Sites
| Site | Ingredients | Instructions | Image | Tags |
|---|---|---|---|---|
| mindmegette.hu | Yes | Yes | Yes | Yes |
| streetkitchen.hu | Yes (with groups) | Yes (ol/ul/paragraph) | Yes | Yes (from JSON-LD categories) |
| nosalty.hu | Yes (with groups) | Yes (with section headers) | Yes | Yes |
| sobors.hu | Yes (with groups) | Yes (with section headers, follows linked recipes) | Yes | Yes |
| kiskegyed.hu | Yes (with groups, dual measurements) | Yes (follows sobors.hu links) | Yes | Yes |
| Other sites | Fallback (schema.org JSON-LD) | Fallback (schema.org JSON-LD) | Yes (og:image) | Fallback (schema.org keywords) |
Mindmegette.hu Parser
Extracts data from the Angular-rendered HTML:
- Title: `og:title` meta tag, with the `| Mindmegette.hu` suffix stripped
- Description: `og:description` meta tag
- Image: `og:image` meta tag
- Ingredients: `div.ingredients` → `div.ingredients-meta` rows, each containing `<strong>` (qty), `<span>` (unit), `<a class="ingredients-link">` (food), `<small>` (extra)
- Ingredient groups: Multiple `div.ingredients` containers; group title via `<strong class="ingredients-group">`
- Instructions: `mindmegette-wysiwyg-box` → `ol > li` elements
- Tags: `<a class="tag">` elements inside `div.desktop-wrapper`
Streetkitchen.hu Parser
Extracts data from the Next.js-rendered HTML:
- Title: `og:title` meta tag, with the `| Street Kitchen` suffix stripped
- Description: `og:description` meta tag
- Image: `og:image` meta tag (CDN URL)
- Ingredients: `div.grid.grid-cols-1` container → `div.my-2.flex` rows; quantity+unit merged in first `<div>` (split via regex), food in `<div class="font-bold">`, optional extra in parenthesised `<div>`
- Ingredient groups: `<h5>` headers inside section divs (e.g. "Az előfőzéshez", "A sütéshez")
- Instructions: Three formats handled: `<ol>` ordered list, `<ul>` unordered list, or plain `<p>` paragraphs (with optional `<strong>` section headers)
- Tags: `recipeCategory` field from JSON-LD `@graph` → `Recipe` object (comma-separated)
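The quantity+unit split mentioned in the list above could look roughly like the sketch below. The pattern and the `split_qty_unit` name are illustrative assumptions, not the regex used in `app/scraper.py`.

```python
import re

# Hypothetical pattern: a leading number (integer, decimal with comma or dot,
# or a simple fraction such as 1/2), followed by an optional unit string.
_QTY_UNIT_RE = re.compile(r"^\s*(\d+(?:[.,]\d+)?(?:\s*/\s*\d+)?)\s*(.*?)\s*$")


def split_qty_unit(text: str) -> tuple[str, str]:
    """Split a merged cell like "1,5 dl" into (quantity, unit)."""
    m = _QTY_UNIT_RE.match(text)
    if not m:
        # No leading number: treat the whole cell as the unit/description.
        return "", text.strip()
    # Normalise the Hungarian decimal comma to a dot.
    return m.group(1).replace(",", "."), m.group(2)
```

Cells without a leading number (for example "csipet") come back with an empty quantity, so the caller can decide how to display them.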
Nosalty.hu Parser
Extracts data from the nosalty.hu recipe pages:
- Title: `og:title` meta tag
- Description: Story text from `div#recipe-story > p` (nosalty has no dedicated description field)
- Image: `og:image` meta tag
- Ingredients: Scoped to `div#ingredients` to avoid per-serving/nutrition duplicates; `ul.m-list__list > li.m-list__item` rows with `<span>` (qty+unit), `<a class="a-link">` (food), optional trailing `<span>` (extra notes in parentheses)
- Ingredient groups: `<h3 class="m-list__title">` headers between `<ul>` lists
- Instructions: `div#select` → `ol.m-list__list > li.m-list__item` steps; optional `<h4 class="m-list__title">` section headers
- Tags: `<a class="m-tags__tagItem">` inside `div.p-recipe__attributeList`
Sobors.hu Parser
Extracts data from the sobors.hu recipe pages:
- Title: `h3.recept_nev`
- Description: `og:description` meta tag
- Image: `og:image` meta tag
- Ingredients: `div.hozzavalok-container` → `section` elements with `ul > li`, each containing `span.mennyiseg` (qty), `span.mertekegyseg` (unit), `span.hozzavalo` (food)
- Ingredient groups: `section > h4` headers (e.g., "A szószhoz:", "A húsgolyókhoz:")
- Instructions: `div.recept_leiras` → `<p>` tags, with `<h3><strong>` section headers
- Linked recipes: Some pages link to another site (e.g. kiskegyed.hu) instead of showing full instructions. The parser detects external links in the instruction area and follows them to scrape the real recipe content.
- Article-style ingredient fallback: Pages without the structured `div.hozzavalok-container` are parsed from article-body `h4` + `ul > li` plain text
- Tags: `div.cikk-cimkek > ul.cikk-cimkek-list > li > a` (skips generic "Receptek" category)
Kiskegyed.hu Parser
Extracts data from kiskegyed.hu recipe pages:
- Title: `h2` element (with the `- Kiskegyed` suffix stripped)
- Description: `section#leadText > p`
- Image: `og:image` meta tag
- Ingredients: `div.recipe_ingredients` → `ul.list > li` items; group headers from `<p>` or `<p><em>` elements
- Ingredient groups: `<p>Name:</p>` or `<p><em>A ...hez</em></p>` format
- Dual measurements: "3 ek (70 g) búzafinomliszt" → qty: 3, unit: ek, food: búzafinomliszt, extra: 70 g
- Instructions: `div.recipe_preparation > ol > li > div`
- Cross-site links: Pages linking to sobors.hu are followed to get the full recipe
- Tags: `section.tags > a > span` (# prefix stripped, "recept" filtered)
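As a rough illustration of the dual-measurement rule in the list above, the sketch below splits one ingredient line into the four fields. The regex and the `parse_dual_measurement` name are assumptions made for this example; the real parser may handle more cases.

```python
import re

# Hypothetical pattern for lines like "3 ek (70 g) búzafinomliszt":
# quantity, unit, optional parenthesised alternate measurement, then the food.
_DUAL_RE = re.compile(
    r"^\s*(?P<qty>\d+(?:[.,]\d+)?)\s+"  # quantity, e.g. "3" or "1,5"
    r"(?P<unit>\S+)\s+"                  # unit, e.g. "ek"
    r"(?:\((?P<extra>[^)]*)\)\s*)?"      # optional "(70 g)"
    r"(?P<food>.+?)\s*$"                 # the rest of the line is the food
)


def parse_dual_measurement(line: str) -> dict:
    m = _DUAL_RE.match(line)
    if not m:
        # Lines that do not fit the pattern are kept whole as the food name.
        return {"qty": "", "unit": "", "food": line.strip(), "extra": ""}
    return {
        "qty": m.group("qty").replace(",", "."),
        "unit": m.group("unit"),
        "food": m.group("food"),
        # The alternate measurement lands in the comment/extra field.
        "extra": (m.group("extra") or "").strip(),
    }
```

With this sketch, `parse_dual_measurement("3 ek (70 g) búzafinomliszt")` yields qty `3`, unit `ek`, food `búzafinomliszt`, and extra `70 g`, matching the example in the list.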
Generic Fallback Parser
For unsupported sites, attempts extraction via:
- Schema.org JSON-LD `@type: Recipe` blocks (`recipeIngredient`, `recipeInstructions`, `keywords`)
- OpenGraph meta tags for title, description, image
Adding a New Site Parser
- Create a parser function in `app/scraper.py` with the `@_register("hostname")` decorator
- The function receives `(soup: BeautifulSoup, url: str)` and returns the standard recipe dict
- The hostname substring is matched against the URL: the first match wins, and unmatched URLs use the generic fallback
Bulk Import
The "Tömeges importálás" (Bulk Import) tab allows importing multiple recipes at once:
- Paste one URL per line in the textarea
- Choose a mode:
  - Review mode: edit each recipe before importing, with the option to switch to auto mode midway
  - Auto mode: scrape and import all recipes without manual review (tag option: import all tags or none)
- Select target: Mealie, Tandoor, or both
- Progress table tracks per-recipe status (pending, scraping, importing, done, error, skipped, duplicate)
All processing is done client-side: the browser calls the existing `/scrape`, `/send`, and `/send-tandoor` endpoints sequentially, one recipe at a time.
Mealie API Integration
The importer uses the Mealie REST API:
- POST `/api/recipes`: create a stub recipe (returns slug)
- PATCH `/api/recipes/{slug}`: populate structured ingredients (with unit/food IDs), instructions, description, orgURL
- PUT `/api/recipes/{slug}/image`: upload the recipe image
Structured ingredients: The client resolves unit and food names to Mealie database IDs. Missing units/foods are created automatically via the API. Ingredient groups are supported via the title field on the first ingredient of each group.
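The ID resolution and group handling described above might be sketched like this. The payload keys (`title`, `quantity`, `unit`, `food`, `note`) and the input shape are assumptions for illustration; `create_unit` and `create_food` are hypothetical callbacks standing in for the Mealie API calls that create a missing entry and return its ID.

```python
def resolve_id(name, known, create):
    """Return the ID for `name`, creating the entry when it is missing."""
    if not name:
        return None
    key = name.strip().lower()
    if key not in known:
        known[key] = create(name)  # in the real client this would be an API call
    return known[key]


def build_ingredients(groups, units, foods, create_unit, create_food):
    """Flatten ingredient groups into one list of structured ingredients."""
    payload = []
    for group in groups:
        for i, ing in enumerate(group["items"]):
            unit_id = resolve_id(ing.get("unit"), units, create_unit)
            food_id = resolve_id(ing.get("food"), foods, create_food)
            payload.append({
                # Only the first ingredient of a group carries the group title.
                "title": group.get("name", "") if i == 0 else "",
                "quantity": ing.get("qty"),
                "unit": {"id": unit_id} if unit_id else None,
                "food": {"id": food_id} if food_id else None,
                "note": ing.get("note", ""),
            })
    return payload
```

Here `units` and `foods` are name-to-ID caches that would be filled from the existing Mealie data before the import starts.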
Authentication uses a long-lived API token (Bearer header), created in Mealie at Profile → API Tokens.
Tandoor API Integration
The importer uses the Tandoor REST API:
- POST `/api/recipe/`: create the full recipe in one call (name, description, source_url, steps with nested ingredients)
- PUT `/api/recipe/{id}/image/`: upload the recipe image
Step-based ingredients: Tandoor nests ingredients inside steps. All ingredients are attached to the first step. Units and foods are auto-created by name (no separate resolution needed). Ingredient groups use is_header: true on a header entry.
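A sketch of how such a payload could be assembled is shown below. The top-level fields and `is_header` come from the description above; the remaining ingredient keys (`amount`, `unit`, `food`, `note`), the input shape, and the `build_tandoor_recipe` name are assumptions, so check them against the Tandoor API schema before relying on them.

```python
def build_tandoor_recipe(title, description, source_url, groups, instructions):
    """Build a recipe payload with every ingredient attached to the first step."""
    ingredients = []
    for group in groups:
        if group.get("name"):
            # A group becomes a header entry placed before its ingredients.
            ingredients.append({"is_header": True, "note": group["name"],
                                "amount": 0, "unit": None, "food": None})
        for ing in group["items"]:
            ingredients.append({
                "is_header": False,
                "amount": ing.get("qty") or 0,
                # Units and foods are referenced by name and auto-created.
                "unit": {"name": ing["unit"]} if ing.get("unit") else None,
                "food": {"name": ing["food"]},
                "note": ing.get("note", ""),
            })
    if not instructions:
        instructions = [""]  # keep one step so the ingredients have a home
    steps = [
        {"instruction": text, "ingredients": ingredients if i == 0 else []}
        for i, text in enumerate(instructions)
    ]
    return {"name": title, "description": description,
            "source_url": source_url, "steps": steps}
```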
Duplicate detection: Before import, searches Tandoor by title and checks the source_url field to detect already-imported recipes.
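The comparison step could be as simple as the sketch below, which takes the recipes returned by a title search and looks for a matching `source_url`. Ignoring surrounding whitespace and a trailing slash is an assumption added for this example, as is the `find_duplicate` name.

```python
def _normalize(url):
    # Assumption: treat URLs that differ only by a trailing slash as equal.
    return (url or "").strip().rstrip("/")


def find_duplicate(search_results, source_url):
    """Return the already-imported recipe with the same source URL, if any."""
    target = _normalize(source_url)
    if not target:
        return None
    for recipe in search_results:
        if _normalize(recipe.get("source_url")) == target:
            return recipe
    return None
```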
Authentication uses an API token (Bearer header), created in Tandoor at Settings → API Browser → Auth Token.
Tag Management
Tags are scraped from recipe pages and shown as editable chips in the UI. Users can:
- Remove scraped tags that are irrelevant
- Search existing tags from Mealie and Tandoor (fetched via the `GET /tags` endpoint)
- Add custom tags by typing and pressing Enter
Tags are sent to both services on import:
- Mealie: Tags are created via `POST /api/organizers/tags` if they don't exist, then attached to the recipe in the PATCH payload
- Tandoor: Keywords are auto-created by including `keywords: [{"name": "..."}]` in the recipe POST
Configuration
All settings are persisted to /data/config.json (mounted as a Docker volume).
| Setting | Description |
|---|---|
| `mealie_url` | Full URL to Mealie instance (e.g. https://mealie.example.com) |
| `mealie_api_key` | Mealie API token |
| `tandoor_url` | Full URL to Tandoor instance (e.g. https://recipes.example.com) |
| `tandoor_api_key` | Tandoor API token |
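A minimal sketch of JSON persistence for these four settings follows. The function names, the default values, and the write-to-temp-then-rename step are assumptions for illustration and may differ from what `app/config.py` does.

```python
import json
import os
from pathlib import Path

# The four settings from the table above, with empty defaults (assumed).
DEFAULTS = {
    "mealie_url": "",
    "mealie_api_key": "",
    "tandoor_url": "",
    "tandoor_api_key": "",
}


def load_config(path):
    """Read the config file, falling back to defaults when missing or invalid."""
    try:
        stored = json.loads(Path(path).read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        stored = {}
    if not isinstance(stored, dict):
        stored = {}
    # Unknown keys are dropped; missing keys get their default.
    return {**DEFAULTS, **{k: v for k, v in stored.items() if k in DEFAULTS}}


def save_config(path, config):
    """Write the config via a temp file so a crash cannot leave it half-written."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(config, indent=2), encoding="utf-8")
    os.replace(tmp, path)
```

In the container, `path` would be `/data/config.json`, so the settings survive restarts as long as the volume is mounted.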
Deployment
Docker Compose
services:
  recipe-importer:
    image: gitea.dooplex.hu/admin/recipe-importer:0.2.0
    container_name: recipe-importer
    restart: unless-stopped
    ports:
      - "8011:8000"
    volumes:
      - recipe-data:/data
    environment:
      - SECRET_KEY=change-me-in-production
      - MEALIE_INTERNAL_URL=http://mealie:9000
      - TANDOOR_INTERNAL_URL=http://tandoor:8080

volumes:
  recipe-data:
Environment Variables
| Variable | Default | Description |
|---|---|---|
| `SECRET_KEY` | `recipe-importer-dev-key` | Flask session secret |
| `DATA_DIR` | `/data` | Persistent storage path |
| `VERSION` | `dev` | Shown in the UI navbar |
| `MEALIE_INTERNAL_URL` | (empty) | Docker-internal Mealie URL (e.g. http://mealie:9000) to avoid Cloudflare hairpin |
| `TANDOOR_INTERNAL_URL` | (empty) | Docker-internal Tandoor URL (e.g. http://tandoor:8080) to avoid Cloudflare hairpin |
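The sketch below shows one way these variables could be read with the defaults from the table. The `load_settings` and `api_base` names are hypothetical, and the rule that a non-empty internal URL takes precedence over the URL saved in the settings page is an assumption about how the override is applied.

```python
import os


def load_settings(env=None):
    """Read the environment variables from the table, applying their defaults."""
    env = os.environ if env is None else env
    return {
        "secret_key": env.get("SECRET_KEY", "recipe-importer-dev-key"),
        "data_dir": env.get("DATA_DIR", "/data"),
        "version": env.get("VERSION", "dev"),
        "mealie_internal_url": env.get("MEALIE_INTERNAL_URL", "").rstrip("/"),
        "tandoor_internal_url": env.get("TANDOOR_INTERNAL_URL", "").rstrip("/"),
    }


def api_base(configured_url, internal_url):
    """Prefer the Docker-internal URL when set, else the user-configured one."""
    return (internal_url or configured_url).rstrip("/")
```

With `MEALIE_INTERNAL_URL=http://mealie:9000`, API calls would go straight to the Mealie container instead of leaving the host and coming back through the public hostname.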
Building
On the build server (kisfenyo@192.168.0.180):
cd ~/build/recipe-importer
./build.sh X.X.X --push
Web UI
The UI is in Hungarian and uses a dark theme. The workflow is:
- Settings (`/settings`): Configure Mealie and/or Tandoor connection (URL + API key), test each connection
- Import (`/import`): Paste a recipe URL, click "Beolvasás" (Scrape)
- Review: Edit structured ingredients (4-column: quantity, unit, food, note), add/remove ingredient groups, edit instructions, manage tags (add/remove/search existing)
- Send: Click "Importálás Mealie-be" (Import to Mealie) and/or "Importálás Tandoor-ba" (Import to Tandoor) to push to your configured services
Tech Stack
- Runtime: Python 3.12 (slim)
- Web framework: Flask 3.1 + Gunicorn
- HTML parsing: BeautifulSoup 4 + lxml
- HTTP client: requests
- Container: ~60 MB image