v0.8.0: gastrohobbi.hu parser, fix ingredient fraction parsing

Add gastrohobbi.hu parser (WPBakery page builder layout): ingredients
with groups, instructions with embedded lists, tags from JSON-LD
articleSection, prep time extraction.

Fix ingredient line parser: fractions like "1/2" are no longer split by
regex backtracking, en-dash ranges are normalized, and Unicode fractions
(½, ¼, ¾) are recognized as the start of a quantity in all parsers.
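
A minimal sketch of the quantity handling described above, not the code
from this commit. The names `split_quantity`, `normalize`, and
`UNICODE_FRACTIONS` are hypothetical. It assumes the quantity sits at the
start of the line, and it tries the fraction alternatives before the plain
number so that "1/2" cannot be matched as "1" with "/2" left over.

```python
import re

# Hypothetical mapping; the commit message names these three characters.
UNICODE_FRACTIONS = {"½": "1/2", "¼": "1/4", "¾": "3/4"}

# Longest alternatives first: mixed number, plain fraction, then
# integer or decimal (Hungarian pages often use a decimal comma).
NUMBER = r"(?:\d+\s+\d+/\d+|\d+/\d+|\d+(?:[.,]\d+)?)"
QUANTITY_RE = re.compile(rf"^\s*({NUMBER}(?:-{NUMBER})?)\s*(.*)$")


def normalize(line: str) -> str:
    """Rewrite Unicode fractions and dash ranges into ASCII form."""
    for char, text in UNICODE_FRACTIONS.items():
        # "1½" becomes "1 1/2"; a standalone "½" becomes "1/2".
        line = re.sub(rf"(?<=\d){char}", " " + text, line)
        line = line.replace(char, text)
    # En dash or em dash between two digits becomes a plain hyphen;
    # dashes elsewhere in the line are left untouched.
    return re.sub(r"(?<=\d)\s*[\u2013\u2014]\s*(?=\d)", "-", line)


def split_quantity(line: str) -> tuple[str, str]:
    """Return (quantity, rest); quantity is "" when none is found."""
    match = QUANTITY_RE.match(normalize(line))
    if not match:
        return "", line.strip()
    return match.group(1), match.group(2).strip()
```

With these assumptions, `split_quantity("2–3 gerezd fokhagyma")` gives
`("2-3", "gerezd fokhagyma")` and `split_quantity("½ kg cukor")` gives
`("1/2", "kg cukor")`.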

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 19:17:13 +01:00
parent ba5dae2caa
commit 0ec9ce0c6d
3 changed files with 197 additions and 5 deletions
@@ -45,6 +45,7 @@ Docker container for importing recipes from Hungarian websites into [Mealie](htt
| nosalty.hu | Yes (with groups) | Yes (with section headers) | Yes | Yes |
| sobors.hu | Yes (with groups) | Yes (with section headers, follows linked recipes) | Yes | Yes |
| kiskegyed.hu | Yes (with groups, dual measurements) | Yes (follows sobors.hu links) | Yes | Yes |
| gastrohobbi.hu | Yes (with groups) | Yes (with embedded lists) | Yes | Yes (from JSON-LD categories) |
| *Other sites* | Fallback (schema.org JSON-LD) | Fallback (schema.org JSON-LD) | Yes (og:image) | Fallback (schema.org keywords) |
### Mindmegette.hu Parser
@@ -111,6 +112,19 @@ Extracts data from kiskegyed.hu recipe pages:
- **Cross-site links**: Pages linking to sobors.hu are followed to get the full recipe
- **Tags**: `section.tags > a > span` (# prefix stripped, "recept" filtered)
### GastroHobbi.hu Parser
Extracts data from gastrohobbi.hu recipe pages (WPBakery page builder layout):
- **Title**: `h1.mpcth-post-title > span.mpcth-color-main-border`
- **Description**: First `<p>` in the first `wpb_text_column` before the recipe columns; falls back to `og:description`
- **Image**: `og:image` meta tag
- **Ingredients**: Finds `h3` containing "Hozzávalók:", then walks sibling `<ul>` elements; items from `li > p` or `li` directly
- **Ingredient groups**: Plain `<h3>` elements between ingredient lists (e.g. "A csipetkéhez:")
- **Instructions**: `<p>` elements following the "Elkészítés:" `h3`; embedded `<ul>` items rendered as bullet points
- **Prep time**: Extracted from "Elkészítési idő:" `h3`, appended to description
- **Tags**: JSON-LD `Article.articleSection` array (site uses Article schema, not Recipe)
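
The tag extraction described above could look roughly like the following sketch. It is not the parser added in this commit: the names `article_sections`, `_JsonLdCollector`, and `_iter_nodes` are hypothetical, and the handling of an `@graph` wrapper is an assumption about how the JSON-LD might be nested. Only the Python standard library is used.

```python
import json
from html.parser import HTMLParser


class _JsonLdCollector(HTMLParser):
    """Collect the raw text of every application/ld+json script block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld:
            self.blocks[-1] += data


def _iter_nodes(doc):
    """Yield every dict node, descending into lists and @graph (assumed)."""
    if isinstance(doc, list):
        for item in doc:
            yield from _iter_nodes(item)
    elif isinstance(doc, dict):
        yield doc
        yield from _iter_nodes(doc.get("@graph", []))


def article_sections(html: str) -> list[str]:
    """Return de-duplicated articleSection values from Article nodes."""
    collector = _JsonLdCollector()
    collector.feed(html)
    tags = []
    for raw in collector.blocks:
        try:
            doc = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of failing the import
        for node in _iter_nodes(doc):
            node_type = node.get("@type")
            types = node_type if isinstance(node_type, list) else [node_type]
            if "Article" not in types:
                continue
            sections = node.get("articleSection", [])
            if isinstance(sections, str):
                sections = [sections]  # a single section may be a bare string
            for section in sections:
                if isinstance(section, str) and section.strip():
                    name = section.strip()
                    if name not in tags:
                        tags.append(name)
    return tags
```

Matching on the `Article` type rather than `Recipe` follows the note above that the site publishes Article schema; pages without any JSON-LD simply yield an empty list.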
### Generic Fallback Parser
For unsupported sites, attempts extraction via: