v0.8.3: prefer h1 for mindmegette title, strip trailing "recept" globally
Mindmegette regular pages: use the h1 element (clean meal name like "Sajtkrémes csirkés leves") instead of og:title (which has a "receptje" suffix). Also add global post-processing to strip trailing recept/receptje/receptek from titles across all parsers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@@ -1,5 +1,11 @@
# Changelog

## v0.8.3 (2026-02-24)

### Fixed
- Mindmegette.hu: prefer `<h1>` element for title (clean meal name) over og:title (which often has "receptje" suffix)
- Global: strip trailing "recept"/"receptje" etc. from recipe titles across all parsers

## v0.8.2 (2026-02-24)

### Fixed
@@ -67,6 +67,12 @@ def scrape(url: str) -> dict:

    # Post-process: extract parenthesized comments from food into extra
    _extract_ingredient_comments(result)

    # Strip trailing "recept*" from title (e.g. "receptje", "recept")
    title = result.get("title", "")
    if title:
        result["title"] = re.sub(r"\s+recept\w*$", "", title, flags=re.IGNORECASE).strip()

    return result

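To make the effect of the title post-processing in `scrape` easier to check, here is a minimal standalone sketch of the same regular expression. The helper name `strip_trailing_recept` and every sample title except "Sajtkrémes csirkés leves receptje" are assumptions for illustration and are not part of this commit.

```python
import re

# Same pattern as the post-processing step in scrape(): one or more whitespace
# characters, then a word starting with "recept", anchored at the end of the string.
_TRAILING_RECEPT = re.compile(r"\s+recept\w*$", flags=re.IGNORECASE)


def strip_trailing_recept(title: str) -> str:
    """Hypothetical helper: drop a trailing 'recept*' word from a title."""
    return _TRAILING_RECEPT.sub("", title).strip()


# Illustrative titles (assumed, apart from the first one).
samples = [
    "Sajtkrémes csirkés leves receptje",  # suffix removed
    "Gulyásleves Recept",                 # case-insensitive match
    "Receptek télire",                    # "recept*" is not at the end: unchanged
    "Recept",                             # no leading whitespace: unchanged
]

for sample in samples:
    print(repr(sample), "->", repr(strip_trailing_recept(sample)))
```

Because the pattern requires leading whitespace, a title consisting only of "Recept" is left alone rather than reduced to an empty string.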
@@ -90,6 +96,10 @@ def supported_sites() -> list[dict]:

@_register("mindmegette")
def _parse_mindmegette(soup: BeautifulSoup, url: str) -> dict:
    # Prefer h1 (clean meal name) over og:title (often has "receptje" suffix)
    h1 = soup.find("h1")
    title = _text(h1) if h1 else ""
    if not title:
        title = _og(soup, "og:title") or _text(soup.find("title"))
    # Strip " | Mindmegette.hu" or " - Mindmegette.hu" suffix
    if title:
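The h1-first fallback in `_parse_mindmegette` can be illustrated without the project's helpers. The sketch below is not the commit's code: it swaps BeautifulSoup for the standard library `html.parser` so it runs with no dependencies, and the names `_TitleSources` and `pick_title`, along with the sample HTML, are hypothetical.

```python
from html.parser import HTMLParser


class _TitleSources(HTMLParser):
    """Collect the text of the first <h1> and the og:title meta content."""

    def __init__(self) -> None:
        super().__init__()
        self.h1 = ""
        self.og_title = ""
        self._in_h1 = False
        self._h1_done = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "h1" and not self._h1_done:
            self._in_h1 = True
        elif tag == "meta" and attr.get("property") == "og:title":
            self.og_title = (attr.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "h1" and self._in_h1:
            self._in_h1 = False
            self._h1_done = True  # only the first <h1> counts

    def handle_data(self, data):
        if self._in_h1:
            self.h1 += data


def pick_title(html: str) -> str:
    """Prefer the <h1> text; fall back to og:title when <h1> is missing or empty."""
    parser = _TitleSources()
    parser.feed(html)
    parser.close()
    return parser.h1.strip() or parser.og_title


page = (
    '<html><head><meta property="og:title" '
    'content="Sajtkrémes csirkés leves receptje"></head>'
    "<body><h1> Sajtkrémes csirkés leves </h1></body></html>"
)
print(pick_title(page))
```

When the page has no `<h1>`, or the `<h1>` is empty, `pick_title` returns the og:title value, which would then still need the suffix stripping applied elsewhere in the commit.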