💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a40fb0371d
    key = event.get("meetup_url") or f"{event.get('title')}|{event.get('date')}"
    merged[key] = event
Normalize meetup URLs before deduping merged events
merge_events dedupes by raw meetup_url, but this commit now merges multiple sources that commonly emit the same event URL in different forms (for example trailing slash vs no slash, or with tracking query params). In those cases the same event is kept multiple times, which inflates _data/events.json and can duplicate cards on the homepage when one source URL differs only syntactically. Canonicalizing URLs before keying (as extract_event_urls_from_html already does) avoids this regression.
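One way the suggested canonicalization could look is sketched below. This is an illustration, not the code in this PR: `canonicalize_url` and `merge_events_normalized` are hypothetical names, the real signature of `merge_events` is not shown in this review, and dropping the entire query string assumes that Meetup event URLs carry no identifying information there.

```python
from urllib.parse import urlsplit, urlunsplit


def canonicalize_url(url: str) -> str:
    """Return a comparison-friendly form of an event URL (hypothetical helper)."""
    parts = urlsplit(url.strip())
    # Trim the trailing slash so ".../123/" and ".../123" compare equal.
    path = parts.path.rstrip("/")
    # Lowercase scheme and host; drop query and fragment.
    # Assumption: tracking params live in the query and nothing there identifies the event.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, "", ""))


def merge_events_normalized(*sources):
    """Merge event lists, keeping the last event seen for each canonical key."""
    merged = {}
    for events in sources:
        for event in events:
            url = event.get("meetup_url")
            if url:
                key = canonicalize_url(url)
            else:
                # Same fallback key as the reviewed snippet.
                key = f"{event.get('title')}|{event.get('date')}"
            merged[key] = event
    return list(merged.values())
```

With this keying, two records whose URLs differ only by a trailing slash or a `utm_*` parameter collapse into one entry, and events without a URL still fall back to the title and date pair.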
    for event_url in extract_event_urls_from_html(payload)[:12]:
        event_html = fetch_url(event_url, headers=headers, timeout=20)
        detailed = parse_ld_json_events(event_html)
Continue crawling when one past-event page fetch fails
The per-event crawl runs inside a single outer try, so one failing fetch_url(event_url, ...) aborts the entire loop instead of skipping that URL and continuing. If at least one past event was already appended before the failure, past_events stays non-empty and the API fallback is skipped, leaving silently incomplete past history in the merged output. Catching fetch/parse errors per URL and continuing would prevent partial data loss from a single bad page.
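A minimal sketch of the per-URL error handling follows. It is an assumption-laden illustration rather than the PR's code: `crawl_past_events` is a hypothetical name, and `fetch` and `parse` are injected callables standing in for `fetch_url` and `parse_ld_json_events`. The exception types those functions raise are not shown in this review, so the sketch catches `Exception` broadly.

```python
def crawl_past_events(event_urls, fetch, parse, limit=12):
    """Fetch and parse each event page, skipping pages that fail.

    Returns (past_events, failed), where failed is a list of (url, exception).
    """
    past_events = []
    failed = []
    for event_url in event_urls[:limit]:
        try:
            event_html = fetch(event_url)
            past_events.extend(parse(event_html))
        except Exception as exc:
            # Broad on purpose: the real fetch/parse error types are unknown here.
            # Record the failure and move on to the next URL.
            failed.append((event_url, exc))
            continue
    return past_events, failed
```

Returning the failures alongside the events lets the caller log them and decide whether to run the API fallback even when `past_events` is non-empty, which is the partial-data case described above.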
Motivation
- @graph, and REST API responses) and surface diagnostics for debugging.
Description
- pull_request and all pushes to main/master, exposes MEETUP_SYNC_DEBUG env, and tightens the commit step to only run for push, schedule, or workflow_dispatch runs.
- README.md documents new environment variables MEETUP_PAST_EVENTS_URL, MEETUP_EVENTS_API_URL, and MEETUP_SYNC_DEBUG, and clarifies the workflow triggers.
- scripts/sync_meetup_events.py adds debug logging via debug(), unescapes ical text (unescape_ical_text), supports JSON-LD @graph, extracts event URLs from HTML (extract_event_urls_from_html), fetches pages with fetch_url, parses Meetup REST payloads (parse_api_events), merges multiple event sources (merge_events), and implements a crawl + API fallback to populate past events; overall improvements to merging/deduping and more informative logs.
- tests/test_sync_meetup_events.py exercises extract_event_urls_from_html, parse_ld_json_events (including @graph), JSON-escaped URL handling, and parse_api_events parsing.
Testing
- python -m unittest, executing tests/test_sync_meetup_events.py which covers URL extraction, JSON-LD graph parsing, escaped-URL extraction, and API payload parsing, and they passed.
- MEETUP_SYNC_DEBUG), with expected logs emitted.