Improve Meetup event sync: include past events, API fallback, debug logging, and tests #19

Merged
carloshvp merged 2 commits into main from codex/find-workflow-for-updating-events-do8q1d on Apr 3, 2026

@carloshvp
Member

Motivation

  • Ensure the site has a more complete event history by also harvesting recent past events in addition to upcoming iCal entries.
  • Make parsing more robust across Meetup payload variants (escaped iCal text, JSON-LD @graph, and REST API responses) and surface diagnostics for debugging.
  • Allow safer CI usage by running the workflow on pull requests and providing a debug flag for verbose output.

Description

  • Workflow updates: the Actions workflow now triggers on pull_request and on all pushes to main/master, exposes the MEETUP_SYNC_DEBUG environment variable, and restricts the commit step to push, schedule, and workflow_dispatch runs.
  • Documentation updates: README.md documents the new environment variables MEETUP_PAST_EVENTS_URL, MEETUP_EVENTS_API_URL, and MEETUP_SYNC_DEBUG, and clarifies the workflow triggers.
  • Script enhancements: scripts/sync_meetup_events.py adds debug logging via debug(), unescapes iCal text (unescape_ical_text), supports JSON-LD @graph, extracts event URLs from HTML (extract_event_urls_from_html), fetches pages with fetch_url, parses Meetup REST payloads (parse_api_events), and merges multiple event sources (merge_events). Past events are populated by a crawl with an API fallback. Merging and deduplication are improved overall, and the logs are more informative.
  • Tests: the new tests/test_sync_meetup_events.py exercises extract_event_urls_from_html, parse_ld_json_events (including @graph), JSON-escaped URL handling, and parse_api_events.
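
For readers unfamiliar with the JSON-LD @graph variant mentioned above, here is a minimal sketch of how Event nodes can be collected from both plain and @graph-wrapped blocks. The names iter_ld_json_events, _is_event, and LD_JSON_RE are hypothetical and are not taken from scripts/sync_meetup_events.py; the actual parse_ld_json_events may work differently.

```python
import json
import re

# Hypothetical pattern for <script type="application/ld+json"> blocks.
LD_JSON_RE = re.compile(
    r'<script[^>]*type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.DOTALL | re.IGNORECASE,
)


def _is_event(node):
    """Return True when the node's @type is, or includes, "Event"."""
    node_type = node.get("@type")
    types = node_type if isinstance(node_type, list) else [node_type]
    return "Event" in types


def iter_ld_json_events(html):
    """Yield JSON-LD Event nodes, including those nested under @graph."""
    for match in LD_JSON_RE.finditer(html):
        try:
            data = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # skip a malformed block instead of failing the page
        # A block may hold a single object or a list of objects.
        nodes = data if isinstance(data, list) else [data]
        for node in nodes:
            if not isinstance(node, dict):
                continue
            # An object with @graph wraps its real nodes in a list.
            candidates = node.get("@graph", [node])
            if not isinstance(candidates, list):
                candidates = [candidates]
            for candidate in candidates:
                if isinstance(candidate, dict) and _is_event(candidate):
                    yield candidate
```

The generator skips malformed blocks so that one bad script tag does not discard the events found elsewhere on the same page.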

Testing

  • Ran the unit tests with python -m unittest. tests/test_sync_meetup_events.py covers URL extraction, JSON-LD @graph parsing, escaped-URL extraction, and API payload parsing; all tests passed.
  • Invoked the script manually during development to validate the debug output (added via MEETUP_SYNC_DEBUG) and the merged event counts; the expected logs were emitted.
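
As an illustration of the debug flag, an environment-gated logger could look roughly like the sketch below. Only the variable name MEETUP_SYNC_DEBUG and the function name debug() come from this PR; the accepted truthy values, the message prefix, and the env and stream parameters are assumptions made for the example.

```python
import os
import sys


def debug_enabled(env=None):
    """Return True when MEETUP_SYNC_DEBUG holds a truthy value.

    The accepted values below are an assumption, not the script's actual list.
    """
    env = os.environ if env is None else env
    value = env.get("MEETUP_SYNC_DEBUG", "")
    return value.strip().lower() in {"1", "true", "yes", "on"}


def debug(message, env=None, stream=None):
    """Write a diagnostic line to stderr (or the given stream) when debugging is on."""
    if debug_enabled(env):
        target = sys.stderr if stream is None else stream
        print(f"[meetup-sync] {message}", file=target)
```

Writing to stderr keeps diagnostics separate from anything the script prints to stdout.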

Codex Task

@chatgpt-codex-connector (Bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a40fb0371d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +235 to +236
key = event.get("meetup_url") or f"{event.get('title')}|{event.get('date')}"
merged[key] = event

P2: Normalize Meetup URLs before deduping merged events

merge_events dedupes by raw meetup_url, but this commit now merges multiple sources that commonly emit the same event URL in different forms (for example trailing slash vs no slash, or with tracking query params). In those cases the same event is kept multiple times, which inflates _data/events.json and can duplicate cards on the homepage when one source URL differs only syntactically. Canonicalizing URLs before keying (as extract_event_urls_from_html already does) avoids this regression.
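
One possible way to apply this suggestion is sketched below. The helpers canonical_event_url and merge_events_normalized are hypothetical, and dropping the whole query string and fragment is an assumption that only holds if Meetup event URLs carry no meaningful query parameters. The fallback key mirrors the title/date key shown in the reviewed lines.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_event_url(url):
    """Lowercase scheme and host, drop query and fragment, strip trailing slash.

    Assumption: the query string holds only tracking parameters.
    """
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, "", ""))


def merge_events_normalized(*sources):
    """Merge event lists, keyed by canonical URL; later sources win on conflict."""
    merged = {}
    for events in sources:
        for event in events:
            url = event.get("meetup_url")
            if url:
                key = canonical_event_url(url)
            else:
                # Same fallback key as the reviewed code: title plus date.
                key = f"{event.get('title')}|{event.get('date')}"
            merged[key] = event
    return list(merged.values())
```

With this keying, a URL ending in a trailing slash and the same URL carrying a tracking parameter collapse into a single entry.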

Comment on lines +368 to +370
for event_url in extract_event_urls_from_html(payload)[:12]:
event_html = fetch_url(event_url, headers=headers, timeout=20)
detailed = parse_ld_json_events(event_html)

P2: Continue crawling when one past-event page fetch fails

The per-event crawl runs inside a single outer try, so one failing fetch_url(event_url, ...) aborts the entire loop instead of skipping that URL and continuing. If at least one past event was already appended before the failure, past_events stays non-empty and the API fallback is skipped, leaving silently incomplete past history in the merged output. Catching fetch/parse errors per URL and continuing would prevent partial data loss from a single bad page.
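
A rough sketch of the per-URL error handling described here follows. The function crawl_past_events and its fetch, parse, and log parameters are hypothetical stand-ins for fetch_url, parse_ld_json_events, and debug, and catching Exception broadly is a simplification; real code would likely name the specific network and parsing errors.

```python
def crawl_past_events(event_urls, fetch, parse, log=print, limit=12):
    """Fetch and parse each event page, skipping any URL that fails.

    Returns (events, failed_urls) so the caller can decide whether the
    crawl was complete enough to skip an API fallback.
    """
    events = []
    failed_urls = []
    for event_url in event_urls[:limit]:
        try:
            event_html = fetch(event_url)
            events.extend(parse(event_html))
        except Exception as exc:  # broad catch is an assumption for brevity
            failed_urls.append(event_url)
            log(f"skipping {event_url}: {exc}")
    return events, failed_urls
```

Returning the failed URLs alongside the events lets the caller run the fallback when the crawl was only partly successful, rather than relying on past_events being non-empty.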

@carloshvp merged commit 2432774 into main on Apr 3, 2026
1 check passed