data scraper for Changelogs #2949

Open. Julow wants to merge 4 commits into main.

Contributor

@Julow Julow commented Feb 6, 2025

This adds a tool that detects missing changelogs by looking at projects' release pages. My intention is to write a GitHub Action (GA) that will notify maintainers of new releases, perhaps by opening draft PRs.
Does that sound good?

In its current form, this will not help catch new releases, but it can be used to find and add changelogs that are already missing. Here's its output:

Downloading https://github.com/ocaml/dune/releases.atom ... done 
Downloading https://github.com/tarides/dune-release/releases.atom ... done 
Downloading https://github.com/realworldocaml/mdx/releases.atom ... done 
No changelog entry for "mdx" version "2.5.0"

No changelog entry for "mdx" version "Release 0.3.3"

No changelog entry for "mdx" version "Release 0.3.2"

No changelog entry for "mdx" version "Release 0.3.1"

Downloading https://github.com/ocaml/merlin/releases.atom ... done 
No changelog entry for "merlin" version "5.4.1-503"

No changelog entry for "merlin" version "5.4-503"

No changelog entry for "merlin" version "4.18-414"

No changelog entry for "merlin" version "5.3-502"

No changelog entry for "merlin" version "4.17.1-414"

No changelog entry for "merlin" version "4.17.1-501"

No changelog entry for "merlin" version "5.2.1-502"

No changelog entry for "merlin" version "4.17-414"

No changelog entry for "merlin" version "4.17-501"

No changelog entry for "merlin" version "5.2-502"

Downloading https://github.com/ocaml/ocaml/releases.atom ... done 
No changelog entry for "ocaml" version "5.2.1-rc1"

Downloading https://github.com/ocaml/ocaml-lsp/releases.atom ... done 
No changelog entry for "ocaml-lsp" version "1.22.0"

No changelog entry for "ocaml-lsp" version "1.21.0"

No changelog entry for "ocaml-lsp" version "1.20.0"

Downloading https://github.com/ocaml-ppx/ocamlformat/releases.atom ... done 
Downloading https://github.com/OCamlPro/ocp-indent/releases.atom ... done 
No changelog entry for "ocp-indent" version "nlfork-1.5.5"

No changelog entry for "ocp-indent" version "1.8.2"

No changelog entry for "ocp-indent" version "nlfork-1.5.4"

No changelog entry for "ocp-indent" version "nlfork-1.5.3"

Downloading https://github.com/ocaml/odoc/releases.atom ... done 
No changelog entry for "odoc" version "3.0.0~beta1"

No changelog entry for "odoc" version "2.4.4"

No changelog entry for "odoc" version "2.4.3"

No changelog entry for "odoc" version "2.2.2"

Downloading https://github.com/ocaml-ppx/ocaml-migrate-parsetree/releases.atom ... done
No changelog entry for "omp" version "v1.7.0"

Downloading https://github.com/ocaml/opam//releases.atom ... done 
Downloading https://github.com/ocaml-opam/opam-publish/releases.atom ... done 
No changelog entry for "opam-publish" version "2.5.0"

No changelog entry for "opam-publish" version "2.4.0"

No changelog entry for "opam-publish" version "2.3.1"

Downloading https://github.com/ocaml-ppx/ppxlib/releases.atom ... done 
No changelog entry for "ppxlib" version "0.34.0"

Downloading https://github.com/ocaml-community/utop/releases.atom ... done 
No changelog entry for "utop" version "Utop 2.15.0"
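
Each `Downloading` line above fetches a project's GitHub releases feed at `<repository URL>/releases.atom`. As a minimal sketch (not the PR's actual code), the feed URL could be derived as shown below. `releases_feed_url` and `strip_trailing_slashes` are hypothetical helpers. The trailing-slash handling is an assumption prompted by the `opam//releases.atom` line in the output, which would be consistent with a repository URL stored with a trailing slash.

```ocaml
(* Hypothetical helpers, not taken from the PR: build the Atom feed URL for
   a GitHub repository's releases page. *)

(* Drop any trailing '/' characters so that joining never yields "//". *)
let rec strip_trailing_slashes url =
  let n = String.length url in
  if n > 0 && url.[n - 1] = '/' then
    strip_trailing_slashes (String.sub url 0 (n - 1))
  else url

(* Append the feed path to the normalized repository URL. *)
let releases_feed_url repo_url =
  strip_trailing_slashes repo_url ^ "/releases.atom"

let () =
  print_endline (releases_feed_url "https://github.com/ocaml/dune");
  print_endline (releases_feed_url "https://github.com/ocaml/opam/")
```

Both calls produce a URL with a single `/` before `releases.atom`, whether or not the input ends with a slash.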

@@ -2,6 +2,7 @@
title: Release of OCaml 5.2.0
description: Release of OCaml 5.2.0
tags: [ocaml]
versions: ["OCaml 5.2.0"]
Contributor Author

The versions field is used when the release name doesn't match the changelog file name, or to group several releases under one changelog (e.g. beta releases).
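
To illustrate the matching rule described here, a minimal sketch follows. The record type and function names are hypothetical and simplified; they are not the definitions used in the PR. The assumption is that a release counts as covered when its name equals either the version parsed from a changelog's file name or one of the strings in that changelog's `versions` field.

```ocaml
(* Hypothetical, simplified view of a changelog entry. *)
type changelog = {
  project_name : string;
  version : string;        (* assumed to be parsed from the file name *)
  versions : string list;  (* extra release names from the front matter *)
}

(* A release is covered if its name matches the file-name version or any
   name listed in [versions]. *)
let covers ~project ~release_name c =
  c.project_name = project
  && (c.version = release_name || List.mem release_name c.versions)

let has_changelog changelogs ~project ~release_name =
  List.exists (covers ~project ~release_name) changelogs

let () =
  let changelogs =
    [ { project_name = "ocaml"; version = "5.2.0";
        versions = [ "OCaml 5.2.0" ] } ]
  in
  (* The GitHub release name differs from the file-name version. *)
  assert (has_changelog changelogs ~project:"ocaml" ~release_name:"OCaml 5.2.0");
  assert (not (has_changelog changelogs ~project:"ocaml" ~release_name:"5.2.1-rc1"))
```

Listing several names in `versions` is also how one changelog could be associated with several releases, such as a series of betas.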

Collaborator

What's the policy on prereleases? Do we want to advertise them or hide them? Aren't prereleases only for those who are already interested?

Contributor

I think those should be hidden, as few people care about them and they are usually "in the know" anyways.

But perhaps some pre-releases deserve special treatment (e.g. OCaml, dune, opam, etc).

Collaborator

@Julow any views on this?

Contributor Author

I find pre-releases quite uninteresting, except perhaps for OCaml and opam, and only when there is no more recent official release.
I think the changelog would be better with all the past pre-releases removed.
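
One way to read the policy suggested in this thread, as a sketch only: hide pre-releases, except for a short list of projects, and even then only while no newer stable release exists. Nothing below comes from the PR. The `is_prerelease` heuristic (looking for `alpha`, `beta` or `rc` in the release name), the contents of `allowlist`, and the assumption that releases are listed newest first are all illustrative.

```ocaml
(* True if [sub] occurs somewhere in [s]. *)
let contains_sub s sub =
  let n = String.length s and m = String.length sub in
  let rec go i = i + m <= n && (String.sub s i m = sub || go (i + 1)) in
  go 0

(* Heuristic assumed for illustration: a release name mentioning alpha,
   beta or rc is treated as a pre-release. *)
let is_prerelease name =
  let name = String.lowercase_ascii name in
  List.exists (contains_sub name) [ "alpha"; "beta"; "rc" ]

(* Projects whose pre-releases may be shown (assumed list). *)
let allowlist = [ "ocaml"; "opam" ]

(* [releases] must be ordered newest first. Stable releases are always kept.
   A pre-release is kept only for allowlisted projects, and only if no stable
   release precedes it in the list, i.e. no stable release is newer. *)
let visible_releases ~project releases =
  let allowed = List.mem project allowlist in
  let _, kept =
    List.fold_left
      (fun (seen_stable, acc) name ->
        if not (is_prerelease name) then (true, name :: acc)
        else if allowed && not seen_stable then (seen_stable, name :: acc)
        else (seen_stable, acc))
      (false, []) releases
  in
  List.rev kept
```

A substring heuristic like this can misclassify unusual release names, so a real implementation would probably need a stricter version parser.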

@@ -129,6 +129,8 @@ module Changelog = struct
body_html : string;
body : string;
authors : string list;
project_name : string;
Contributor Author

project_name is parsed from the slug.
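
As a rough illustration of what parsing a project name out of a slug could involve, here is a hypothetical sketch. It assumes a slug shaped like `<project>-<version>`, where the version starts right after the first `-` that is followed by a digit. The slug format actually used by the site is not shown in this conversation and may differ; `split_slug` is an invented name.

```ocaml
(* Hypothetical parser: assumes a slug shaped like "<project>-<version>",
   where the version starts right after the first '-' followed by a digit.
   The real slug format may differ. *)
let is_digit c = c >= '0' && c <= '9'

(* Return [Some (project, version)], or [None] if no version part is found. *)
let split_slug slug =
  let n = String.length slug in
  let rec find i =
    if i + 1 >= n then None
    else if slug.[i] = '-' && is_digit slug.[i + 1] then
      Some (String.sub slug 0 i, String.sub slug (i + 1) (n - i - 1))
    else find (i + 1)
  in
  find 0

let () =
  match split_slug "ocaml-lsp-1.22.0" with
  | Some (project, version) -> Printf.printf "%s %s\n" project version
  | None -> print_endline "no version found"
```

Splitting on the first dash followed by a digit keeps hyphenated project names such as `ocaml-lsp` or `opam-publish` intact, which a plain split on the first dash would not.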

@cuihtlauac
Collaborator

Thanks @Julow, that's great.

@cuihtlauac
Collaborator

@Julow can you rebase and update this? It was interesting, but we can't merge it as is.

Julow added 4 commits June 13, 2025 12:38
This field stores the project's name and version string parsed from the
file path and the slug.

This scraper doesn't generate changelog entries, instead it exits with
an error when one is missing.

The intention is to be notified when a changelog is missing.

Release names can be different on Github, which triggers false
positives.

The new 'versions' field in changelogs can be used to specify the
release name on Github and to associate several releases with one
changelog.
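
The behaviour described in these commit messages, reporting missing entries and failing instead of generating changelogs, might look roughly like the sketch below. All names are hypothetical. Releases and known entries are passed in as plain lists rather than fetched from `releases.atom` feeds, and matching is reduced to comparing (project, release name) pairs.

```ocaml
(* Hypothetical sketch of the check: report releases that have no changelog
   entry and signal failure through the exit code. *)

(* Return the (project, release name) pairs that are absent from [known]. *)
let missing_entries ~known releases =
  List.filter (fun release -> not (List.mem release known)) releases

(* Format one report line in the style of the output shown earlier. *)
let message (project, version) =
  Printf.sprintf "No changelog entry for %S version %S" project version

(* Print one line per missing entry and return the process exit code:
   1 if anything is missing, 0 otherwise. *)
let check ~known releases =
  let missing = missing_entries ~known releases in
  List.iter (fun m -> print_endline (message m)) missing;
  if missing = [] then 0 else 1

(* A real entry point would end with something like:
   exit (check ~known releases) *)
```

A non-zero exit code is what would let a CI job or scheduled workflow flag the missing changelog.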
@Julow Julow force-pushed the changelog-check-scraper branch from bfc0ac8 to 6db8a54 Compare June 13, 2025 13:56
@Julow Julow marked this pull request as ready for review June 13, 2025 13:56
@Julow Julow requested a review from sabine as a code owner June 13, 2025 13:56
@Julow
Contributor Author

Julow commented Jun 13, 2025

I just rebased the change. I haven't yet figured out how to run this scraper automatically, but merging now will keep this PR from going stale.
