data scraper for Changelogs #2949

Open. Wants to merge 4 commits into base: main
1 change: 1 addition & 0 deletions data/changelog/releases/dune/2024-06-17-dune.3.16.0.md
@@ -1,6 +1,7 @@
---
title: Dune 3.16.0
tags: [dune, platform]
versions: [3.16.0, "3.16.0~alpha2", "3.16.0~alpha1"]
changelog: |
### Added

1 change: 1 addition & 0 deletions data/changelog/releases/dune/2024-11-27-dune.3.17.0.md
@@ -1,6 +1,7 @@
---
title: Dune 3.17.0
tags: [dune, platform]
versions: [3.17.0, "3.17.0~alpha0"]
changelog: |
### Fixed

@@ -1,6 +1,7 @@
---
title: OCaml-LSP 1.20.1
tags: [ocaml-lsp, platform]
versions: ["1.20.1", "1.20.1-4.14", "1.20.0-4.14"]
changelog: |
## Features
- Add custom `ocamllsp/typeSearch` request ([#1369](https://github.com/ocaml/ocaml-lsp/pull/1369))
1 change: 1 addition & 0 deletions data/changelog/releases/ocaml/2024-05-13-ocaml-5.2.0.md
@@ -2,6 +2,7 @@
title: Release of OCaml 5.2.0
description: Release of OCaml 5.2.0
tags: [ocaml]
versions: ["OCaml 5.2.0"]
changelog: |

(Changes that can break existing programs are marked with a "*")
@@ -902,7 +903,7 @@
release of OCaml version 5.2.0.

Some of the highlights in OCaml 5.2.0 are:
- Reintroduced GC compaction

data/changelog/releases/ocaml/2024-05-13-ocaml-5.2.0.md:906 MD032/blanks-around-lists Lists should be surrounded by blank lines [Context: "- Reintroduced GC compaction"] https://github.com/DavidAnson/markdownlint/blob/v0.38.0/doc/md032.md
- Restored native backend for POWER 64 bits
- Thread sanitiser support
- New Dynarray module
@@ -912,17 +913,17 @@
- Local open in type expressions

And a lot of incremental changes:
- Around 20 new functions in the standard library

data/changelog/releases/ocaml/2024-05-13-ocaml-5.2.0.md:916 MD032/blanks-around-lists Lists should be surrounded by blank lines [Context: "- Around 20 new functions in t..."] https://github.com/DavidAnson/markdownlint/blob/v0.38.0/doc/md032.md
- Many fixes and improvements in the runtime
- Many bug fixes

OCaml 5.2.0 is still a somewhat experimental release compared to the OCaml
4.14 branch. In particular:

- The Windows MSVC port is still unavailable.

data/changelog/releases/ocaml/2024-05-13-ocaml-5.2.0.md:923:1 MD007/ul-indent Unordered list indentation [Expected: 0; Actual: 3] https://github.com/DavidAnson/markdownlint/blob/v0.38.0/doc/md007.md
- Ephemeron performances need to be investigated.

data/changelog/releases/ocaml/2024-05-13-ocaml-5.2.0.md:924:1 MD007/ul-indent Unordered list indentation [Expected: 0; Actual: 3] https://github.com/DavidAnson/markdownlint/blob/v0.38.0/doc/md007.md
- `statmemprof` is being tested in the developer branch.

data/changelog/releases/ocaml/2024-05-13-ocaml-5.2.0.md:925:1 MD007/ul-indent Unordered list indentation [Expected: 0; Actual: 3] https://github.com/DavidAnson/markdownlint/blob/v0.38.0/doc/md007.md
- There are a number of known runtime concurrency or GC performance bugs

data/changelog/releases/ocaml/2024-05-13-ocaml-5.2.0.md:926:1 MD007/ul-indent Unordered list indentation [Expected: 0; Actual: 3] https://github.com/DavidAnson/markdownlint/blob/v0.38.0/doc/md007.md
(that trigger under rare circumstances).

Since the Windows MSVC port and `statmemprof` are still missing, the maintenance
@@ -932,7 +933,7 @@
and post any questions or comments you might have on our
[discussion forums](https://discuss.ocaml.org).


data/changelog/releases/ocaml/2024-05-13-ocaml-5.2.0.md:936 MD012/no-multiple-blanks Multiple consecutive blank lines [Expected: 1; Actual: 2] https://github.com/DavidAnson/markdownlint/blob/v0.38.0/doc/md012.md
The full list of changes can be found in the changelog below.

---
@@ -940,15 +941,15 @@
## Installation Instructions

The base compiler can be installed as an opam switch with the following commands:
```bash

data/changelog/releases/ocaml/2024-05-13-ocaml-5.2.0.md:944 MD031/blanks-around-fences Fenced code blocks should be surrounded by blank lines [Context: "```bash"] https://github.com/DavidAnson/markdownlint/blob/v0.38.0/doc/md031.md
opam update
opam switch create 5.2.0
```

The source code for the release candidate is also directly available on:

* [GitHub](https://github.com/ocaml/ocaml/archive/5.2.0.tar.gz)

data/changelog/releases/ocaml/2024-05-13-ocaml-5.2.0.md:951:1 MD004/ul-style Unordered list style [Expected: dash; Actual: asterisk] https://github.com/DavidAnson/markdownlint/blob/v0.38.0/doc/md004.md
* [OCaml archives at Inria](https://caml.inria.fr/pub/distrib/ocaml-5.2/ocaml-5.2.0.tar.gz)

data/changelog/releases/ocaml/2024-05-13-ocaml-5.2.0.md:952:1 MD004/ul-style Unordered list style [Expected: dash; Actual: asterisk] https://github.com/DavidAnson/markdownlint/blob/v0.38.0/doc/md004.md

### Fine-Tuned Compiler Configuration

1 change: 1 addition & 0 deletions data/changelog/releases/ocaml/2024-11-18-ocaml-5.2.1.md
@@ -2,6 +2,7 @@
title: Release of OCaml 5.2.1
description: Release of OCaml 5.2.1
tags: [ocaml]
versions: ["OCaml 5.2.1"]
changelog: |

## Changes Since OCaml 5.2.0
1 change: 1 addition & 0 deletions data/changelog/releases/ocaml/2025-01-08-ocaml-5.3.0.md
@@ -2,6 +2,7 @@
title: Release of OCaml 5.3.0
description: Release of OCaml 5.3.0
tags: [ocaml]
versions: ["OCaml 5.3.0"]
changelog: |
(Changes that can break existing programs are marked with a "*")
### Restored backend:
@@ -1,6 +1,7 @@
---
title: OCamlFormat 0.24.1
tags: [ocamlformat, platform]
versions: ["0.24.0", "0.24.1"]
changelog: |
### New features

@@ -1,6 +1,7 @@
---
title: OCamlFormat 0.25.1
tags: [ocamlformat, platform]
versions: ["0.25.0", "0.25.1"]
changelog: |
### Library

@@ -171,4 +172,4 @@ The OCamlFormat team
+ else if Sys.unix then (module Unix)
+ else (module Fail)
+ : Unix_socket)
```
```
1 change: 1 addition & 0 deletions data/changelog/releases/omp/2020-04-15-omp-1.7.1.md
@@ -1,6 +1,7 @@
---
title: Omp ocaml-migrate-parsetree-1.7.1
tags: [omp]
versions: ["v1.7.1"]
changelog: |
- Fix build with OCaml < 4.08

1 change: 1 addition & 0 deletions data/changelog/releases/omp/2020-04-20-omp-1.7.2.md
@@ -1,6 +1,7 @@
---
title: Omp ocaml-migrate-parsetree-1.7.2
tags: [omp]
versions: ["v1.7.2"]
changelog: |
- Remove toplevel `Option` module accidentally added in 1.7.0
---
1 change: 1 addition & 0 deletions data/changelog/releases/omp/2020-05-07-omp-1.7.3.md
@@ -1,6 +1,7 @@
---
title: Omp ocaml-migrate-parsetree-1.7.3
tags: [omp]
versions: ["v1.7.3"]
changelog: |
- Fix magic numbers for the 4.11 ast (#96, @hhugo)
---
1 change: 1 addition & 0 deletions data/changelog/releases/omp/2020-08-12-omp-2.0.0.md
@@ -1,6 +1,7 @@
---
title: Omp ocaml-migrate-parsetree-2.0.0
tags: [omp]
versions: ["v2.0.0"]
changelog: |
- No longer expose the unwrapped modules (#94, @jonludlam)
- Remove everything but Ast versions and upgrade/downgrade
1 change: 1 addition & 0 deletions data/changelog/releases/omp/2020-10-22-omp-2.1.0.md
@@ -1,6 +1,7 @@
---
title: Omp ocaml-migrate-parsetree-2.1.0
tags: [omp]
versions: ["v2.1.0"]
changelog: |
- Add support for 4.12 (#107, @ceastlund)
---
1 change: 1 addition & 0 deletions data/changelog/releases/omp/2020-10-23-omp-1.8.0.md
@@ -1,6 +1,7 @@
---
title: Omp ocaml-migrate-parsetree-1.8.0
tags: [omp]
versions: ["v1.8.0"]
changelog: |
Oops, we went looking but didn't find the changelog for this release 🙈
---
1 change: 1 addition & 0 deletions data/changelog/releases/omp/2021-06-22-omp-2.2.0.md
@@ -1,6 +1,7 @@
---
title: Omp ocaml-migrate-parsetree-2.2.0
tags: [omp]
versions: ["v2.2.0"]
changelog: |
- Add support for 4.13 (#114, @kit-ty-kate)
---
@@ -1,6 +1,7 @@
---
title: Opam-publish 2.0.0
tags: [opam-publish, platform]
versions: ["2.0.0", "2.0.0: Merge pull request #66 from rjbou/push-on-master"]
changelog: |
* Switch default branch from 2.0.0 to master
* Minor fix
1 change: 1 addition & 0 deletions data/changelog/releases/opam/2024-05-22-opam-2-1-6.md
@@ -1,6 +1,7 @@
---
title: opam 2.1.5
authors: [ "Raja Boujbel" ]
versions: ["2.1.6"]
description: "Release of opam 2.1.5"
tags: [opam, platform]
changelog: |
1 change: 1 addition & 0 deletions data/changelog/releases/opam/2024-07-01-opam-2-2-0.md
@@ -5,6 +5,7 @@ authors: [
"Kate Deplaix",
"David Allsopp",
]
versions: ["2.2.0"]
description: "Release of opam 2.2.0"
tags: [opam, platform]
---
1 change: 1 addition & 0 deletions data/changelog/releases/opam/2024-08-22-opam-2-2-1.md
@@ -5,6 +5,7 @@ authors: [
"Kate Deplaix",
"David Allsopp",
]
versions: ["2.2.1"]
description: "opam 2.2.1 release"
tags: [opam, platform]
---
1 change: 1 addition & 0 deletions data/changelog/releases/opam/2024-11-13-opam-2-3-0.md
@@ -5,6 +5,7 @@ authors: [
"Kate Deplaix",
"David Allsopp",
]
versions: ["2.3.0"]
description: "Release of opam 2.3.0"
tags: [opam, platform]
---
2 changes: 2 additions & 0 deletions src/ocamlorg_data/data_intf.ml
@@ -98,6 +98,8 @@ module Changelog = struct
body : string;
authors : string list;
contributors : string list;
project_name : string;
Review comment from the PR author:

project_name is parsed from the slug.

versions : string list;
}

type post = {
7 changes: 6 additions & 1 deletion tool/ood-gen/bin/scrape.ml
@@ -1,7 +1,12 @@
open Cmdliner
open Ood_gen

let term_scrapers = [ ("planet", Blog.Scraper.scrape); ("video", Video.scrape) ]
let term_scrapers =
[
("planet", Blog.Scraper.scrape);
("video", Video.scrape);
("changelog", Changelog.Scraper.scrape);
]

let cmds =
Cmd.group (Cmd.info "ood-scrape")
118 changes: 105 additions & 13 deletions tool/ood-gen/lib/changelog.ml
@@ -41,8 +41,37 @@ type t = [%import: Data_intf.Changelog.t] [@@deriving of_yaml, show]
a pull request to mirror a release announcement that likely already happened
on discuss.ocaml.org. *)

let re_date_slug =
(** A scraper is provided to check whether changelog entries are missing. Run it
like this:
{v
dune exec -- tool/ood-gen/bin/scrape.exe changelog
v}
The list below describes how to query the latest releases. *)
let projects_release_feeds =
[
("ocamlformat", `Github "https://github.com/ocaml-ppx/ocamlformat");
("dune", `Github "https://github.com/ocaml/dune");
("dune-release", `Github "https://github.com/tarides/dune-release");
("mdx", `Github "https://github.com/realworldocaml/mdx");
("merlin", `Github "https://github.com/ocaml/merlin");
("ocaml", `Github "https://github.com/ocaml/ocaml");
("ocaml-lsp", `Github "https://github.com/ocaml/ocaml-lsp");
("ocp-indent", `Github "https://github.com/OCamlPro/ocp-indent");
("odoc", `Github "https://github.com/ocaml/odoc");
("opam", `Github "https://github.com/ocaml/opam");
("opam-publish", `Github "https://github.com/ocaml-opam/opam-publish");
("ppxlib", `Github "https://github.com/ocaml-ppx/ppxlib");
("utop", `Github "https://github.com/ocaml-community/utop");
("omp", `Github "https://github.com/ocaml-ppx/ocaml-migrate-parsetree");
]

let re_slug =
let open Re in
let re_project_name =
let w = rep1 alpha in
seq [ w; rep (seq [ char '-'; w ]) ]
in
let re_version_string = seq [ digit; rep1 any ] in
compile
(seq
[
@@ -56,17 +85,21 @@ let re_date_slug =
group (rep1 digit);
];
char '-';
opt
(seq
[ group re_project_name; set "-."; group re_version_string; eos ]);
])

let parse_date_from_slug s =
match Re.exec_opt re_date_slug s with
let parse_slug s =
match Re.exec_opt re_slug s with
| None -> None
| Some g ->
let int n = Re.Group.get g n |> int_of_string in
let year = int 1 in
let month = int 2 in
let day = int 3 in
Some (Printf.sprintf "%04d-%02d-%02d" year month day)
let version = Re.Group.get_opt g 5 in
Some (Printf.sprintf "%04d-%02d-%02d" year month day, version)

module Releases = struct
type release_metadata = {
@@ -76,18 +109,21 @@ module Releases = struct
contributors : string list option;
description : string option;
changelog : string option;
versions : string list option;
}
[@@deriving
of_yaml,
stable_record ~version:release ~remove:[ changelog; description ]
~modify:[ authors; contributors ]
~add:[ slug; changelog_html; body_html; body; date ]]
~modify:[ authors; contributors; versions ]
~add:[ slug; changelog_html; body_html; body; date; project_name ]]

let of_release_metadata m =
release_metadata_to_release m ~modify_authors:(Option.value ~default:[])
~modify_contributors:(Option.value ~default:[])
~modify_versions:(Option.value ~default:[])

let decode (fname, (head, body)) =
let project_name = Filename.basename (Filename.dirname fname) in
let slug = Filename.basename (Filename.remove_extension fname) in
let metadata =
release_metadata_of_yaml head |> Result.map_error (Utils.where fname)
@@ -109,16 +145,21 @@ module Releases = struct
|> Hilite.Md.transform
|> Cmarkit_html.of_doc ~safe:false)
in
let date =
match parse_date_from_slug slug with
let date, slug_version =
match parse_slug slug with
| Some x -> x
| None ->
failwith
"date is not present in metadata and could not be parsed from \
slug"
("date is not present in metadata and could not be parsed from \
slug: " ^ slug)
in
let metadata =
match (metadata.versions, slug_version) with
| None, Some v -> { metadata with versions = Some [ v ] }
| _ -> metadata
in
of_release_metadata ~slug ~changelog_html ~body ~body_html ~date
metadata)
~project_name metadata)
metadata

let all () =
@@ -154,8 +195,8 @@ module Posts = struct
Result.map
(fun metadata ->
let date =
match parse_date_from_slug slug with
| Some x -> x
match parse_slug slug with
| Some (date, _) -> date
| None ->
failwith
"date is not present in metadata and could not be parsed from \
@@ -233,3 +274,54 @@ include Data_intf.Changelog
let all = %a
|ocaml}
(Fmt.Dump.list pp) (all ())

module Scraper = struct
module SMap = Map.Make (String)
module SSet = Set.Make (String)

let warning_count = ref 0

let warn fmt =
let flush out =
Printf.fprintf out "\n%!";
incr warning_count
in
Printf.kfprintf flush stderr fmt

let fetch_github repo =
[ River.fetch { River.name = repo; url = repo ^ "/releases.atom" } ]
|> River.posts
|> List.map (fun post -> River.title post)

let group_releases_by_project all =
List.fold_left
(fun acc t ->
List.fold_left
(fun acc v -> SMap.add_to_list t.project_name v acc)
acc t.versions)
SMap.empty all

let check_if_uptodate project known_versions =
let known_versions = SSet.of_list known_versions in
let check scraped_versions =
List.iter
(fun v ->
if not (SSet.mem v known_versions) then
warn "No changelog entry for %S version %S\n%!" project v)
scraped_versions
in
match List.assoc_opt project projects_release_feeds with
| Some (`Github repo) -> check (fetch_github repo)
| None ->
warn
"Don't know how to look up project %S. Please update \
'tool/ood-gen/lib/changelog.ml'\n\
%!"
project

(** This does not generate any file. Instead, it exits with an error if a
changelog entry is missing. *)
let scrape () =
Releases.all () |> group_releases_by_project |> SMap.iter check_if_uptodate;
if !warning_count > 0 then exit 1
end
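
For readers skimming the `Scraper` module: the check amounts to grouping known versions by project and then reporting every scraped feed title that is not in that group. The following is a minimal stdlib-only sketch of that idea, not the PR's code. The `entry` type and the functions `group_by_project` and `missing_versions` are hypothetical names introduced here for illustration, and the network fetch is left out entirely.

```ocaml
(* Illustrative sketch only: [entry], [group_by_project] and
   [missing_versions] are hypothetical names, not part of the PR. *)
module SMap = Map.Make (String)
module SSet = Set.Make (String)

type entry = { project_name : string; versions : string list }

(* Collect every known version under its project name. *)
let group_by_project entries =
  List.fold_left
    (fun acc e ->
      let previous =
        Option.value ~default:[] (SMap.find_opt e.project_name acc)
      in
      SMap.add e.project_name (previous @ e.versions) acc)
    SMap.empty entries

(* Versions present in the feed but absent from the changelog,
   kept in feed order. The comparison is on exact strings. *)
let missing_versions ~known ~scraped =
  let known = SSet.of_list known in
  List.filter (fun v -> not (SSet.mem v known)) scraped
```

Because the comparison is on exact strings, a feed title such as `2.2.0` would not match a stored `v2.2.0`. That would explain why the `versions` fields added in this PR mirror release titles verbatim (for example `"OCaml 5.2.0"` or `"v1.7.1"`), though the diff does not state this explicitly.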
17 changes: 13 additions & 4 deletions tool/ood-gen/test/test_ood_gen.ml
@@ -3,14 +3,23 @@ let test_parse_date_from_slug =
( name,
`Quick,
fun () ->
let got = Ood_gen.Changelog.parse_date_from_slug s in
Alcotest.check (Alcotest.option Alcotest.string) __LOC__ expected got )
let got = Ood_gen.Changelog.parse_slug s in
Alcotest.(check (option (pair string (option string))))
__LOC__ expected got )
in
[
test ~name:"ok" "2020-03-02-something.md" ~expected:(Some "2020-03-02");
test ~name:"ok" "2020-03-02-something.md"
~expected:(Some ("2020-03-02", None));
test ~name:"no date" "something.md" ~expected:None;
test ~name:"day not padded correctly" "2021-1-2-title.md"
~expected:(Some "2021-01-02");
~expected:(Some ("2021-01-02", None));
test ~name:"ok with project-version" "2025-01-31-project-1.2.3"
~expected:(Some ("2025-01-31", Some "1.2.3"));
test ~name:"ok with project.version" "2025-01-31-project.1.2.3"
~expected:(Some ("2025-01-31", Some "1.2.3"));
test ~name:"ok with project.version-suffix"
"2025-01-31-project.1.2.3-something"
~expected:(Some ("2025-01-31", Some "1.2.3-something"));
]

let tests = [ ("parse_date_from_slug", test_parse_date_from_slug) ]
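
The test cases above pin down the slug shape that `parse_slug` accepts: a date, a dash, then optionally a project name, a `-` or `.` separator, and a version that starts with a digit and has at least one more character. As a rough illustration of that shape, here is a stdlib-only sketch that reproduces the same results on these inputs. It is an assumption-laden re-implementation, not the PR's code (which uses the `Re` library), and `parse_slug_sketch` and `version_of_tail` are hypothetical names. It also assumes the slug starts with the date, which the visible part of the diff does not confirm.

```ocaml
(* Illustrative stdlib-only sketch; the PR's implementation uses Re.
   Requires OCaml >= 4.13 for String.for_all. *)
let is_alpha c = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
let is_digit c = c >= '0' && c <= '9'
let all_digits s = s <> "" && String.for_all is_digit s

(* Return the version suffix of [tail] when it has the shape
   <name><'-' or '.'><digit><at least one more character>,
   where <name> is one or more letter-only words joined by dashes. *)
let version_of_tail tail =
  let len = String.length tail in
  (* Index just past the run of letters starting at [i]. *)
  let rec skip_alpha i =
    if i < len && is_alpha tail.[i] then skip_alpha (i + 1) else i
  in
  (* Index just past the project name, or None if no name starts at [i]. *)
  let rec skip_name i =
    let j = skip_alpha i in
    if j = i then None
    else if j + 1 < len && tail.[j] = '-' && is_alpha tail.[j + 1] then
      skip_name (j + 1)
    else Some j
  in
  match skip_name 0 with
  | Some p
    when p + 2 < len
         && (tail.[p] = '-' || tail.[p] = '.')
         && is_digit tail.[p + 1] ->
      Some (String.sub tail (p + 1) (len - p - 1))
  | _ -> None

(* Parse "<year>-<month>-<day>-<rest>" into a normalised date and an
   optional version extracted from <rest>. *)
let parse_slug_sketch s =
  match String.split_on_char '-' s with
  | y :: m :: d :: rest when rest <> [] && List.for_all all_digits [ y; m; d ]
    ->
      let date =
        Printf.sprintf "%04d-%02d-%02d" (int_of_string y) (int_of_string m)
          (int_of_string d)
      in
      Some (date, version_of_tail (String.concat "-" rest))
  | _ -> None
```

Note that a slug such as `2024-07-01-opam-2-2-0` would yield the version `2-2-0` under this shape, which may be why the opam entries in this PR set `versions` explicitly in their front matter.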