Skip to content

feat(VIAF): add cluster refresh pipeline#203

Open
rerowep wants to merge 1 commit intorero:stagingfrom
rerowep:wep-viaf-updates
Open

feat(VIAF): add cluster refresh pipeline#203
rerowep wants to merge 1 commit intorero:stagingfrom
rerowep:wep-viaf-updates

Conversation

@rerowep
Copy link
Contributor

@rerowep rerowep commented Feb 22, 2026

Pull-based VIAF update using the JSON API since
the OAI-PMH endpoint is non-functional. Adds CLI
command harvest-viaf with rolling, missing and full refresh modes. Detects and handles VIAF cluster
merges (redirects) with automatic MEF cleanup.

Summary by CodeRabbit

  • New Features
    • VIAF harvest/refresh (batch, rolling, single-PID) with enqueue/progress options, unlinked-agent discovery/relinking, redirect migration, and queue-aware wait.
  • Documentation
    • Added AI assistant configuration and comprehensive coding guidelines.
  • Bug Fixes
    • Improved VIAF rate-limit retry behavior, logging deduplication, monitoring/reporting edge cases, identifier formatting, and safer CSV/VIAF validations.
  • Tests
    • Large expansion of UI/unit tests covering VIAF flows, redirects, CLI, tasks, and MARC classification.

@coveralls
Copy link
Collaborator

coveralls commented Feb 22, 2026

Coverage Status

coverage: 87.105% (+0.5%) from 86.621%
when pulling 386c6eb on rerowep:wep-viaf-updates
into 9b291a0 on rero:staging.

@rerowep rerowep force-pushed the wep-viaf-updates branch 2 times, most recently from c89accf to af4511b Compare February 22, 2026 18:52
@rerowep rerowep marked this pull request as ready for review February 22, 2026 18:54
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

🧹 Nitpick comments (12)
tests/ui/places/test_places_api.py (1)

261-261: app fixture is not needed in test_places_utils_make_identifier.

make_identifier operates only on its argument and never touches current_app, so the app fixture serves no purpose here and Ruff's ARG001 warning is valid.

♻️ Proposed fix
-def test_places_utils_make_identifier(app):
+def test_places_utils_make_identifier():
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/ui/places/test_places_api.py` at line 261, The test function
test_places_utils_make_identifier includes an unused app fixture; remove the
unused parameter from the test signature so the test only accepts the input it
needs (i.e., drop the app argument) because make_identifier is pure and does not
use current_app; update any calls or references to
test_places_utils_make_identifier accordingly and run tests to confirm Ruff
ARG001 warning is resolved.
.github/copilot-instructions.md (2)

97-102: mock_response helper is undefined in the example — add an import or show its construction.

The snippet references mock_response(json_data=...) but neither imports it nor shows how it's constructed. AI assistants following this example literally will produce broken test code. Show the helper's origin (e.g., a conftest.py fixture) or replace it with a standard MagicMock / unittest.mock.Mock inline setup.

✏️ Suggested clarification
 `@mock.patch`("requests.Session.get")
 def test_api_call_success(mock_get, app):
     """Test API call with successful response."""
-    mock_get.return_value = mock_response(json_data={"result": "ok"})
-    # Test implementation
+    mock_response = mock.MagicMock()
+    mock_response.json.return_value = {"result": "ok"}
+    mock_response.status_code = 200
+    mock_get.return_value = mock_response
+    # Test implementation
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/copilot-instructions.md around lines 97 - 102, The example uses an
undefined helper mock_response in the test_api_call_success function (and the
mock.patch on requests.Session.get); fix it by either importing/defining
mock_response (e.g., add a conftest.py fixture or helper function named
mock_response that returns a Mock with .json() and .status_code) or replace the
call inline with unittest.mock.MagicMock/Mock configured with the expected .json
return value and status_code; update the test_api_call_success reference to use
that defined helper or inline mock so the example is runnable.

170-180: result is referenced but never assigned in the task pattern snippet.

The placeholder return result will cause a NameError if copy-pasted verbatim. Replace with a comment clarifying the return shape.

✏️ Suggested clarification
-    # Task implementation
-    return result
+    # Task implementation
+    return {"count": 0}  # Return a summary dict or similar
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/copilot-instructions.md around lines 170 - 180, The snippet defines
my_task(param, dbcommit=True, reindex=True, verbose=False) but returns an
unassigned variable result; replace the literal return with a descriptive
comment describing the expected return shape (e.g., dict with keys 'status' and
'data', or specific type) so copy-pasting won’t raise NameError—update the
docstring or add an inline comment near the end of my_task to specify the return
value contract instead of return result.
CLAUDE.md (1)

28-28: Minor sync drift with .github/copilot-instructions.md — the equivalent line there uses bold (**Never use…**) while this line doesn't. Per AI_SETUP.md, both files should stay synchronized.

✏️ Suggested fix
-Never use `pip`, `python -m pytest`, or bare `pytest` — always go through `uv`.
+**Never use `pip`, `python -m pytest`, or bare `pytest`** — always go through `uv`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@CLAUDE.md` at line 28, Update CLAUDE.md so the sentence "Never use `pip`,
`python -m pytest`, or bare `pytest` — always go through `uv`." matches the
styling in .github/copilot-instructions.md by making it bold (wrap the entire
sentence in ** **); locate the exact text in CLAUDE.md and replace it with the
bolded version to keep both files synchronized per AI_SETUP.md.
rero_mef/agents/viaf/api.py (1)

391-433: handle_redirect looks solid; minor: prefix unused unpacked variables with _.

The redirect handling logic is well-structured — chained-redirect protection (line 415) and the delete-then-create/update flow are correct. Two small observations:

  1. Line 427: action from create_or_update is never used since line 433 always returns Action.REDIRECT. Prefix with _ to signal intent.
  2. Line 426: target_data.pop("redirect_from", None) is a no-op here (if redirect_from were present, we'd have returned Action.ERROR on line 420). Harmless defensive code, but a comment would clarify.
Suggested fix
-        new_record, action = self.create_or_update(
+        new_record, _action = self.create_or_update(
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rero_mef/agents/viaf/api.py` around lines 391 - 433, In handle_redirect, mark
the unused unpacked variable from create_or_update by renaming action to _action
(or prefix it with an underscore) to indicate it’s intentionally unused, and add
a brief inline comment next to the target_data.pop("redirect_from", None) call
in handle_redirect explaining it’s defensive (kept to ensure no stray
redirect_from key is persisted even though earlier code returns on chained
redirects).
tests/ui/agents/test_agents_viaf_api.py (2)

437-469: Chained redirect test is correct; minor nit on unused variable.

The test correctly verifies that a chained redirect (target itself redirects again) returns Action.ERROR. Line 461: redirect_info is unused — prefix with _ per convention.

Suggested fix
-    new_record, action, redirect_info = agent_viaf_record.handle_redirect(
+    new_record, action, _redirect_info = agent_viaf_record.handle_redirect(
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/ui/agents/test_agents_viaf_api.py` around lines 437 - 469, In
test_handle_redirect_chained, the local variable redirect_info returned from
AgentViafRecord.handle_redirect is unused; update the call to capture it into a
discard variable (e.g., _redirect_info) so the test follows naming conventions:
change the tuple assignment from "new_record, action, redirect_info =
agent_viaf_record.handle_redirect(...)" to use "_redirect_info" instead; leave
all assertions and other variable names unchanged.

452-458: The "redirect_from": new_pid in mock data is a red herring.

get_online_record builds the result dict from scratch — it never copies redirect_from from the raw API JSON. The chained redirect is detected because viafID ("33333333") ≠ requested pid ("22222222"), causing get_online_record to set its own redirect_from. The extra field in the mock is harmless but potentially misleading for future readers.

Cleaner mock data
     chained_data = {
         "viafID": "33333333",
-        "redirect_from": new_pid,
         "sources": {},
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/ui/agents/test_agents_viaf_api.py` around lines 452 - 458, The mock
response is misleading because get_online_record detects redirects by comparing
viafID to the requested pid rather than reading redirect_from; update the test
by removing "redirect_from": new_pid from the chained_data mock so chained_data
contains only the differing "viafID" ("33333333") and any minimal required
fields (e.g., "sources"), leaving mock_get.return_value =
mock_response(json_data=chained_data) and keeping the comparison behavior in
get_online_record unchanged.
rero_mef/agents/cli.py (2)

277-292: Move the import of _refresh_viaf_record outside the loop, and reconsider importing a private function.

The from .viaf.tasks import _refresh_viaf_record on line 285 is executed on every iteration when not enqueueing. Python caches module imports so the performance hit is negligible, but it's unconventional and hurts readability. Additionally, _refresh_viaf_record is a private function (leading underscore) — consider either making it public or using the already-imported viaf_get_record wrapper instead.

Suggested fix — use the public `viaf_get_record` directly
         for pid in progress_bar:
             if batch_size and count >= batch_size:
                 break
             if enqueue:
                 task = viaf_get_record.delay(viaf_pid=pid, verbose=verbose)
                 if verbose:
                     click.echo(f"  viaf pid: {pid} task: {task}")
             else:
-                from .viaf.tasks import _refresh_viaf_record
-
-                action = _refresh_viaf_record(
-                    pid=pid, dbcommit=True, reindex=True, verbose=verbose
-                )
-                action_counts.setdefault(action.value, 0)
-                action_counts[action.value] += 1
+                action_value = viaf_get_record(
+                    viaf_pid=pid, dbcommit=True, reindex=True, verbose=verbose
+                )
+                action_counts.setdefault(action_value, 0)
+                action_counts[action_value] += 1
             count += 1
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rero_mef/agents/cli.py` around lines 277 - 292, The loop currently imports
the private function _refresh_viaf_record on every iteration and uses a
leading-underscore symbol; move the import out of the loop (e.g., import
_refresh_viaf_record once before iterating over progress_bar) and replace the
private call with the public wrapper viaf_get_record if possible (or promote
_refresh_viaf_record to a public name) so readability and convention are
preserved; update usages of action/action_counts and enqueue logic to call the
chosen public function consistently.

221-232: --rolling and --missing are silently mutually exclusive.

If a user passes both --rolling and --missing, only rolling runs (due to if/elif). Consider using a click.Choice for mode selection, or add validation to warn/error when both are specified.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rero_mef/agents/cli.py` around lines 221 - 232, The two boolean click options
"rolling" and "missing" are currently silently mutually exclusive (the handler
uses if/elif), so update the CLI to enforce a single mode: either replace the
two flags with a single choice option (e.g., "--mode" with click.Choice values
like "rolling" and "missing") or add explicit validation at the start of the
command handler that checks the boolean flags rolling and missing and raises
click.UsageError (or prints/warns) if both are True; ensure the error message
clearly describes the conflict and tells the user to choose one mode.
rero_mef/agents/viaf/tasks.py (3)

51-51: Prefix unused unpacked variables with _.

new_record and redirect_info from handle_redirect are unused here.

Suggested fix
-            new_record, action, redirect_info = existing_record.handle_redirect(
+            _new_record, action, _redirect_info = existing_record.handle_redirect(
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rero_mef/agents/viaf/tasks.py` at line 51, The tuple returned by
existing_record.handle_redirect is unpacked into new_record, action,
redirect_info but new_record and redirect_info are unused; update the unpacking
to prefix unused variables with an underscore (e.g., _new_record, action,
_redirect_info) so only the used symbol action remains non-underscored, keeping
existing_record.handle_redirect and the assignment structure unchanged.

36-74: No error handling for network failures in _refresh_viaf_record.

If get_online_record raises an unhandled exception (e.g., ConnectionError, Timeout after retries are exhausted), the entire task crashes. For a batch pipeline processing potentially thousands of records, consider catching and logging transient errors to avoid aborting the entire run.

Sketch
+    try:
         online_data, msg = AgentViafRecord.get_online_record(
             viaf_source_code="VIAF", pid=pid
         )
+    except Exception as err:
+        current_app.logger.error(f"VIAF refresh error for {pid}: {err}")
+        return Action.ERROR
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rero_mef/agents/viaf/tasks.py` around lines 36 - 74, Wrap the call to
AgentViafRecord.get_online_record in a try/except inside _refresh_viaf_record to
catch transient network/errors (e.g., requests.exceptions.RequestException or a
broad Exception), log the exception via click.echo when verbose (including the
pid and exception text) and return a non-fatal Action (e.g., Action.SKIP)
instead of letting the exception bubble and crash the task; keep existing
behavior for not-found and redirect flows and only alter the control flow when
get_online_record raises.

64-71: Redundant add_md5 call on the UPDATE path — consider consolidating MD5 computation logic.

Line 65 pre-computes add_md5(online_data), but for the UPDATE path (when test_md5=True), update_md5_changed() calls add_md5() again at line 296. However, for the CREATE path, this pre-computation is necessary since create_or_update() calls cls.create() without passing md5=True, and EntityRecord.create() defaults to md5=False.

Either:

  1. Remove the pre-computation and modify create_or_update() to pass md5=True when calling create(), or
  2. Keep the pre-computation but add a comment clarifying it ensures MD5 is present for new records (CREATE path), acknowledging the redundancy on UPDATE.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rero_mef/agents/viaf/tasks.py` around lines 64 - 71, Pre-computation of MD5
via add_md5(online_data) is redundant for the UPDATE path because
update_md5_changed() calls add_md5() again; change the logic so MD5 is computed
only where needed by removing the add_md5(online_data) call here and instead
update AgentViafRecord.create_or_update to pass md5=True when it invokes
create() (EntityRecord.create), ensuring new records still get an MD5 while
UPDATE paths keep update_md5_changed() and its add_md5() call.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/continuous-integration-test.yml:
- Line 77: The workflow sets carryforward: true which is invalid because
carryforward expects comma-separated flag names and only applies with parallel:
true and a parallel-finished job; remove the carryforward entry from the job
configuration (or alternatively implement a proper parallel matrix with per-leg
flag-name values and a parallel-finished job with parallel-finished: true) —
locate the carryforward key in the Coveralls/upload or coverage job and delete
it unless you intentionally convert the workflow to use parallel +
parallel-finished and explicit flag names.

In `@rero_mef/agents/viaf/tasks.py`:
- Around line 47-62: The initial use of the walrus operator removes
redirect_from from online_data early, making the later
online_data.pop("redirect_from", None) dead; change the first occurrence to
retrieve (not remove) the value — e.g., use online_data.get("redirect_from") in
the conditional that assigns redirect_from so AgentViafRecord.get_record_by_pid
and existing_record.handle_redirect can still access/modify online_data, and
preserve the final online_data.pop("redirect_from", None) to remove the key only
when treating it as a new record.
- Around line 117-163: Replace the unclosed scroll by using a PIT context: in
process_viaf_rolling_refresh wrap the iteration over AgentViafSearch() in a with
query.point_in_time(): block (keep the existing .params(preserve_order=True) and
.source("pid")), then iterate the search (using .scan() or the iterator
provided) inside that context and perform the _refresh_viaf_record calls as
before; this ensures the point-in-time is cleaned up even when you break early
(reference: process_viaf_rolling_refresh, AgentViafSearch, scan(),
point_in_time(), _refresh_viaf_record).

In `@rero_mef/config.py`:
- Around line 222-223: Remove the dead configuration constant
RERO_MEF_VIAF_REFRESH_DAYS_SPAN by deleting its definition (the
RERO_MEF_VIAF_REFRESH_DAYS_SPAN = 180 line) from the config module and ensure
nothing else depends on it (keep RERO_MEF_VIAF_VIAF_REFRESH_BATCH_SIZE intact);
run tests/grep to confirm there are no remaining references and delete any
related unused imports or documentation comments if found.

In `@tests/ui/agents/test_agents_cli.py`:
- Around line 131-133: The combined assertion masks whether the task was
actually enqueued; replace the single assertion `assert mock_delay.call_count >=
1 or "task" in res.output.lower()` with two assertions: one that directly checks
the mock was invoked (`assert mock_delay.call_count >= 1`) and a separate one
that validates the CLI output (`assert "task" in res.output.lower()` or a more
specific substring), updating the test in tests/ui/agents/test_agents_cli.py
around where `mock_delay` and `res.output` are asserted (refer to
`mock_delay.call_count` and `res.output`) so each behavior is verified
independently.
- Around line 162-164: The current assertion uses a weak "or" with res.output;
replace it with a strict check that the task was called exactly twice by
asserting mock_task.call_count == 2 (i.e., change the line "assert
mock_task.call_count == 2 or 'VIAF file' in res.output" to "assert
mock_task.call_count == 2") so the test verifies the file with two records
triggers two task invocations (referencing mock_task.call_count and res.output
in the failing assertion).
- Around line 106-109: The assertion using mock_task.call_count >= 0 is
meaningless; change it to assert that the mock was actually called (e.g., assert
mock_task.call_count >= 1 or assert mock_task.called) so the test verifies tasks
were invoked for missing VIAF PIDs; update the assertion in
tests/ui/agents/test_agents_cli.py where mock_task is checked (refer to the
mock_task variable and the surrounding assertions checking res.output) to
require a positive call count or use
mock_task.assert_called()/assert_called_once() as appropriate.

In `@tests/ui/places/test_places_api.py`:
- Around line 275-277: The test's expected value uses a colon but
make_identifier (in rero_mef/places/utils.py) returns a pipe when no source is
present; update the assertion in the test (the identified_by_without_source /
identifier check in test_places_api.py) to expect "bf:Isbn|978-3-16-148410-0"
(i.e., use '|' as the separator) so the test matches the make_identifier
behavior, or alternatively change make_identifier to use ':' consistently if you
want the colon—adjust only one side to make them consistent.
- Line 258: The assertion is tautological because "len(endpoints) >= 0" is
always true; update the test in tests/ui/places/test_places_api.py to perform a
meaningful check on the endpoints variable: either assert "plgnd" in endpoints
or allow an empty mapping by replacing the second operand with "len(endpoints)
== 0" (if empty is acceptable), or if the intent is only to verify type, replace
the whole assertion with "assert isinstance(endpoints, dict)"; locate the
failing line that currently reads "assert 'plgnd' in endpoints or len(endpoints)
>= 0" and change it to one of these correct alternatives depending on desired
behavior.

---

Duplicate comments:
In `@CLAUDE.md`:
- Around line 97-102: Update test_api_call_success to use the same mock_response
helper implementation and explicit return-value assertions as used in the other
file: ensure the mock_response helper referenced by mock_response(...) returns
an object with a .json() method and status_code (so mock_get.return_value =
mock_response(json_data={"result":"ok"}) behaves the same), import or define the
mock_response helper in the test module if missing, and replace any vague
"return result" checks with explicit assertions on the API call's return value
inside test_api_call_success to keep behavior synchronized with the other tests
and the shared helper.

---

Nitpick comments:
In @.github/copilot-instructions.md:
- Around line 97-102: The example uses an undefined helper mock_response in the
test_api_call_success function (and the mock.patch on requests.Session.get); fix
it by either importing/defining mock_response (e.g., add a conftest.py fixture
or helper function named mock_response that returns a Mock with .json() and
.status_code) or replace the call inline with unittest.mock.MagicMock/Mock
configured with the expected .json return value and status_code; update the
test_api_call_success reference to use that defined helper or inline mock so the
example is runnable.
- Around line 170-180: The snippet defines my_task(param, dbcommit=True,
reindex=True, verbose=False) but returns an unassigned variable result; replace
the literal return with a descriptive comment describing the expected return
shape (e.g., dict with keys 'status' and 'data', or specific type) so
copy-pasting won’t raise NameError—update the docstring or add an inline comment
near the end of my_task to specify the return value contract instead of return
result.

In `@CLAUDE.md`:
- Line 28: Update CLAUDE.md so the sentence "Never use `pip`, `python -m
pytest`, or bare `pytest` — always go through `uv`." matches the styling in
.github/copilot-instructions.md by making it bold (wrap the entire sentence in
** **); locate the exact text in CLAUDE.md and replace it with the bolded
version to keep both files synchronized per AI_SETUP.md.

In `@rero_mef/agents/cli.py`:
- Around line 277-292: The loop currently imports the private function
_refresh_viaf_record on every iteration and uses a leading-underscore symbol;
move the import out of the loop (e.g., import _refresh_viaf_record once before
iterating over progress_bar) and replace the private call with the public
wrapper viaf_get_record if possible (or promote _refresh_viaf_record to a public
name) so readability and convention are preserved; update usages of
action/action_counts and enqueue logic to call the chosen public function
consistently.
- Around line 221-232: The two boolean click options "rolling" and "missing" are
currently silently mutually exclusive (the handler uses if/elif), so update the
CLI to enforce a single mode: either replace the two flags with a single choice
option (e.g., "--mode" with click.Choice values like "rolling" and "missing") or
add explicit validation at the start of the command handler that checks the
boolean flags rolling and missing and raises click.UsageError (or prints/warns)
if both are True; ensure the error message clearly describes the conflict and
tells the user to choose one mode.

In `@rero_mef/agents/viaf/api.py`:
- Around line 391-433: In handle_redirect, mark the unused unpacked variable
from create_or_update by renaming action to _action (or prefix it with an
underscore) to indicate it’s intentionally unused, and add a brief inline
comment next to the target_data.pop("redirect_from", None) call in
handle_redirect explaining it’s defensive (kept to ensure no stray redirect_from
key is persisted even though earlier code returns on chained redirects).

In `@rero_mef/agents/viaf/tasks.py`:
- Line 51: The tuple returned by existing_record.handle_redirect is unpacked
into new_record, action, redirect_info but new_record and redirect_info are
unused; update the unpacking to prefix unused variables with an underscore
(e.g., _new_record, action, _redirect_info) so only the used symbol action
remains non-underscored, keeping existing_record.handle_redirect and the
assignment structure unchanged.
- Around line 36-74: Wrap the call to AgentViafRecord.get_online_record in a
try/except inside _refresh_viaf_record to catch transient network/errors (e.g.,
requests.exceptions.RequestException or a broad Exception), log the exception
via click.echo when verbose (including the pid and exception text) and return a
non-fatal Action (e.g., Action.SKIP) instead of letting the exception bubble and
crash the task; keep existing behavior for not-found and redirect flows and only
alter the control flow when get_online_record raises.
- Around line 64-71: Pre-computation of MD5 via add_md5(online_data) is
redundant for the UPDATE path because update_md5_changed() calls add_md5()
again; change the logic so MD5 is computed only where needed by removing the
add_md5(online_data) call here and instead update
AgentViafRecord.create_or_update to pass md5=True when it invokes create()
(EntityRecord.create), ensuring new records still get an MD5 while UPDATE paths
keep update_md5_changed() and its add_md5() call.

In `@tests/ui/agents/test_agents_viaf_api.py`:
- Around line 437-469: In test_handle_redirect_chained, the local variable
redirect_info returned from AgentViafRecord.handle_redirect is unused; update
the call to capture it into a discard variable (e.g., _redirect_info) so the
test follows naming conventions: change the tuple assignment from "new_record,
action, redirect_info = agent_viaf_record.handle_redirect(...)" to use
"_redirect_info" instead; leave all assertions and other variable names
unchanged.
- Around line 452-458: The mock response is misleading because get_online_record
detects redirects by comparing viafID to the requested pid rather than reading
redirect_from; update the test by removing "redirect_from": new_pid from the
chained_data mock so chained_data contains only the differing "viafID"
("33333333") and any minimal required fields (e.g., "sources"), leaving
mock_get.return_value = mock_response(json_data=chained_data) and keeping the
comparison behavior in get_online_record unchanged.

In `@tests/ui/places/test_places_api.py`:
- Line 261: The test function test_places_utils_make_identifier includes an
unused app fixture; remove the unused parameter from the test signature so the
test only accepts the input it needs (i.e., drop the app argument) because
make_identifier is pure and does not use current_app; update any calls or
references to test_places_utils_make_identifier accordingly and run tests to
confirm Ruff ARG001 warning is resolved.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

♻️ Duplicate comments (1)
rero_mef/agents/viaf/tasks.py (1)

142-163: Prefer PIT/iterate to avoid lingering scrolls when breaking early.

scan() uses the Scroll API; breaking at Line 154 may leave scroll contexts until TTL. Consider point_in_time() + iterate() to ensure cleanup.

elasticsearch-dsl Search.point_in_time context manager iterate vs scan preserve_order
♻️ Possible refactor
-    hits = query.source("pid").scan()
+    with query.point_in_time(keep_alive="1m") as pit:
+        hits = pit.source("pid").iterate(keep_alive="1m")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rero_mef/agents/viaf/tasks.py` around lines 142 - 163, The current use of
AgentViafSearch().params(...).sort(...).scan() can leave scroll contexts open
when you break early; replace scan() with the point_in_time() context manager
and use its iterate() method (keeping .source("pid") and the same params/sort)
so the PIT is automatically closed on exit; iterate() yields the same hits for
your progressbar loop and you can keep calling _refresh_viaf_record(pid=hit.pid,
...) and breaking when count >= batch_size without leaving a lingering scroll.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@rero_mef/agents/cli.py`:
- Around line 202-313: The harvest_viaf command currently ignores --progress
unless --verbose is set and treats batch_size=0 inconsistently (rolling uses
batch_size or None but full refresh passes raw 0), so normalize batch_size at
the start (e.g., batch_arg = None if batch_size == 0 else batch_size) and pass
that to process_viaf_rolling_refresh and process_viaf_refresh (both delay and
sync calls) to make "0=all" consistent; also honor the progress flag by passing
a progress parameter into the synchronous refresh calls
(process_viaf_rolling_refresh and process_viaf_refresh) or by enabling
progressbar output even when verbose is False so --progress alone shows progress
feedback. Ensure you update calls in harvest_viaf and use the normalized
batch_arg and progress flag everywhere.

In `@rero_mef/agents/viaf/api.py`:
- Around line 426-433: The local variable returned from create_or_update is
assigned to action but never used; update the tuple assignment in the block that
calls create_or_update (around target_data.pop("redirect_from", None) and the
subsequent call) to rename action to a throwaway variable (e.g., _action or _)
so the linter stops flagging the unused symbol, leaving new_record and the
returned Action.REDIRECT and redirect_info unchanged.

In `@rero_mef/agents/viaf/tasks.py`:
- Around line 47-71: The tuple unpackings return unused values flagged by Ruff
RUF059; when calling existing_record.handle_redirect(...) only use the needed
variable(s) (keep action) and discard the others by renaming them to _new_record
and _redirect_info (or prefix with an underscore), and when calling
AgentViafRecord.create_or_update(...) if only action is used rename the unused
record to _record (or _), so replace new_record, redirect_info and record with
underscore-prefixed names to indicate they are intentionally unused while
keeping action as-is.

In `@tests/ui/agents/test_agents_viaf_tasks.py`:
- Around line 48-80: In test_refresh_viaf_record_update remove the dead
"nonexistent" setup: delete the lines that set pid = "nonexistent" and the
subsequent mock_get.return_value = mock_response(json_data={"sources": {}})
since no call to _refresh_viaf_record or assertions follow; this cleans up
unused test scaffolding around agent_viaf_record and mock_get in
test_refresh_viaf_record_update.

In `@tests/ui/places/test_places_api.py`:
- Line 250: The test function test_places_utils_get_places_endpoints (and the
other test at line ~261 that also only needs the fixture side-effects) should
stop declaring the unused parameter `app` to silence Ruff ARG001; either remove
the `app` parameter and add `@pytest.mark.usefixtures`("app") above the function,
or keep the parameter but add a local per-line suppression (e.g., `# noqa:
ARG001`) immediately after the function signature—apply the same change to the
second test that currently only uses `app` for side-effects so Ruff no longer
reports ARG001.

In `@tests/ui/places/test_places_gnd_tasks.py`:
- Around line 24-25: The test functions (e.g., test_gnd_get_record_success)
declare an unused fixture parameter named app which triggers Ruff ARG001; rename
the parameter to _app (or _app: fixture-type) in each affected test
(test_gnd_get_record_success and the other tests on lines referenced) or append
a per-parameter noqa comment (# noqa: ARG001) to silence the linter; update the
function signatures for test_gnd_get_record_success, and the similar tests at
the other locations to use the underscore-prefixed name consistently so the
fixtures still run but Ruff no longer flags them.
- Around line 113-115: Replace the manual try/except/assert False pattern in the
test that calls gnd_get_record("040754766", debug=True) with pytest.raises to
reliably assert the ValueError; specifically, update the test in
tests/ui/places/test_places_gnd_tasks.py so the call to gnd_get_record is
wrapped in a pytest.raises(ValueError) context manager (referencing
gnd_get_record and the test that currently uses assert False) to avoid
assertions being stripped with python -O.

---

Duplicate comments:
In `@rero_mef/agents/viaf/tasks.py`:
- Around line 142-163: The current use of
AgentViafSearch().params(...).sort(...).scan() can leave scroll contexts open
when you break early; replace scan() with the point_in_time() context manager
and use its iterate() method (keeping .source("pid") and the same params/sort)
so the PIT is automatically closed on exit; iterate() yields the same hits for
your progressbar loop and you can keep calling _refresh_viaf_record(pid=hit.pid,
...) and breaking when count >= batch_size without leaving a lingering scroll.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between af4511b and d4cdb54.

📒 Files selected for processing (14)
  • .github/AI_SETUP.md
  • .github/copilot-instructions.md
  • .github/workflows/continuous-integration-test.yml
  • CLAUDE.md
  • rero_mef/agents/cli.py
  • rero_mef/agents/viaf/api.py
  • rero_mef/agents/viaf/tasks.py
  • rero_mef/api.py
  • rero_mef/config.py
  • tests/ui/agents/test_agents_cli.py
  • tests/ui/agents/test_agents_viaf_api.py
  • tests/ui/agents/test_agents_viaf_tasks.py
  • tests/ui/places/test_places_api.py
  • tests/ui/places/test_places_gnd_tasks.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • rero_mef/config.py

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

♻️ Duplicate comments (9)
tests/ui/places/test_places_gnd_tasks.py (2)

23-24: Multiple unused app fixtures trigger Ruff ARG001.

All six test functions declare app solely for setup side-effects. Rename to _app or use @pytest.mark.usefixtures("app").

Also applies to: 47-49, 60-62, 74-76, 85-87, 108-110

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/ui/places/test_places_gnd_tasks.py` around lines 23 - 24, Several tests
(starting with the test function test_gnd_get_record_success) declare the
fixture parameter app but never use it, causing Ruff ARG001; rename the unused
parameter to _app (e.g., change test_gnd_get_record_success(app) to
test_gnd_get_record_success(_app)) or alternatively apply
`@pytest.mark.usefixtures`("app") to the test functions to keep the side-effect
without a parameter; update the other affected test functions the same way (the
ones around lines referenced in the review) so no tests declare an unused app
parameter.

113-117: assert False is stripped with python -O — use pytest.raises instead.

🛠️ Proposed fix
-    try:
-        gnd_get_record("040754766", debug=True)
-        assert False, "Should have raised ValueError"
-    except ValueError as e:
-        assert str(e) == "Test error"
+    with pytest.raises(ValueError, match="Test error"):
+        gnd_get_record("040754766", debug=True)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/ui/places/test_places_gnd_tasks.py` around lines 113 - 117, The test
currently uses an assert False pattern that is removed under python -O; replace
the try/except with pytest's context manager: import pytest if needed, then use
with pytest.raises(ValueError) as excinfo: call gnd_get_record("040754766",
debug=True) and afterwards assert str(excinfo.value) == "Test error". Update the
test in tests/ui/places/test_places_gnd_tasks.py and reference the
gnd_get_record call and the ValueError expectation when making the change.
tests/ui/places/test_places_api.py (1)

250-258: Unused app fixture still flagged by Ruff (ARG001).

Both test_places_utils_get_places_endpoints and test_places_utils_make_identifier declare app only for its fixture side-effects; Ruff flags these. Additionally, the "bf:Isbn:978-3-16-148410-0" assertion at line 277 is now correct — make_identifier in rero_mef/places/utils.py uses : as the separator for the no-source case (return f"{identified_by['type']}:{identified_by['value']}").

Also applies to: 261-277

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/ui/places/test_places_api.py` around lines 250 - 258, Ruff flags the
`app` fixture as unused in tests test_places_utils_get_places_endpoints and
test_places_utils_make_identifier; to fix, remove the `app` parameter from the
function signatures and instead apply the fixture for side-effects either by
adding a pytest.mark.usefixtures("app") decorator above each test or by renaming
the parameter to `_app` to indicate intentional unusedness; update the tests
referencing get_places_endpoints and make_identifier in rero_mef.places.utils
accordingly so they still run with the fixture side-effects but no longer
trigger ARG001.
tests/ui/agents/test_agents_viaf_tasks.py (2)

34-34: Unused fixture parameters throughout — Ruff ARG001.

All test functions in this file that declare app and/or agent_viaf_record purely for fixture side-effects trigger Ruff ARG001. Rename unused parameters with a leading underscore or move them to @pytest.mark.usefixtures(...).

Also applies to: 50-50, 83-83, 107-107, 126-126, 145-145, 160-160, 181-181, 198-198, 216-216, 235-235, 252-252

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/ui/agents/test_agents_viaf_tasks.py` at line 34, Several test functions
(e.g., test_refresh_viaf_record_create and other test_* functions in
tests/ui/agents/test_agents_viaf_tasks.py) declare fixtures like app and
agent_viaf_record only for side-effects and trigger Ruff ARG001; rename unused
parameters by prefixing them with an underscore (e.g., app -> _app,
agent_viaf_record -> _agent_viaf_record) or remove them from the signature and
add them to `@pytest.mark.usefixtures`(...) so the fixtures still run but are not
treated as unused arguments.

73-78: Dead code — setup block at lines 75–78 has no assertions and is never exercised.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/ui/agents/test_agents_viaf_tasks.py` around lines 73 - 78, The test
contains dead setup: the local variable pid = "nonexistent" and the assignment
mock_get.return_value = mock_response(json_data={"sources": {}}) are never used
or asserted; either remove these unused lines or convert them into meaningful
assertions exercising the code path (e.g., call the function under test with pid
and assert the expected result or that mock_get was called). Locate and update
the test in tests/ui/agents/test_agents_viaf_tasks.py by removing the unused pid
and mock_get.return_value setup or by adding an invocation and assertions that
reference pid and mock_response so the mock is actually exercised.
rero_mef/agents/viaf/api.py (1)

427-427: Ruff RUF059 — unpacked action variable is never used.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rero_mef/agents/viaf/api.py` at line 427, The tuple unpacking from
self.create_or_update assigns an unused variable action; change the unpack to
ignore the unused value by replacing "new_record, action =
self.create_or_update(" with "new_record, _ = self.create_or_update(" (or
alternatively only assign the first element via new_record =
self.create_or_update(...)[0]) so the unused variable warning is resolved while
keeping the call to create_or_update in place.
rero_mef/agents/viaf/tasks.py (2)

51-60: Ruff RUF059 — new_record, redirect_info, and record are unpacked but never used.

🛠️ Proposed fix
-            new_record, action, redirect_info = existing_record.handle_redirect(
+            _new_record, action, _redirect_info = existing_record.handle_redirect(
                 redirect_to_pid=online_data["pid"],
                 dbcommit=dbcommit,
                 reindex=reindex,
             )
...
-    record, action = AgentViafRecord.create_or_update(
+    _record, action = AgentViafRecord.create_or_update(
         data=online_data,
         dbcommit=dbcommit,
         reindex=reindex,
         test_md5=True,
     )

Also applies to: 64-71

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rero_mef/agents/viaf/tasks.py` around lines 51 - 60, The code unpacks
new_record and redirect_info (and elsewhere record) from
existing_record.handle_redirect(...) but never uses them; change the unpacking
to ignore unused values—e.g., use _, action, _ =
existing_record.handle_redirect(redirect_to_pid=online_data["pid"],
dbcommit=dbcommit, reindex=reindex) or fetch the action by index (action =
existing_record.handle_redirect(...)[1]) so only action is kept; apply the same
change to the other occurrence (the block around lines 64-71) to remove the
unused new_record/redirect_info/record variables.

143-163: scan() scroll context is not closed on early break — Elasticsearch resource leak.

query.source("pid").scan() opens a Scroll API context; breaking at line 155 before the iterator is exhausted leaves that context open until the TTL expires (typically 5 min). Under concurrent or frequent scheduling this accumulates open scroll contexts. Consider using the PIT context manager (point_in_time()) or calling .scan(clear_scroll=True) (note: clear_scroll=True only fires cleanup when the iterator finishes naturally, not on early break in all DSL versions).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rero_mef/agents/viaf/tasks.py` around lines 143 - 163, The scan() call on
AgentViafSearch opens an Elasticsearch scroll that is not closed when the loop
breaks early; wrap the iteration in a point-in-time context so the PIT/scroll is
always closed on early exit. Replace the direct query.source("pid").scan() usage
with AgentViafSearch().point_in_time() as pit: and iterate using the query bound
to that PIT (or use the search API with the PIT id), and ensure the context
manager is exited before returning from the function that calls
_refresh_viaf_record so the PIT/scroll is cleaned up even when breaking early.
rero_mef/agents/cli.py (1)

246-313: --progress flag is not propagated to process_viaf_rolling_refresh / process_viaf_refresh, and batch_size=0 has different semantics in rolling vs full mode.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rero_mef/agents/cli.py` around lines 246 - 313, The CLI doesn’t forward the
--progress flag into process_viaf_rolling_refresh and process_viaf_refresh and
treats batch_size=0 inconsistently between rolling and full modes; update the
calls to process_viaf_rolling_refresh(...) and process_viaf_refresh(...) to
include verbose/progress (e.g. pass progress=progress or progress=(progress or
verbose) as the function expects) and normalize batch_size semantics (convert
batch_size==0 to None if 0 should mean “no limit” or vice versa) so both rolling
and full paths use the same interpretation of batch_size and receive the
progress flag.
🧹 Nitpick comments (1)
rero_mef/agents/cli.py (1)

284-289: Late import of _refresh_viaf_record inside the loop body.

from .viaf.tasks import _refresh_viaf_record is re-executed on every non-enqueue iteration. Python caches the import, so it is cheap, but the import statement still executes inside the hot path and is already available via the module. Hoist it to the top of the function (or module level) alongside the other task imports.

♻️ Proposed fix
 from .viaf.tasks import (
     process_viaf_refresh,
     process_viaf_rolling_refresh,
     viaf_get_record,
+    _refresh_viaf_record,
 )

Then remove the inline import at line 285.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rero_mef/agents/cli.py` around lines 284 - 289, The inline import "from
.viaf.tasks import _refresh_viaf_record" inside the loop should be hoisted out
of the hot path: add the import at the top of the function (or module level)
with the other task imports so _refresh_viaf_record is resolved once, then
remove the inline import from the non-enqueue branch where action =
_refresh_viaf_record(pid=pid, dbcommit=True, reindex=True, verbose=verbose) is
called.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@rero_mef/agents/viaf/api.py`:
- Around line 391-433: handle_redirect currently deletes the old VIAF record
before creating the target, drops the real action returned by create_or_update
(always returns Action.REDIRECT), and never computes an md5 for the new record;
to fix: move the deletion (self.delete) to occur only after a successful
create_or_update, call add_md5(target_data) before calling create_or_update,
capture the returned action from create_or_update and propagate/return it (or
return Action.ERROR/VALIDATION_ERROR when create_or_update fails) instead of
unconditionally returning Action.REDIRECT, and ensure callers get the correct
redirect_info; refer to handle_redirect, create_or_update, add_md5, self.delete
and Action for the exact places to change.

In `@tests/ui/agents/test_agents_cli.py`:
- Line 88: The tests test_create_from_viaf and
test_create_from_viaf_with_enqueue declare the agent_viaf_record fixture only
for DB side-effects but don't use the parameter so Ruff flags ARG001; fix by
either renaming the parameter to _agent_viaf_record in the function signatures
of test_create_from_viaf and test_create_from_viaf_with_enqueue or remove the
parameter and annotate the functions with
`@pytest.mark.usefixtures`("agent_viaf_record") so the fixture still runs for
setup without an unused-argument error.

In `@tests/ui/agents/test_agents_viaf_api.py`:
- Line 347: The tests test_handle_redirect,
test_handle_redirect_target_not_found, test_get_online_with_redirect,
test_handle_redirect_chained, and test_get_online_other_source declare an unused
app fixture causing Ruff ARG001; fix by either renaming the parameter to _app in
each function signature (e.g., def test_handle_redirect(_app, mock_get,
agent_viaf_online_response):) or removing the parameter and annotating the test
with `@pytest.mark.usefixtures`("app") to keep the side-effect-only fixture active
while eliminating the unused-argument lint error.
- Line 461: The test unpacks a three-tuple from
agent_viaf_record.handle_redirect into new_record, action, redirect_info but
never uses redirect_info (Ruff RUF059); update the unpack to new_record, action,
_redirect_info (or simply _ ) in the test to mark the unused value and silence
the linter, keeping the call to agent_viaf_record.handle_redirect unchanged.

In `@tests/ui/agents/test_agents_viaf_tasks.py`:
- Around line 247-248: The current assertion uses "or" which makes the
verbose-output check vacuous; change it to two independent assertions so both
are enforced: after calling capsys.readouterr() (captured), assert that action
== Action.CREATE and separately assert that the captured.out contains "VIAF
get:". Refer to captured, capsys.readouterr(), action, Action.CREATE and the
literal "VIAF get:" when locating the test lines to update.

---

Duplicate comments:
In `@rero_mef/agents/cli.py`:
- Around line 246-313: The CLI doesn’t forward the --progress flag into
process_viaf_rolling_refresh and process_viaf_refresh and treats batch_size=0
inconsistently between rolling and full modes; update the calls to
process_viaf_rolling_refresh(...) and process_viaf_refresh(...) to include
verbose/progress (e.g. pass progress=progress or progress=(progress or verbose)
as the function expects) and normalize batch_size semantics (convert
batch_size==0 to None if 0 should mean “no limit” or vice versa) so both rolling
and full paths use the same interpretation of batch_size and receive the
progress flag.

In `@rero_mef/agents/viaf/api.py`:
- Line 427: The tuple unpacking from self.create_or_update assigns an unused
variable action; change the unpack to ignore the unused value by replacing
"new_record, action = self.create_or_update(" with "new_record, _ =
self.create_or_update(" (or alternatively only assign the first element via
new_record = self.create_or_update(...)[0]) so the unused variable warning is
resolved while keeping the call to create_or_update in place.

In `@rero_mef/agents/viaf/tasks.py`:
- Around line 51-60: The code unpacks new_record and redirect_info (and
elsewhere record) from existing_record.handle_redirect(...) but never uses them;
change the unpacking to ignore unused values—e.g., use _, action, _ =
existing_record.handle_redirect(redirect_to_pid=online_data["pid"],
dbcommit=dbcommit, reindex=reindex) or fetch the action by index (action =
existing_record.handle_redirect(...)[1]) so only action is kept; apply the same
change to the other occurrence (the block around lines 64-71) to remove the
unused new_record/redirect_info/record variables.
- Around line 143-163: The scan() call on AgentViafSearch opens an Elasticsearch
scroll that is not closed when the loop breaks early; wrap the iteration in a
point-in-time context so the PIT/scroll is always closed on early exit. Replace
the direct query.source("pid").scan() usage with
AgentViafSearch().point_in_time() as pit: and iterate using the query bound to
that PIT (or use the search API with the PIT id), and ensure the context manager
is exited before returning from the function that calls _refresh_viaf_record so
the PIT/scroll is cleaned up even when breaking early.

In `@tests/ui/agents/test_agents_viaf_tasks.py`:
- Line 34: Several test functions (e.g., test_refresh_viaf_record_create and
other test_* functions in tests/ui/agents/test_agents_viaf_tasks.py) declare
fixtures like app and agent_viaf_record only for side-effects and trigger Ruff
ARG001; rename unused parameters by prefixing them with an underscore (e.g., app
-> _app, agent_viaf_record -> _agent_viaf_record) or remove them from the
signature and add them to `@pytest.mark.usefixtures`(...) so the fixtures still
run but are not treated as unused arguments.
- Around line 73-78: The test contains dead setup: the local variable pid =
"nonexistent" and the assignment mock_get.return_value =
mock_response(json_data={"sources": {}}) are never used or asserted; either
remove these unused lines or convert them into meaningful assertions exercising
the code path (e.g., call the function under test with pid and assert the
expected result or that mock_get was called). Locate and update the test in
tests/ui/agents/test_agents_viaf_tasks.py by removing the unused pid and
mock_get.return_value setup or by adding an invocation and assertions that
reference pid and mock_response so the mock is actually exercised.

In `@tests/ui/places/test_places_api.py`:
- Around line 250-258: Ruff flags the `app` fixture as unused in tests
test_places_utils_get_places_endpoints and test_places_utils_make_identifier; to
fix, remove the `app` parameter from the function signatures and instead apply
the fixture for side-effects either by adding a pytest.mark.usefixtures("app")
decorator above each test or by renaming the parameter to `_app` to indicate
intentional unusedness; update the tests referencing get_places_endpoints and
make_identifier in rero_mef.places.utils accordingly so they still run with the
fixture side-effects but no longer trigger ARG001.

In `@tests/ui/places/test_places_gnd_tasks.py`:
- Around line 23-24: Several tests (starting with the test function
test_gnd_get_record_success) declare the fixture parameter app but never use it,
causing Ruff ARG001; rename the unused parameter to _app (e.g., change
test_gnd_get_record_success(app) to test_gnd_get_record_success(_app)) or
alternatively apply `@pytest.mark.usefixtures`("app") to the test functions to
keep the side-effect without a parameter; update the other affected test
functions the same way (the ones around lines referenced in the review) so no
tests declare an unused app parameter.
- Around line 113-117: The test currently uses an assert False pattern that is
removed under python -O; replace the try/except with pytest's context manager:
import pytest if needed, then use with pytest.raises(ValueError) as excinfo:
call gnd_get_record("040754766", debug=True) and afterwards assert
str(excinfo.value) == "Test error". Update the test in
tests/ui/places/test_places_gnd_tasks.py and reference the gnd_get_record call
and the ValueError expectation when making the change.

---

Nitpick comments:
In `@rero_mef/agents/cli.py`:
- Around line 284-289: The inline import "from .viaf.tasks import
_refresh_viaf_record" inside the loop should be hoisted out of the hot path: add
the import at the top of the function (or module level) with the other task
imports so _refresh_viaf_record is resolved once, then remove the inline import
from the non-enqueue branch where action = _refresh_viaf_record(pid=pid,
dbcommit=True, reindex=True, verbose=verbose) is called.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d4cdb54 and 8f396e4.

⛔ Files ignored due to path filters (5)
  • .github/AI_SETUP.md is excluded by none and included by none
  • .github/copilot-instructions.md is excluded by none and included by none
  • .github/workflows/continuous-integration-test.yml is excluded by none and included by none
  • CLAUDE.md is excluded by none and included by none
  • rero_mef/config.py is excluded by !rero_mef/config.py and included by rero_mef/**/*.py
📒 Files selected for processing (9)
  • rero_mef/agents/cli.py
  • rero_mef/agents/viaf/api.py
  • rero_mef/agents/viaf/tasks.py
  • rero_mef/api.py
  • tests/ui/agents/test_agents_cli.py
  • tests/ui/agents/test_agents_viaf_api.py
  • tests/ui/agents/test_agents_viaf_tasks.py
  • tests/ui/places/test_places_api.py
  • tests/ui/places/test_places_gnd_tasks.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • rero_mef/api.py

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

♻️ Duplicate comments (4)
tests/ui/places/test_places_gnd_tasks.py (1)

26-26: Silence unused app fixtures to avoid ARG001 lint failures.

These parameters are unused and will trigger Ruff ARG001. Consider renaming to _app (or adding # noqa: ARG001) in each affected test.

🔧 Example fix (apply similarly to all listed tests)
-@mock.patch("rero_mef.places.gnd.tasks.requests_retry_session")
-def test_gnd_get_record_success(mock_session, app):
+@mock.patch("rero_mef.places.gnd.tasks.requests_retry_session")
+def test_gnd_get_record_success(mock_session, _app):

Also applies to: 50-50, 63-63, 77-77, 88-88, 111-111

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/ui/places/test_places_gnd_tasks.py` at line 26, The tests declare an
unused fixture parameter named "app" (e.g., in function
test_gnd_get_record_success) which triggers Ruff ARG001; rename the parameter
from app to _app (or add a per-test "# noqa: ARG001") in
test_gnd_get_record_success and the other affected test functions in this file
so the linter recognizes them as intentionally unused and stops reporting
ARG001.
tests/ui/agents/test_agents_viaf_tasks.py (1)

229-244: Weak or assertion masks verbose-output verification.

assert "VIAF get:" in captured.out or action == Action.CREATE is trivially true when the record is created (which is always the case here). Assert both independently.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/ui/agents/test_agents_viaf_tasks.py` around lines 229 - 244, The test
test_refresh_viaf_record_verbose_output uses a weak combined assertion that
masks checking verbose output; update the test so it asserts the verbose message
and the action separately: call _refresh_viaf_record(pid=pid, dbcommit=True,
reindex=True, verbose=True) into action, capture output via capsys.readouterr(),
then assert "VIAF get:" is in captured.out and also assert action ==
Action.CREATE; refer to the test function name and the _refresh_viaf_record call
to locate where to change the assertions.
tests/ui/agents/test_agents_viaf_api.py (1)

462-462: Unused redirect_info unpacking — already flagged in prior review.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/ui/agents/test_agents_viaf_api.py` at line 462, The test unpacks three
values from agent_viaf_record.handle_redirect but never uses redirect_info;
update the unpacking in tests/ui/agents/test_agents_viaf_api.py to avoid the
unused variable by either discarding it (e.g., use _ or capture only the two
used values) or use the returned redirect_info in assertions if relevant; locate
the call to agent_viaf_record.handle_redirect and change the left-hand side to
match the actual usage (e.g., new_record, action =
agent_viaf_record.handle_redirect(...) or new_record, action, _ = ...).
tests/ui/agents/test_agents_cli.py (1)

88-88: Unused agent_viaf_record fixture parameter — use @pytest.mark.usefixtures.

The agent_viaf_record parameter is only needed for its DB-setup side effect. Use the decorator to avoid Ruff ARG001 warnings on both test_create_from_viaf (line 88) and test_create_from_viaf_with_enqueue (line 108).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/ui/agents/test_agents_cli.py` at line 88, The tests
test_create_from_viaf and test_create_from_viaf_with_enqueue currently accept
the agent_viaf_record fixture only for its DB side-effects and trigger an
unused-argument warning; remove the agent_viaf_record parameter from both
function signatures and add `@pytest.mark.usefixtures`("agent_viaf_record") above
each test (keeping existing fixtures like script_info intact) so the fixture
runs for setup without being declared as an unused argument.
🧹 Nitpick comments (3)
rero_mef/agents/cli.py (1)

289-294: Move the lazy import outside the for loop.

The from .viaf.tasks import _refresh_viaf_record on line 290 is inside the loop body. Although Python's import caching prevents repeated I/O, placing it before the loop (or at function scope within the else block) is more conventional and readable.

Proposed fix
+        from .viaf.tasks import _refresh_viaf_record
+
         for pid in progress_bar:
             if batch_arg and count >= batch_arg:
                 break
             if enqueue:
                 task = viaf_get_record.delay(viaf_pid=pid, verbose=show_progress)
                 if verbose:
                     click.echo(f"  viaf pid: {pid} task: {task}")
             else:
-                from .viaf.tasks import _refresh_viaf_record
-
                 action = _refresh_viaf_record(
                     pid=pid, dbcommit=True, reindex=True, verbose=show_progress
                 )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rero_mef/agents/cli.py` around lines 289 - 294, The import statement "from
.viaf.tasks import _refresh_viaf_record" is inside a loop; move that import to
the containing scope (e.g., just before the for-loop or at the top of the
enclosing function/else block) so it runs once, then call
_refresh_viaf_record(pid=pid, dbcommit=True, reindex=True,
verbose=show_progress) inside the loop; keep the function name
_refresh_viaf_record to locate the code.
rero_mef/agents/viaf/api.py (1)

433-437: Use logging.exception instead of logging.error to capture the traceback.

Per Ruff TRY400, logging.exception automatically includes the exception traceback, which is more useful for debugging create/update failures during redirect handling.

Proposed fix
         except Exception as e:
-            current_app.logger.error(
+            current_app.logger.exception(
                 f"Failed to create/update target VIAF {redirect_to_pid}: {e}"
             )
             return None, Action.ERROR, redirect_info
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rero_mef/agents/viaf/api.py` around lines 433 - 437, The except block
catching Exception as e should use current_app.logger.exception (not
current_app.logger.error) so the traceback is included; update the except
handler around the create/update redirect logic (the except Exception as e that
logs "Failed to create/update target VIAF {redirect_to_pid}: {e}") to call
current_app.logger.exception with the same descriptive message and keep the
subsequent return None, Action.ERROR, redirect_info unchanged.
rero_mef/agents/viaf/tasks.py (1)

95-114: batch_size check is falsy-based — passing 0 and None both mean "all", but the docstring says None.

Line 105: if batch_size and count >= batch_size — both None and 0 are falsy, so both mean "process all." This works correctly because the CLI normalizes 0 to None, but if someone calls this function directly with batch_size=0, it silently processes all records. Consider documenting this in the docstring or using an explicit is not None check for clarity.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rero_mef/agents/viaf/tasks.py` around lines 95 - 114, The batch_size falsy
check currently uses "if batch_size and count >= batch_size" which treats 0 the
same as None; change this to an explicit None check so callers can pass 0
intentionally: replace that condition with "if batch_size is not None and count
>= batch_size" in the loop that iterates over pids (referenced variables:
batch_size, count, pids, progress) and update the docstring to state that None
means "no limit" (or alternatively document that 0 is treated as unlimited) so
behavior is explicit for callers of the function that calls
_refresh_viaf_record.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@rero_mef/agents/viaf/api.py`:
- Around line 426-442: The test assertion is incompatible with handle_redirect
which propagates the inner create_or_update action; update
test_refresh_viaf_record_redirect to assert that action is one of Action.CREATE
or Action.UPDATE (e.g., replace assert action == Action.REDIRECT with assert
action in (Action.CREATE, Action.UPDATE)) so it matches the behavior of
create_or_update returned by handle_redirect (see create_or_update and delete
usages in the method).

In `@tests/ui/agents/test_agents_viaf_api.py`:
- Around line 346-384: test_handle_redirect asserts the returned action from
AgentViafRecord.handle_redirect is either Action.CREATE or Action.UPDATE, but
the task-layer test test_refresh_viaf_record_redirect expects Action.REDIRECT
for the same code path, so make the behavior consistent: update
AgentViafRecord.handle_redirect (and/or the wrapper create_or_update call) so it
returns Action.REDIRECT when the operation is the result of a VIAF cluster merge
redirect (the case exercised by test_handle_redirect), or alternatively change
test_handle_redirect to expect Action.REDIRECT to match
test_refresh_viaf_record_redirect; locate the logic in handle_redirect and the
create_or_update call (see viaf/api.py around the redirect handling block) and
ensure the function returns the canonical Action.REDIRECT for redirects instead
of passing through the raw create_or_update result.

In `@tests/ui/agents/test_agents_viaf_tasks.py`:
- Around line 91-99: The test expects _refresh_viaf_record to return
Action.REDIRECT but handle_redirect currently returns the raw create_or_update
action (CREATE/UPDATE); modify handle_redirect in viaf/api.py so that on
successful redirect processing it returns Action.REDIRECT (instead of returning
the create_or_update result), while preserving error handling and logging and
ensuring callers like _refresh_viaf_record still receive the REDIRECT sentinel;
alternatively, if you prefer the current behavior, update the test assertion to
check for the create_or_update action returned by handle_redirect (CREATE or
UPDATE) rather than Action.REDIRECT.

In `@tests/ui/places/test_places_api.py`:
- Around line 250-258: The test test_places_utils_get_places_endpoints relies on
a hard-coded key "plgnd" which couples it to the app fixture's RERO_PLACES
config; update the test to explicitly assert the fixture provides the expected
key before checking membership (or skip the assertion if absent) by reading the
app/config or RERO_PLACES used by get_places_endpoints, e.g. assert the
fixture/config contains the expected endpoint key or call pytest.skip when it
does not, so the membership check on endpoints is only performed when the
fixture guarantees "plgnd" exists.
- Around line 274-277: The two make_identifier implementations diverge
(places/utils.make_identifier uses "type:value" while
rero_mef/utils.make_identifier at the symbol make_identifier uses "type|value"),
causing silent lookup mismatches; fix by making the implementation in
rero_mef/utils.py produce the same no-source format as
places/utils.make_identifier (use ':' between type and value when source is
absent), update any callers found by running the suggested rg search to ensure
they expect the unified format, and add a brief compatibility note or adapter in
lookup code (where identifiers are consumed) to accept both ":" and "|" during a
transitional period.

---

Duplicate comments:
In `@tests/ui/agents/test_agents_cli.py`:
- Line 88: The tests test_create_from_viaf and
test_create_from_viaf_with_enqueue currently accept the agent_viaf_record
fixture only for its DB side-effects and trigger an unused-argument warning;
remove the agent_viaf_record parameter from both function signatures and add
`@pytest.mark.usefixtures`("agent_viaf_record") above each test (keeping existing
fixtures like script_info intact) so the fixture runs for setup without being
declared as an unused argument.

In `@tests/ui/agents/test_agents_viaf_api.py`:
- Line 462: The test unpacks three values from agent_viaf_record.handle_redirect
but never uses redirect_info; update the unpacking in
tests/ui/agents/test_agents_viaf_api.py to avoid the unused variable by either
discarding it (e.g., use _ or capture only the two used values) or use the
returned redirect_info in assertions if relevant; locate the call to
agent_viaf_record.handle_redirect and change the left-hand side to match the
actual usage (e.g., new_record, action = agent_viaf_record.handle_redirect(...)
or new_record, action, _ = ...).

In `@tests/ui/agents/test_agents_viaf_tasks.py`:
- Around line 229-244: The test test_refresh_viaf_record_verbose_output uses a
weak combined assertion that masks checking verbose output; update the test so
it asserts the verbose message and the action separately: call
_refresh_viaf_record(pid=pid, dbcommit=True, reindex=True, verbose=True) into
action, capture output via capsys.readouterr(), then assert "VIAF get:" is in
captured.out and also assert action == Action.CREATE; refer to the test function
name and the _refresh_viaf_record call to locate where to change the assertions.

In `@tests/ui/places/test_places_gnd_tasks.py`:
- Line 26: The tests declare an unused fixture parameter named "app" (e.g., in
function test_gnd_get_record_success) which triggers Ruff ARG001; rename the
parameter from app to _app (or add a per-test "# noqa: ARG001") in
test_gnd_get_record_success and the other affected test functions in this file
so the linter recognizes them as intentionally unused and stops reporting
ARG001.

---

Nitpick comments:
In `@rero_mef/agents/cli.py`:
- Around line 289-294: The import statement "from .viaf.tasks import
_refresh_viaf_record" is inside a loop; move that import to the containing scope
(e.g., just before the for-loop or at the top of the enclosing function/else
block) so it runs once, then call _refresh_viaf_record(pid=pid, dbcommit=True,
reindex=True, verbose=show_progress) inside the loop; keep the function name
_refresh_viaf_record to locate the code.

In `@rero_mef/agents/viaf/api.py`:
- Around line 433-437: The except block catching Exception as e should use
current_app.logger.exception (not current_app.logger.error) so the traceback is
included; update the except handler around the create/update redirect logic (the
except Exception as e that logs "Failed to create/update target VIAF
{redirect_to_pid}: {e}") to call current_app.logger.exception with the same
descriptive message and keep the subsequent return None, Action.ERROR,
redirect_info unchanged.

In `@rero_mef/agents/viaf/tasks.py`:
- Around line 95-114: The batch_size falsy check currently uses "if batch_size
and count >= batch_size" which treats 0 the same as None; change this to an
explicit None check so callers can pass 0 intentionally: replace that condition
with "if batch_size is not None and count >= batch_size" in the loop that
iterates over pids (referenced variables: batch_size, count, pids, progress) and
update the docstring to state that None means "no limit" (or alternatively
document that 0 is treated as unlimited) so behavior is explicit for callers of
the function that calls _refresh_viaf_record.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8f396e4 and 1204e52.

⛔ Files ignored due to path filters (5)
  • .github/AI_SETUP.md is excluded by none and included by none
  • .github/copilot-instructions.md is excluded by none and included by none
  • .github/workflows/continuous-integration-test.yml is excluded by none and included by none
  • CLAUDE.md is excluded by none and included by none
  • rero_mef/config.py is excluded by !rero_mef/config.py and included by rero_mef/**/*.py
📒 Files selected for processing (9)
  • rero_mef/agents/cli.py
  • rero_mef/agents/viaf/api.py
  • rero_mef/agents/viaf/tasks.py
  • rero_mef/api.py
  • tests/ui/agents/test_agents_cli.py
  • tests/ui/agents/test_agents_viaf_api.py
  • tests/ui/agents/test_agents_viaf_tasks.py
  • tests/ui/places/test_places_api.py
  • tests/ui/places/test_places_gnd_tasks.py

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (5)
tests/ui/agents/test_agents_viaf_tasks.py (2)

243-244: Weak or assertion — action == Action.CREATE makes the verbose check vacuous.

Since the test always creates a new record, action == Action.CREATE is virtually guaranteed, making the entire assertion trivially True regardless of verbose output. Assert both independently.

🛠️ Proposed fix
-    assert "VIAF get:" in captured.out or action == Action.CREATE
+    assert action == Action.CREATE
+    assert "VIAF get:" in captured.out
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/ui/agents/test_agents_viaf_tasks.py` around lines 243 - 244, The test
currently uses a weak assertion combining verbose output and action with an
`or`, making it vacuous; change it to assert both conditions separately: call
`captured = capsys.readouterr()` and then assert that ` "VIAF get:" in
captured.out` and separately assert `action == Action.CREATE` (i.e., replace the
single `assert "VIAF get:" in captured.out or action == Action.CREATE` with two
independent asserts referencing `captured.out` and `Action.CREATE`).

91-99: ⚠️ Potential issue | 🔴 Critical

assert action == Action.REDIRECT is inconsistent with handle_redirect return value.

handle_redirect (in viaf/api.py line 489) returns the action from create_or_update, which will be Action.CREATE or Action.UPDATE — not Action.REDIRECT. This test assertion will fail.

Either fix handle_redirect to return Action.REDIRECT on success, or update this assertion:

-    assert action == Action.REDIRECT
+    assert action in (Action.CREATE, Action.UPDATE)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/ui/agents/test_agents_viaf_tasks.py` around lines 91 - 99, The test
asserts that _refresh_viaf_record returns Action.REDIRECT but handle_redirect
(viaf.api.handle_redirect) currently returns whatever create_or_update returns
(Action.CREATE or Action.UPDATE), so update either the function or the test:
preferably change the test assertion to expect Action.CREATE or Action.UPDATE
(or check for not None/new record existence) instead of Action.REDIRECT;
alternatively, modify handle_redirect to explicitly return Action.REDIRECT on
successful redirect handling after calling create_or_update. Reference symbols:
_refresh_viaf_record (test), handle_redirect (viaf.api), create_or_update, and
Action enum.
tests/ui/agents/test_agents_viaf_api.py (3)

374-376: assert action in (Action.CREATE, Action.UPDATE) is inconsistent with the task-layer test that expects Action.REDIRECT for the same code path.

test_refresh_viaf_record_redirect in tests/ui/agents/test_agents_viaf_tasks.py (line ~95) asserts Action.REDIRECT for a redirect operation, while this test accepts the raw create_or_update result. One of the two tests must be wrong. The recommended fix is to have handle_redirect in rero_mef/agents/viaf/api.py return the canonical Action.REDIRECT sentinel rather than forwarding the inner create_or_update action, then update this assertion to assert action == Action.REDIRECT.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/ui/agents/test_agents_viaf_api.py` around lines 374 - 376, The test is
asserting that create_or_update's raw result is returned from the redirect
handler, but the task-layer test expects the canonical Action.REDIRECT sentinel;
update handle_redirect in rero_mef/agents/viaf/api.py to return Action.REDIRECT
(not forward the inner create_or_update return value) when a redirect is
handled, and then change this test's assertion to assert action ==
Action.REDIRECT so both unit and task-layer tests align; reference the
handle_redirect function and create_or_update call to locate where to replace
the forwarded return with the Action.REDIRECT sentinel.

347-347: All five new test functions still carry an unused app fixture argument (ARG001).

app is used only for its side-effects (application context setup). Either prefix it with _ or apply @pytest.mark.usefixtures("app") to suppress the warning cleanly.

🔧 Proposed fix (applied to all five functions)
-def test_handle_redirect(mock_get, app, agent_viaf_online_response):
+@pytest.mark.usefixtures("app")
+def test_handle_redirect(mock_get, agent_viaf_online_response):

-def test_handle_redirect_target_not_found(mock_get, app):
+@pytest.mark.usefixtures("app")
+def test_handle_redirect_target_not_found(mock_get):

-def test_get_online_with_redirect(mock_get, app, agent_viaf_online_response):
+@pytest.mark.usefixtures("app")
+def test_get_online_with_redirect(mock_get, agent_viaf_online_response):

-def test_handle_redirect_chained(mock_get, app):
+@pytest.mark.usefixtures("app")
+def test_handle_redirect_chained(mock_get):

-def test_get_online_other_source(mock_get, app, agent_viaf_online_response):
+@pytest.mark.usefixtures("app")
+def test_get_online_other_source(mock_get, agent_viaf_online_response):

Also applies to: 388-388, 419-419, 439-439, 474-474

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/ui/agents/test_agents_viaf_api.py` at line 347, The five new tests
(including test_handle_redirect and the other four test functions added around
the same area) take an unused app fixture which triggers ARG001; either rename
the parameter to _app to indicate intentional unused usage or remove the
parameter and add `@pytest.mark.usefixtures`("app") above each test so the app
fixture still runs for side-effects; update the signatures for
test_handle_redirect and the four sibling tests (the other test_* functions
added in the same diff) accordingly to suppress the unused-fixture warning.

461-466: Unpacked variable redirect_info is never referenced in the test body (RUF059).

🔧 Proposed fix
-    new_record, action, redirect_info = agent_viaf_record.handle_redirect(
+    new_record, action, _redirect_info = agent_viaf_record.handle_redirect(
         redirect_to_pid=new_pid,
         dbcommit=True,
         reindex=True,
     )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/ui/agents/test_agents_viaf_api.py` around lines 461 - 466, The test
unpacks three return values from agent_viaf_record.handle_redirect into
new_record, action, redirect_info but never uses redirect_info; update the
unpack to avoid the unused variable by either capturing it as a deliberately
ignored name (e.g. _redirect_info) or by only unpacking the two used values
(new_record, action) and discarding the third, ensuring you still call
agent_viaf_record.handle_redirect with the same args; reference symbols:
agent_viaf_record.handle_redirect, new_record, action, redirect_info.
🧹 Nitpick comments (5)
rero_mef/agents/cli.py (2)

326-337: Move the deferred import outside the loop.

The from .viaf.tasks import _refresh_viaf_record import on line 327 is inside the for pid in progress_bar loop. While Python caches modules after the first import, this is an unnecessary re-evaluation on each iteration. Move it before the loop.

♻️ Suggested fix
         if not enqueue:
+            from .viaf.tasks import _refresh_viaf_record
+
         for pid in progress_bar:
             if batch_arg and count >= batch_arg:
                 break
             if enqueue:
                 task = viaf_get_record.delay(
                     viaf_pid=pid,
                     verbose=verbose,
                     progress=progress,
                     delete_if_not_found=delete_if_not_found,
                     update_agents=update_agents,
                 )
                 if verbose:
                     click.echo(f"  viaf pid: {pid} task: {task}")
             else:
-                from .viaf.tasks import _refresh_viaf_record
-
                 action = _refresh_viaf_record(
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rero_mef/agents/cli.py` around lines 326 - 337, The import for
_refresh_viaf_record is inside the loop causing repeated import evaluation; move
"from .viaf.tasks import _refresh_viaf_record" to just before the "for pid in
progress_bar" loop so the module is imported once, then call
_refresh_viaf_record(...) inside the loop exactly as currently done; ensure the
imported name matches the existing calls and there are no local name shadowing
issues.

301-344: Missing mode: _refresh_viaf_record is called directly instead of through task layer.

In the missing mode with enqueue=False (lines 326-339), the code calls _refresh_viaf_record directly (a private function) rather than using the public task function like the other modes use process_viaf_refresh / process_viaf_rolling_refresh. This works but bypasses the set_timestamp tracking that the other task wrappers provide.

This is a design choice — the per-PID approach gives more granular control and progress reporting. Just noting the difference in behavior: no set_timestamp is called for missing mode.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rero_mef/agents/cli.py` around lines 301 - 344, The missing-mode loop
currently calls the private helper _refresh_viaf_record(...) directly (bypassing
timestamp tracking); change it to call the public task wrapper used by other
modes (e.g., process_viaf_refresh or the synchronous equivalent of
viaf_get_record) so set_timestamp is executed consistently; replace the direct
call in the else branch with the public function (importing it if needed), pass
the same parameters (pid, dbcommit=True, reindex=True, verbose, progress,
delete_if_not_found, update_agents), and ensure the returned action value is
handled the same way in action_counts.
rero_mef/agents/viaf/api.py (2)

468-484: Same recommendation: use logger.exception for the redirect error path.

This catch block handles failures during create/update of the redirect target. Including the traceback via logger.exception will help diagnose issues like DB errors, schema validation failures, etc.

♻️ Suggested fix
         except Exception as e:
-            current_app.logger.error(
+            current_app.logger.exception(
                 f"Failed to create/update target VIAF {redirect_to_pid}: {e}"
             )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rero_mef/agents/viaf/api.py` around lines 468 - 484, In the except block
around the call to self.create_or_update (handling redirect target creation),
replace current_app.logger.error(...) with current_app.logger.exception(...) so
the traceback is logged along with the message that includes redirect_to_pid;
keep the existing info-log and delete call (self.delete) and the return of
(None, Action.ERROR, redirect_info) intact—this change should be made in the
try/except that surrounds create_or_update in the VIAF agent API so errors from
create_or_update (and any underlying DB/schema issues) include full stack traces
for debugging.

385-387: Use logger.exception to preserve traceback in parse error logging.

When catching a broad Exception during external API response parsing (which is reasonable here given unpredictable API formats), logger.exception will include the full traceback, making it much easier to diagnose parsing failures in production.

♻️ Suggested fix
         except Exception as e:
-                current_app.logger.error(f"Error parsing VIAF response for {pid}: {e}")
+                current_app.logger.exception(f"Error parsing VIAF response for {pid}: {e}")
                 return {}, f"VIAF get: {pid:<15} {url} | PARSE ERROR: {e}"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rero_mef/agents/viaf/api.py` around lines 385 - 387, Replace the current
generic error log call in the except block that catches parse errors with a
traceback-preserving call: change current_app.logger.error(f"Error parsing VIAF
response for {pid}: {e}") to current_app.logger.exception(...) so the full
traceback is recorded while keeping the same contextual message (including pid
and url) and returning the same tuple; update the except block in the VIAF
response parsing code where pid and url are referenced to use
current_app.logger.exception to preserve diagnostics.
rero_mef/agents/viaf/tasks.py (1)

100-151: batch_size=0 bypasses the early-termination check — document or normalize.

When batch_size=0 is passed (e.g., from CLI where 0 means "all"), the guard if batch_size and count >= batch_size (line 134) is False because 0 is falsy, so all records are processed. This works correctly but relies on Python truthiness of 0. The docstring says None=all, but 0 also means "all" in practice. Consider normalizing at the call site (the CLI already does batch_arg = None if batch_size == 0 else batch_size) or documenting this in the docstring.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rero_mef/agents/viaf/tasks.py` around lines 100 - 151, The function
process_viaf_refresh relies on Python truthiness for batch_size so passing 0
(falsy) behaves like None; normalize or make the check explicit by either
converting batch_size==0 to None at the start of process_viaf_refresh or change
the guard to an explicit None check (e.g., use "if batch_size is not None and
count >= batch_size") so batch_size=0 is handled deterministically; update the
docstring to state that 0 is treated as "all" if you choose normalization.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/ui/agents/test_agents_viaf_tasks.py`:
- Around line 140-152: The test test_viaf_get_record_task asserts an exact call
to _refresh_viaf_record but only lists four kwargs; viaf_get_record forwards all
seven kwargs (including progress, delete_if_not_found, update_agents with their
defaults) so assert_called_once_with will fail — update the assertion in
test_viaf_get_record_task to include the full set of keyword arguments passed by
viaf_get_record (pid=pid, dbcommit=True, reindex=True, verbose=False,
progress=False, delete_if_not_found=False, update_agents=True) to match the
actual call to _refresh_viaf_record.

---

Duplicate comments:
In `@tests/ui/agents/test_agents_viaf_api.py`:
- Around line 374-376: The test is asserting that create_or_update's raw result
is returned from the redirect handler, but the task-layer test expects the
canonical Action.REDIRECT sentinel; update handle_redirect in
rero_mef/agents/viaf/api.py to return Action.REDIRECT (not forward the inner
create_or_update return value) when a redirect is handled, and then change this
test's assertion to assert action == Action.REDIRECT so both unit and task-layer
tests align; reference the handle_redirect function and create_or_update call to
locate where to replace the forwarded return with the Action.REDIRECT sentinel.
- Line 347: The five new tests (including test_handle_redirect and the other
four test functions added around the same area) take an unused app fixture which
triggers ARG001; either rename the parameter to _app to indicate intentional
unused usage or remove the parameter and add `@pytest.mark.usefixtures`("app")
above each test so the app fixture still runs for side-effects; update the
signatures for test_handle_redirect and the four sibling tests (the other test_*
functions added in the same diff) accordingly to suppress the unused-fixture
warning.
- Around line 461-466: The test unpacks three return values from
agent_viaf_record.handle_redirect into new_record, action, redirect_info but
never uses redirect_info; update the unpack to avoid the unused variable by
either capturing it as a deliberately ignored name (e.g. _redirect_info) or by
only unpacking the two used values (new_record, action) and discarding the
third, ensuring you still call agent_viaf_record.handle_redirect with the same
args; reference symbols: agent_viaf_record.handle_redirect, new_record, action,
redirect_info.

In `@tests/ui/agents/test_agents_viaf_tasks.py`:
- Around line 243-244: The test currently uses a weak assertion combining
verbose output and action with an `or`, making it vacuous; change it to assert
both conditions separately: call `captured = capsys.readouterr()` and then
assert that ` "VIAF get:" in captured.out` and separately assert `action ==
Action.CREATE` (i.e., replace the single `assert "VIAF get:" in captured.out or
action == Action.CREATE` with two independent asserts referencing `captured.out`
and `Action.CREATE`).
- Around line 91-99: The test asserts that _refresh_viaf_record returns
Action.REDIRECT but handle_redirect (viaf.api.handle_redirect) currently returns
whatever create_or_update returns (Action.CREATE or Action.UPDATE), so update
either the function or the test: preferably change the test assertion to expect
Action.CREATE or Action.UPDATE (or check for not None/new record existence)
instead of Action.REDIRECT; alternatively, modify handle_redirect to explicitly
return Action.REDIRECT on successful redirect handling after calling
create_or_update. Reference symbols: _refresh_viaf_record (test),
handle_redirect (viaf.api), create_or_update, and Action enum.

---

Nitpick comments:
In `@rero_mef/agents/cli.py`:
- Around line 326-337: The import for _refresh_viaf_record is inside the loop
causing repeated import evaluation; move "from .viaf.tasks import
_refresh_viaf_record" to just before the "for pid in progress_bar" loop so the
module is imported once, then call _refresh_viaf_record(...) inside the loop
exactly as currently done; ensure the imported name matches the existing calls
and there are no local name shadowing issues.
- Around line 301-344: The missing-mode loop currently calls the private helper
_refresh_viaf_record(...) directly (bypassing timestamp tracking); change it to
call the public task wrapper used by other modes (e.g., process_viaf_refresh or
the synchronous equivalent of viaf_get_record) so set_timestamp is executed
consistently; replace the direct call in the else branch with the public
function (importing it if needed), pass the same parameters (pid, dbcommit=True,
reindex=True, verbose, progress, delete_if_not_found, update_agents), and ensure
the returned action value is handled the same way in action_counts.

In `@rero_mef/agents/viaf/api.py`:
- Around line 468-484: In the except block around the call to
self.create_or_update (handling redirect target creation), replace
current_app.logger.error(...) with current_app.logger.exception(...) so the
traceback is logged along with the message that includes redirect_to_pid; keep
the existing info-log and delete call (self.delete) and the return of (None,
Action.ERROR, redirect_info) intact—this change should be made in the try/except
that surrounds create_or_update in the VIAF agent API so errors from
create_or_update (and any underlying DB/schema issues) include full stack traces
for debugging.
- Around line 385-387: Replace the current generic error log call in the except
block that catches parse errors with a traceback-preserving call: change
current_app.logger.error(f"Error parsing VIAF response for {pid}: {e}") to
current_app.logger.exception(...) so the full traceback is recorded while
keeping the same contextual message (including pid and url) and returning the
same tuple; update the except block in the VIAF response parsing code where pid
and url are referenced to use current_app.logger.exception to preserve
diagnostics.

In `@rero_mef/agents/viaf/tasks.py`:
- Around line 100-151: The function process_viaf_refresh relies on Python
truthiness for batch_size so passing 0 (falsy) behaves like None; normalize or
make the check explicit by either converting batch_size==0 to None at the start
of process_viaf_refresh or change the guard to an explicit None check (e.g., use
"if batch_size is not None and count >= batch_size") so batch_size=0 is handled
deterministically; update the docstring to state that 0 is treated as "all" if
you choose normalization.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1204e52 and cba4052.

⛔ Files ignored due to path filters (5)
  • .github/AI_SETUP.md is excluded by none and included by none
  • .github/copilot-instructions.md is excluded by none and included by none
  • .github/workflows/continuous-integration-test.yml is excluded by none and included by none
  • CLAUDE.md is excluded by none and included by none
  • rero_mef/config.py is excluded by !rero_mef/config.py and included by rero_mef/**/*.py
📒 Files selected for processing (9)
  • rero_mef/agents/cli.py
  • rero_mef/agents/viaf/api.py
  • rero_mef/agents/viaf/tasks.py
  • rero_mef/api.py
  • tests/ui/agents/test_agents_cli.py
  • tests/ui/agents/test_agents_viaf_api.py
  • tests/ui/agents/test_agents_viaf_tasks.py
  • tests/ui/places/test_places_api.py
  • tests/ui/places/test_places_gnd_tasks.py

@rerowep rerowep force-pushed the wep-viaf-updates branch 3 times, most recently from c4ecc0f to 40d6ca1 Compare February 25, 2026 09:17
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (4)
tests/ui/places/test_places_gnd_tasks.py (1)

26-26: ⚠️ Potential issue | 🟡 Minor

Rename unused app fixture args to _app in all tests.

Lines 26, 50, 63, 77, 88, and 111 declare app but never use it, triggering ARG001. Keep fixture setup behavior but silence lint by renaming to _app.

Suggested patch
 `@mock.patch`("rero_mef.places.gnd.tasks.requests_retry_session")
-def test_gnd_get_record_success(mock_session, app):
+def test_gnd_get_record_success(mock_session, _app):
@@
 `@mock.patch`("rero_mef.places.gnd.tasks.requests_retry_session")
-def test_gnd_get_record_http_error(mock_session, app):
+def test_gnd_get_record_http_error(mock_session, _app):
@@
 `@mock.patch`("rero_mef.places.gnd.tasks.requests_retry_session")
-def test_gnd_get_record_no_records(mock_session, app):
+def test_gnd_get_record_no_records(mock_session, _app):
@@
 `@mock.patch`("rero_mef.places.gnd.tasks.requests_retry_session")
-def test_gnd_get_record_exception(mock_session, app):
+def test_gnd_get_record_exception(mock_session, _app):
@@
 `@mock.patch`("rero_mef.places.gnd.tasks.requests_retry_session")
-def test_gnd_get_record_pid_mismatch(mock_session, app):
+def test_gnd_get_record_pid_mismatch(mock_session, _app):
@@
 `@mock.patch`("rero_mef.places.gnd.tasks.requests_retry_session")
-def test_gnd_get_record_debug_reraises(mock_session, app):
+def test_gnd_get_record_debug_reraises(mock_session, _app):

Also applies to: 50-50, 63-63, 77-77, 88-88, 111-111

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/ui/places/test_places_gnd_tasks.py` at line 26, Several test functions
(e.g., test_gnd_get_record_success) declare the fixture parameter named app but
never use it, triggering ARG001; rename that parameter to _app in each test that
accepts the unused fixture to silence the lint warning while keeping fixture
behavior (search for functions in this file with a parameter named app and
change the parameter to _app, e.g., def
test_gnd_get_record_success(mock_session, app) -> def
test_gnd_get_record_success(mock_session, _app)).
tests/ui/places/test_places_api.py (1)

258-258: Hard-coded "plgnd" key — existing open concern.

The coupling between this assertion and the test app fixture's RERO_PLACES config was already flagged in a prior review. Flagging again to avoid losing track.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/ui/places/test_places_api.py` at line 258, Replace the hard-coded
"plgnd" assertion in tests/ui/places/test_places_api.py with a dynamic check
against the app/test fixture configuration (use the RERO_PLACES value or derive
the expected key from the same fixture that builds endpoints) so the test
asserts that the configured place key exists rather than the literal "plgnd";
update the assertion that currently reads assert "plgnd" in endpoints to
reference the configuration (e.g., app.config['RERO_PLACES'] or the
fixture-provided expected_key) so the test no longer depends on a magic string.
tests/ui/agents/test_agents_cli.py (1)

88-88: ⚠️ Potential issue | 🟡 Minor

Unused side-effect fixture arguments still trigger ARG001.

agent_viaf_record is used only for setup side-effects in both tests and never referenced directly. Rename to _agent_viaf_record (or use @pytest.mark.usefixtures) to clear lint warnings.

🧹 Proposed fix
-def test_create_from_viaf(script_info, agent_viaf_record):
+def test_create_from_viaf(script_info, _agent_viaf_record):
...
-def test_create_from_viaf_with_enqueue(script_info, agent_viaf_record):
+def test_create_from_viaf_with_enqueue(script_info, _agent_viaf_record):

Also applies to: 108-108

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/ui/agents/test_agents_cli.py` at line 88, The test function
test_create_from_viaf accepts the fixture agent_viaf_record only for
side-effects, which triggers ARG001; either rename the parameter to
_agent_viaf_record or remove the parameter and add
`@pytest.mark.usefixtures`("agent_viaf_record") above test_create_from_viaf to
silence the lint warning; apply the same change for the other test that also
takes agent_viaf_record at the second occurrence.
rero_mef/agents/cli.py (1)

275-283: ⚠️ Potential issue | 🟠 Major

--batch_size 0 is still inconsistent in rolling mode.

Line 276 maps 0 to None, but process_viaf_rolling_refresh treats None as “use configured default batch size”, not “all”. So rolling mode silently processes only a default subset while help text says 0=all.

🔧 Proposed fix
-    # Normalize batch_size: 0 means "all records"
-    batch_arg = None if batch_size == 0 else batch_size
+    # Normalize batch_size:
+    # - rolling + 0 => process all VIAF records
+    # - other modes + 0 => no limit (None)
+    if rolling and batch_size == 0:
+        batch_arg = AgentViafRecord.count()
+    else:
+        batch_arg = None if batch_size == 0 else batch_size

Also applies to: 290-291

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rero_mef/agents/cli.py` around lines 275 - 283, The CLI currently maps
batch_size==0 to None (batch_arg) which makes process_viaf_rolling_refresh
interpret it as "use default" instead of "all"; change the call sites so rolling
mode passes an explicit 0 (or a dedicated sentinel meaning "all") to
process_viaf_rolling_refresh.delay and the non-enqueue direct call instead of
None—e.g., compute a rolling_batch_arg = 0 if batch_size == 0 else batch_size
and use that in the process_viaf_rolling_refresh.delay and the direct
process_viaf_rolling_refresh(...) invocation so rolling mode truly means
"0=all".
🧹 Nitpick comments (1)
rero_mef/agents/cli.py (1)

203-210: Guard batch_size at the CLI boundary.

type=int allows negative values, which leads to confusing early-termination behavior in refresh loops. Prefer non-negative validation in Click option parsing.

🛡️ Proposed fix
 `@click.option`(
     "-b",
     "--batch_size",
     "batch_size",
     default=0,
-    type=int,
+    type=click.IntRange(min=0),
     help="Limit number of records to process (0=all).",
 )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rero_mef/agents/cli.py` around lines 203 - 210, The CLI option for batch_size
currently uses type=int which permits negatives; update the Click option
declaration (the click.option for "-b/--batch_size" in rero_mef.agents.cli) to
enforce non-negative values by using click.IntRange(min=0) as the type or by
adding a callback that raises click.BadParameter for batch_size < 0; ensure the
option name "batch_size" and any code reading it (e.g., the refresh loop) keep
the same identifier and behavior for zero meaning "all".
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@rero_mef/agents/viaf/tasks.py`:
- Around line 76-93: AgentViafRecord.create_or_update currently triggers
record.create_mef_and_agents internally, so remove the external duplicate block
that calls record.create_mef_and_agents when update_agents is True; instead
extend AgentViafRecord.create_or_update to accept update_agents and verbose
(alongside existing params like dbcommit, reindex, test_md5) and gate the
side-effect there (call record.create_mef_and_agents only when update_agents is
True and honor verbose), then update callers to pass update_agents/verbose into
AgentViafRecord.create_or_update and delete the downstream conditional that
re-invokes record.create_mef_and_agents.

In `@tests/ui/agents/test_agents_cli.py`:
- Around line 132-157: The test test_create_from_viaf_with_viaf_file fails
because the CLI is invoked without the test application fixture; update the test
function signature to include the app fixture (change def
test_create_from_viaf_with_viaf_file(script_info, tmpdir): to def
test_create_from_viaf_with_viaf_file(app, script_info, tmpdir):) so
create_from_viaf runs with the test app context; keep the rest of the test
(mocking task_create_mef_and_agents_from_viaf and using CliRunner) unchanged.

---

Duplicate comments:
In `@rero_mef/agents/cli.py`:
- Around line 275-283: The CLI currently maps batch_size==0 to None (batch_arg)
which makes process_viaf_rolling_refresh interpret it as "use default" instead
of "all"; change the call sites so rolling mode passes an explicit 0 (or a
dedicated sentinel meaning "all") to process_viaf_rolling_refresh.delay and the
non-enqueue direct call instead of None—e.g., compute a rolling_batch_arg = 0 if
batch_size == 0 else batch_size and use that in the
process_viaf_rolling_refresh.delay and the direct
process_viaf_rolling_refresh(...) invocation so rolling mode truly means
"0=all".

In `@tests/ui/agents/test_agents_cli.py`:
- Line 88: The test function test_create_from_viaf accepts the fixture
agent_viaf_record only for side-effects, which triggers ARG001; either rename
the parameter to _agent_viaf_record or remove the parameter and add
`@pytest.mark.usefixtures`("agent_viaf_record") above test_create_from_viaf to
silence the lint warning; apply the same change for the other test that also
takes agent_viaf_record at the second occurrence.

In `@tests/ui/places/test_places_api.py`:
- Line 258: Replace the hard-coded "plgnd" assertion in
tests/ui/places/test_places_api.py with a dynamic check against the app/test
fixture configuration (use the RERO_PLACES value or derive the expected key from
the same fixture that builds endpoints) so the test asserts that the configured
place key exists rather than the literal "plgnd"; update the assertion that
currently reads assert "plgnd" in endpoints to reference the configuration
(e.g., app.config['RERO_PLACES'] or the fixture-provided expected_key) so the
test no longer depends on a magic string.

In `@tests/ui/places/test_places_gnd_tasks.py`:
- Line 26: Several test functions (e.g., test_gnd_get_record_success) declare
the fixture parameter named app but never use it, triggering ARG001; rename that
parameter to _app in each test that accepts the unused fixture to silence the
lint warning while keeping fixture behavior (search for functions in this file
with a parameter named app and change the parameter to _app, e.g., def
test_gnd_get_record_success(mock_session, app) -> def
test_gnd_get_record_success(mock_session, _app)).

---

Nitpick comments:
In `@rero_mef/agents/cli.py`:
- Around line 203-210: The CLI option for batch_size currently uses type=int
which permits negatives; update the Click option declaration (the click.option
for "-b/--batch_size" in rero_mef.agents.cli) to enforce non-negative values by
using click.IntRange(min=0) as the type or by adding a callback that raises
click.BadParameter for batch_size < 0; ensure the option name "batch_size" and
any code reading it (e.g., the refresh loop) keep the same identifier and
behavior for zero meaning "all".

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cba4052 and 40d6ca1.

⛔ Files ignored due to path filters (6)
  • .github/AI_SETUP.md is excluded by none and included by none
  • .github/copilot-instructions.md is excluded by none and included by none
  • .github/workflows/continuous-integration-test.yml is excluded by none and included by none
  • CLAUDE.md is excluded by none and included by none
  • rero_mef/config.py is excluded by !rero_mef/config.py and included by rero_mef/**/*.py
  • uv.lock is excluded by !**/*.lock and included by none
📒 Files selected for processing (9)
  • rero_mef/agents/cli.py
  • rero_mef/agents/viaf/api.py
  • rero_mef/agents/viaf/tasks.py
  • rero_mef/api.py
  • tests/ui/agents/test_agents_cli.py
  • tests/ui/agents/test_agents_viaf_api.py
  • tests/ui/agents/test_agents_viaf_tasks.py
  • tests/ui/places/test_places_api.py
  • tests/ui/places/test_places_gnd_tasks.py

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (3)
rero_mef/agents/viaf/api.py (1)

471-477: ⚠️ Potential issue | 🟡 Minor

Rename the unused action binding.

handle_redirect() always returns Action.REDIRECT from this branch, so the local action from create_or_update() is dead and Ruff will flag it. Rename it to _action or _.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rero_mef/agents/viaf/api.py` around lines 471 - 477, The local variable
"action" assigned from create_or_update(...) inside handle_redirect() is unused
because this branch always returns Action.REDIRECT; rename the binding to a
throwaway name (e.g., _action or _) to silence the linter. Update the tuple
assignment in the handle_redirect call site where create_or_update(...) is
invoked so it captures (new_record, _action) instead of (new_record, action) and
leave the rest of the logic unchanged.
tests/ui/agents/test_agents_viaf_api.py (1)

461-465: ⚠️ Potential issue | 🟡 Minor

Mark redirect_info as intentionally unused.

This test never asserts the third return value, so Ruff will still report RUF059 here. Rename it to _redirect_info or _.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/ui/agents/test_agents_viaf_api.py` around lines 461 - 465, The third
return value from agent_viaf_record.handle_redirect is unused and triggers
RUF059; rename redirect_info to a deliberately unused variable (e.g.,
_redirect_info or _) in the assignment new_record, action, _redirect_info =
agent_viaf_record.handle_redirect(...) to silence the linter while keeping the
call to agent_viaf_record.handle_redirect intact.
rero_mef/agents/viaf/tasks.py (1)

56-93: ⚠️ Potential issue | 🟠 Major

update_agents does not actually gate downstream updates.

AgentViafRecord.create_or_update() already calls record.create_mef_and_agents() in rero_mef/agents/viaf/api.py, and the redirect branch reaches the same side effect through existing_record.handle_redirect(). So update_agents=False still rebuilds MEF/agent records, while the extra block here makes non-redirect CREATE/UPDATE paths do the work twice. That defeats the new flag and also wipes out most of the MD5-based savings on no-op refreshes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rero_mef/agents/viaf/tasks.py` around lines 56 - 93, The downstream MEF/agent
rebuild is being run twice because AgentViafRecord.create_or_update (in
rero_mef/agents/viaf/api.py) already calls record.create_mef_and_agents and this
tasks.py also calls record.create_mef_and_agents when update_agents is True; to
fix, add an update_agents boolean param to AgentViafRecord.create_or_update and
thread it through so create_or_update(data=..., dbcommit=..., reindex=...,
test_md5=True, update_agents=update_agents) performs the MEF/agent work when
requested, then remove the post-create_or_update block in
rero_mef/agents/viaf/tasks.py that calls record.create_mef_and_agents (the
redirect path can keep using existing_record.handle_redirect which already
handles it). Ensure all call sites of create_or_update are updated to pass the
new flag and adjust unit tests accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@rero_mef/agents/viaf/tasks.py`:
- Around line 125-144: The loop iterates over a PID sequence produced by
AgentViafRecord.get_all_pids() which uses limit/offset and can be invalidated
when _refresh_viaf_record() deletes/recreates PIDs; to fix, materialize a stable
snapshot of PIDs before the loop (e.g., replace the streaming/DB-paged iterator
with a concrete list from AgentViafRecord.get_all_pids()) or refactor this path
to use the PIT-style iteration used by process_viaf_rolling_refresh so offsets
can't shift under the loop; update the progressbar input accordingly to iterate
over the snapshot or the PIT iterator to ensure no records are skipped.

---

Duplicate comments:
In `@rero_mef/agents/viaf/api.py`:
- Around line 471-477: The local variable "action" assigned from
create_or_update(...) inside handle_redirect() is unused because this branch
always returns Action.REDIRECT; rename the binding to a throwaway name (e.g.,
_action or _) to silence the linter. Update the tuple assignment in the
handle_redirect call site where create_or_update(...) is invoked so it captures
(new_record, _action) instead of (new_record, action) and leave the rest of the
logic unchanged.

In `@rero_mef/agents/viaf/tasks.py`:
- Around line 56-93: The downstream MEF/agent rebuild is being run twice because
AgentViafRecord.create_or_update (in rero_mef/agents/viaf/api.py) already calls
record.create_mef_and_agents and this tasks.py also calls
record.create_mef_and_agents when update_agents is True; to fix, add an
update_agents boolean param to AgentViafRecord.create_or_update and thread it
through so create_or_update(data=..., dbcommit=..., reindex=..., test_md5=True,
update_agents=update_agents) performs the MEF/agent work when requested, then
remove the post-create_or_update block in rero_mef/agents/viaf/tasks.py that
calls record.create_mef_and_agents (the redirect path can keep using
existing_record.handle_redirect which already handles it). Ensure all call sites
of create_or_update are updated to pass the new flag and adjust unit tests
accordingly.

In `@tests/ui/agents/test_agents_viaf_api.py`:
- Around line 461-465: The third return value from
agent_viaf_record.handle_redirect is unused and triggers RUF059; rename
redirect_info to a deliberately unused variable (e.g., _redirect_info or _) in
the assignment new_record, action, _redirect_info =
agent_viaf_record.handle_redirect(...) to silence the linter while keeping the
call to agent_viaf_record.handle_redirect intact.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 137e0b9f-27ad-4cef-ae02-0b1d6c46e402

📥 Commits

Reviewing files that changed from the base of the PR and between cba4052 and 3d57629.

⛔ Files ignored due to path filters (6)
  • .github/AI_SETUP.md is excluded by none and included by none
  • .github/copilot-instructions.md is excluded by none and included by none
  • .github/workflows/continuous-integration-test.yml is excluded by none and included by none
  • CLAUDE.md is excluded by none and included by none
  • rero_mef/config.py is excluded by !rero_mef/config.py and included by rero_mef/**/*.py
  • uv.lock is excluded by !**/*.lock and included by none
📒 Files selected for processing (9)
  • rero_mef/agents/cli.py
  • rero_mef/agents/viaf/api.py
  • rero_mef/agents/viaf/tasks.py
  • rero_mef/api.py
  • tests/ui/agents/test_agents_cli.py
  • tests/ui/agents/test_agents_viaf_api.py
  • tests/ui/agents/test_agents_viaf_tasks.py
  • tests/ui/places/test_places_api.py
  • tests/ui/places/test_places_gnd_tasks.py
✅ Files skipped from review due to trivial changes (3)
  • tests/ui/agents/test_agents_cli.py
  • tests/ui/places/test_places_api.py
  • tests/ui/places/test_places_gnd_tasks.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/ui/agents/test_agents_viaf_tasks.py

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
rero_mef/agents/viaf/api.py (1)

321-407: ⚠️ Potential issue | 🟠 Major

Distinguish transient VIAF API failures from genuine record misses.

get_online_record() returns {} for three different scenarios: non-200 HTTP responses, JSON parse failures, and records where the PID doesn't match the request. All callers check only if not online_data:, so _refresh_viaf_record() incorrectly returns Action.NOT_FOUND for transient network/parse errors. This breaks retry semantics, makes action stats misleading, and prevents proper error handling in bulk refresh jobs.

Return a typed outcome (or add a status field to result) so transient failures can be distinguished from genuine misses. The Action.ERROR enum value exists but currently cannot be used. Add test cases stubbing 5xx and malformed-JSON responses to lock down error path behavior.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rero_mef/agents/viaf/api.py` around lines 321 - 407, The get_online_record()
function currently conflates transient HTTP/parse failures with genuine "not
found" results; modify get_online_record() to return a typed outcome by adding a
status field (e.g. result["status"] = "ok" | "not_found" | "error") or an
explicit enum (use existing Action.ERROR) alongside the data so callers can
distinguish transient failures; specifically, inside the HTTP error branch
(non-200) and the exception parse block in get_online_record() set
result["status"]="error" (and include an error message key), on successful parse
set result["status"]="ok", and when a PID mismatch occurs set "not_found";
update callers such as _refresh_viaf_record() to check result["status"] (or the
returned enum) and only map "not_found" to Action.NOT_FOUND while mapping
"error" to Action.ERROR; add tests that stub 5xx responses and malformed JSON to
assert the new error status is returned and that _refresh_viaf_record() yields
Action.ERROR for transient failures.
♻️ Duplicate comments (1)
rero_mef/agents/viaf/tasks.py (1)

27-34: ⚠️ Potential issue | 🟠 Major

update_agents is currently ineffective.

Both the normal path here and the redirect path ultimately land in AgentViafRecord.create_or_update(), which still unconditionally runs record.create_mef_and_agents(...) in rero_mef/agents/viaf/api.py. So --update-agents does not actually disable downstream MEF/agent work, and even UPTODATE refreshes still pay for that reconciliation cost. Please gate that side effect in create_or_update() and thread the flag through here.

As per coding guidelines, focus on performance problems and edge cases.

Also applies to: 76-81

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rero_mef/agents/viaf/tasks.py` around lines 27 - 34, The update_agents flag
is never honored because both code paths call
AgentViafRecord.create_or_update(), which still always invokes
record.create_mef_and_agents(...); modify _refresh_viaf_record to pass the
update_agents parameter into create_or_update (thread the flag through both the
normal and redirect branches) and then change AgentViafRecord.create_or_update
to accept and respect that flag by gating the call to
record.create_mef_and_agents(...) (only call create_mef_and_agents when
update_agents is True); ensure the flag is propagated through any intermediate
helpers so UPTODATE refreshes skip the expensive MEF/agent reconciliation when
update_agents is False.
🧹 Nitpick comments (2)
rero_mef/places/idref/api.py (1)

72-76: Use prefix-only slicing instead of global replacement for improved semantic correctness.

On line 72, replace("(DE-101)", "") removes every occurrence of the token, but only the leading prefix should be stripped. While current data doesn't contain duplicate tokens (verified in production datasets), the .replace() approach is semantically imprecise and fragile to future data changes. Using [len(prefix):] to slice off only the leading prefix is clearer in intent and more robust.

Proposed diff
         pids = list(
+            prefix = "(DE-101)"
-            dict.fromkeys(
+            dict.fromkeys(
-                identified_by.get("value", "").replace("(DE-101)", "")
+                identified_by.get("value", "")[len(prefix) :]
                 for identified_by in self.get("identifiedBy", [])
                 if identified_by.get("source") == "GND"
                 and identified_by.get("type") == "bf:Nbn"
-                and identified_by.get("value", "").startswith("(DE-101)")
+                and identified_by.get("value", "").startswith(prefix)
             )
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rero_mef/places/idref/api.py` around lines 72 - 76, The current list
comprehension removes every occurrence of the token "(DE-101)" via
identified_by.get("value", "").replace("(DE-101)", ""), which is imprecise;
change it to slice off only the leading prefix when the value starts with that
token (e.g., compute prefix="(DE-101)" and use identified_by.get("value",
"")[len(prefix):] for the matching branch). Keep the existing filter
(identified_by.get("source") == "GND" and identified_by.get("type") == "bf:Nbn"
and identified_by.get("value", "").startswith("(DE-101)")) and apply the
prefix-only slicing in place of .replace() to produce the trimmed value.
tests/ui/agents/test_agents_viaf_api.py (1)

405-414: Assert that failed redirects keep the source record intact.

These failure cases only assert Action.ERROR, but the critical invariant here is that old_pid must still resolve when delete_if_not_found=False. Adding that assertion would catch the data-loss regression this path is most prone to.

💡 Example assertion
     assert action == Action.ERROR
     assert redirect_info == {"from": old_pid, "to": new_pid}
     assert new_record is None
+    assert AgentViafRecord.get_record_by_pid(old_pid) is not None

Apply the same check to the chained-redirect and exception paths.

As per coding guidelines, focus on logic errors and architectural improvements.

Also applies to: 461-469, 508-514

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/ui/agents/test_agents_viaf_api.py` around lines 405 - 414, Add
assertions that the source record remains resolvable after failed redirects:
after calling agent_viaf_record.handle_redirect(redirect_to_pid=new_pid,
dbcommit=True, reindex=True) and asserting action == Action.ERROR, also call the
resolver (or use agent_viaf_record.resolve/get_by_pid helper used elsewhere) to
ensure old_pid still returns the original record (i.e., not deleted) — mimicking
delete_if_not_found=False semantics; apply the same extra assertion to the
chained-redirect and exception-path tests referenced (the blocks around lines
461-469 and 508-514) to prevent regressions that delete the source on failure.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@rero_mef/monitoring/api.py`:
- Around line 152-156: get_es_count() can return a diagnostic string (e.g. "No
... in ES") but the diff currently replaces db-es with an empty string which
hides the failure; change the assignment that sets info[doc_type]["db-es"] so
that when count_es is not an int you set a machine-readable failure value (for
example None or a sentinel like -1) instead of "", keep info[doc_type]["es"] as
the original count_es string, and update Monitoring.check() to treat
non-int/None/-1 db-es values as a failure state so missing ES indices no longer
appear healthy; locate symbols get_es_count, info[doc_type]["es"],
info[doc_type]["db-es"], and Monitoring.check to apply these changes.

In `@rero_mef/monitoring/views.py`:
- Around line 59-60: The exception handlers currently return JSON with the error
string but still send HTTP 200; update the handlers (the except Exception as
error blocks in monitoring.views) to return a 5xx HTTP response (e.g., 500)
along with the error JSON so callers/monitoring see a failure; change both
occurrences (the handler around the DB query at the first except and the second
occurrence at lines referenced) to return the JSON and an explicit 500 status
(or use make_response to set status) instead of letting Flask default to 200.

---

Outside diff comments:
In `@rero_mef/agents/viaf/api.py`:
- Around line 321-407: The get_online_record() function currently conflates
transient HTTP/parse failures with genuine "not found" results; modify
get_online_record() to return a typed outcome by adding a status field (e.g.
result["status"] = "ok" | "not_found" | "error") or an explicit enum (use
existing Action.ERROR) alongside the data so callers can distinguish transient
failures; specifically, inside the HTTP error branch (non-200) and the exception
parse block in get_online_record() set result["status"]="error" (and include an
error message key), on successful parse set result["status"]="ok", and when a
PID mismatch occurs set "not_found"; update callers such as
_refresh_viaf_record() to check result["status"] (or the returned enum) and only
map "not_found" to Action.NOT_FOUND while mapping "error" to Action.ERROR; add
tests that stub 5xx responses and malformed JSON to assert the new error status
is returned and that _refresh_viaf_record() yields Action.ERROR for transient
failures.

---

Duplicate comments:
In `@rero_mef/agents/viaf/tasks.py`:
- Around line 27-34: The update_agents flag is never honored because both code
paths call AgentViafRecord.create_or_update(), which still always invokes
record.create_mef_and_agents(...); modify _refresh_viaf_record to pass the
update_agents parameter into create_or_update (thread the flag through both the
normal and redirect branches) and then change AgentViafRecord.create_or_update
to accept and respect that flag by gating the call to
record.create_mef_and_agents(...) (only call create_mef_and_agents when
update_agents is True); ensure the flag is propagated through any intermediate
helpers so UPTODATE refreshes skip the expensive MEF/agent reconciliation when
update_agents is False.

---

Nitpick comments:
In `@rero_mef/places/idref/api.py`:
- Around line 72-76: The current list comprehension removes every occurrence of
the token "(DE-101)" via identified_by.get("value", "").replace("(DE-101)", ""),
which is imprecise; change it to slice off only the leading prefix when the
value starts with that token (e.g., compute prefix="(DE-101)" and use
identified_by.get("value", "")[len(prefix):] for the matching branch). Keep the
existing filter (identified_by.get("source") == "GND" and
identified_by.get("type") == "bf:Nbn" and identified_by.get("value",
"").startswith("(DE-101)")) and apply the prefix-only slicing in place of
.replace() to produce the trimmed value.

In `@tests/ui/agents/test_agents_viaf_api.py`:
- Around line 405-414: Add assertions that the source record remains resolvable
after failed redirects: after calling
agent_viaf_record.handle_redirect(redirect_to_pid=new_pid, dbcommit=True,
reindex=True) and asserting action == Action.ERROR, also call the resolver (or
use agent_viaf_record.resolve/get_by_pid helper used elsewhere) to ensure
old_pid still returns the original record (i.e., not deleted) — mimicking
delete_if_not_found=False semantics; apply the same extra assertion to the
chained-redirect and exception-path tests referenced (the blocks around lines
461-469 and 508-514) to prevent regressions that delete the source on failure.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 01657951-c99e-41cd-aa87-d69b44c2764c

📥 Commits

Reviewing files that changed from the base of the PR and between 3d57629 and a241a58.

⛔ Files ignored due to path filters (7)
  • .github/AI_SETUP.md is excluded by none and included by none
  • .github/copilot-instructions.md is excluded by none and included by none
  • .github/workflows/continuous-integration-test.yml is excluded by none and included by none
  • CLAUDE.md is excluded by none and included by none
  • docker-services.yml is excluded by none and included by none
  • rero_mef/config.py is excluded by !rero_mef/config.py and included by rero_mef/**/*.py
  • uv.lock is excluded by !**/*.lock and included by none
📒 Files selected for processing (13)
  • rero_mef/agents/cli.py
  • rero_mef/agents/viaf/api.py
  • rero_mef/agents/viaf/tasks.py
  • rero_mef/api.py
  • rero_mef/monitoring/api.py
  • rero_mef/monitoring/cli.py
  • rero_mef/monitoring/views.py
  • rero_mef/places/idref/api.py
  • tests/ui/agents/test_agents_cli.py
  • tests/ui/agents/test_agents_viaf_api.py
  • tests/ui/agents/test_agents_viaf_tasks.py
  • tests/ui/places/test_places_api.py
  • tests/ui/places/test_places_gnd_tasks.py
✅ Files skipped from review due to trivial changes (2)
  • tests/ui/places/test_places_api.py
  • tests/ui/places/test_places_gnd_tasks.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • tests/ui/agents/test_agents_cli.py
  • tests/ui/agents/test_agents_viaf_tasks.py

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

♻️ Duplicate comments (1)
rero_mef/agents/viaf/tasks.py (1)

32-33: ⚠️ Potential issue | 🟠 Major

update_agents is currently a no-op.

This call goes through AgentViafRecord.create_or_update(), whose override in rero_mef/agents/viaf/api.py unconditionally calls record.create_mef_and_agents(...). So the refresh still mutates linked MEF/GND/IdRef records even when update_agents=False, and every UPTODATE VIAF record pays that extra work too. Please gate that side effect in the override, or bypass the override from this task when the flag is false.

Also applies to: 76-81

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rero_mef/agents/viaf/tasks.py` around lines 32 - 33, The update_agents flag
is effectively ignored because AgentViafRecord.create_or_update (in the override
in api.py) unconditionally calls record.create_mef_and_agents, causing
unnecessary mutations; change the override in AgentViafRecord.create_or_update
to check the update_agents argument before invoking create_mef_and_agents (i.e.,
only call record.create_mef_and_agents when update_agents is True), or
alternatively modify the task code that calls AgentViafRecord.create_or_update
to bypass the overridden method and call a no-side-effect path when
update_agents is False; reference the symbols AgentViafRecord.create_or_update
and create_mef_and_agents when making the conditional gating so UPTODATE records
skip the heavy side-effect when update_agents is False.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@rero_mef/agents/cli.py`:
- Around line 203-210: The --batch_size click option (parameter name
"batch_size") currently allows negative values; change its validator so negative
inputs are rejected — e.g., replace type=int with type=click.IntRange(min=0) or
add a callback that raises click.BadParameter for values < 0, and apply the same
fix to the other batch_size option instance elsewhere in the file so the CLI
enforces the documented "0=all" semantics.

In `@rero_mef/agents/viaf/api.py`:
- Around line 321-326: The unprotected call to requests_retry_session().get(url,
headers=headers) in the VIAF fetch flow should be wrapped in a try/except that
catches requests.RequestException (and related connection/read exceptions) so
network-level failures don't raise out and abort the PID refresh; inside the
except set an appropriate error state on the existing result dict (e.g.,
result['status']='error' or result['error']=str(exc')), update msg to include
the exception details and the pid, and return or continue with that result so
downstream parsing/handlers (within the function that uses pid, response,
result, msg) can record the failure instead of crashing. Ensure you reference
and preserve the existing variables pid, url, headers, result and msg when
implementing the try/except around the get call.

In `@rero_mef/agents/viaf/tasks.py`:
- Around line 117-120: Remove the unsupported preserve_order=True usage on
AgentViafSearch().params(...) when used with .point_in_time().iterate(): delete
the preserve_order param and add an explicit .sort({"_updated": {"order":
"asc"}}) to the query that builds the PIT iterator (the code invoking
AgentViafSearch().params(...).source("pid").point_in_time().iterate()). Do the
same for the other occurrence around the .sort({"_updated": {"order": "asc"}})
region (replace any preserve_order=True there with the explicit .sort call) so
both PIT-based queries have deterministic ordering.

In `@rero_mef/monitoring/api.py`:
- Around line 187-189: The code in check() currently writes db_es as None into
checks, conflating ES-unavailable states with numeric deltas; change the logic
in the function check (the block that currently tests db_es) to only set
checks[info]["db_es"] when db_es is an integer/number (exclude None and
non-numeric values) and instead set a new key checks[info]["es_error"] =
"unavailable" (or the actual error message) when db_es is None or indicates an
ES connectivity/missing-index issue; then update the monitoring view (the
mapping in rero_mef/monitoring/views.py that formats db_es) to look for es_error
and translate it to the dedicated error code/message rather than rendering
"There are None items...".

In `@rero_mef/places/idref/api.py`:
- Around line 70-78: The comprehension that builds pids uses replace("(DE-101)",
"") which removes all occurrences and can yield empty IDs; change it to remove
only the leading prefix and skip empty results: when iterating over
self.get("identifiedBy", []) (the code building pids), detect values that
startwith("(DE-101)"), strip the prefix using str.removeprefix("(DE-101)") or
value[len("(DE-101)"):], then only include the resulting pid if it is non-empty;
keep the dict.fromkeys(...) dedupe logic but filter out any empty strings before
constructing pids.

---

Duplicate comments:
In `@rero_mef/agents/viaf/tasks.py`:
- Around line 32-33: The update_agents flag is effectively ignored because
AgentViafRecord.create_or_update (in the override in api.py) unconditionally
calls record.create_mef_and_agents, causing unnecessary mutations; change the
override in AgentViafRecord.create_or_update to check the update_agents argument
before invoking create_mef_and_agents (i.e., only call
record.create_mef_and_agents when update_agents is True), or alternatively
modify the task code that calls AgentViafRecord.create_or_update to bypass the
overridden method and call a no-side-effect path when update_agents is False;
reference the symbols AgentViafRecord.create_or_update and create_mef_and_agents
when making the conditional gating so UPTODATE records skip the heavy
side-effect when update_agents is False.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 78aeda97-de80-4299-8467-2e111219e398

📥 Commits

Reviewing files that changed from the base of the PR and between a241a58 and 09a4cc1.

⛔ Files ignored due to path filters (7)
  • .github/AI_SETUP.md is excluded by none and included by none
  • .github/copilot-instructions.md is excluded by none and included by none
  • .github/workflows/continuous-integration-test.yml is excluded by none and included by none
  • CLAUDE.md is excluded by none and included by none
  • docker-services.yml is excluded by none and included by none
  • rero_mef/config.py is excluded by !rero_mef/config.py and included by rero_mef/**/*.py
  • uv.lock is excluded by !**/*.lock and included by none
📒 Files selected for processing (13)
  • rero_mef/agents/cli.py
  • rero_mef/agents/viaf/api.py
  • rero_mef/agents/viaf/tasks.py
  • rero_mef/api.py
  • rero_mef/monitoring/api.py
  • rero_mef/monitoring/cli.py
  • rero_mef/monitoring/views.py
  • rero_mef/places/idref/api.py
  • tests/ui/agents/test_agents_cli.py
  • tests/ui/agents/test_agents_viaf_api.py
  • tests/ui/agents/test_agents_viaf_tasks.py
  • tests/ui/places/test_places_api.py
  • tests/ui/places/test_places_gnd_tasks.py
✅ Files skipped from review due to trivial changes (2)
  • tests/ui/places/test_places_api.py
  • tests/ui/places/test_places_gnd_tasks.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • rero_mef/monitoring/views.py
  • rero_mef/monitoring/cli.py
  • tests/ui/agents/test_agents_viaf_tasks.py

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (2)
rero_mef/agents/viaf/tasks.py (2)

117-120: ⚠️ Potential issue | 🟠 Major

Remove preserve_order=True from the PIT queries.

The Elasticsearch DSL docs say Search.params() forwards search-request parameters, while preserve_order is defined on the scan helper and iterate() is the PIT-based alternative. Keeping preserve_order=True on these point_in_time().iterate() paths is therefore an API mismatch on both refresh queries. (elasticsearch-py.readthedocs.io)

🔧 Suggested fix
-    query = AgentViafSearch().params(preserve_order=True).source("pid")
+    query = AgentViafSearch().source("pid")
@@
-        AgentViafSearch()
-        .params(preserve_order=True)
+        AgentViafSearch()
         .sort({"_updated": {"order": "asc"}})
         .source("pid")

Also applies to: 176-184

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rero_mef/agents/viaf/tasks.py` around lines 117 - 120, The code is passing
preserve_order=True via AgentViafSearch().params(...) while using
point_in_time().iterate(), which is an API mismatch; remove the
preserve_order=True parameter from the Search.params(...) calls that precede
point_in_time().iterate() (e.g. the
AgentViafSearch().params(preserve_order=True).source("pid") usage around the
point_in_time().iterate() blocks and the similar occurrence at the later block
around lines 176-184) so the PIT-based iterate() is used without the scan-only
preserve_order parameter.

27-34: ⚠️ Potential issue | 🟠 Major

update_agents is currently a no-op in this task layer.

_refresh_viaf_record() accepts the flag and every public task forwards it, but the value is never read or passed into either handle_redirect() or create_or_update(). From this module, update_agents=True and False behave identically, so the task API/docstrings are misleading.

Either thread the flag through the lower-level update paths or remove it from these task signatures until it is actually supported.

Also applies to: 60-65, 76-81

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rero_mef/agents/viaf/tasks.py` around lines 27 - 34, The update_agents flag
accepted by _refresh_viaf_record (and the other task functions in this module
that forward it) is a no-op: it is never read or passed to lower-level functions
like handle_redirect() or create_or_update(), so update_agents True/False behave
identically. Fix by either (A) threading the flag through: add an update_agents
parameter to handle_redirect() and create_or_update(), pass the update_agents
value from _refresh_viaf_record into those calls and update any internal logic
to act on it; or (B) remove update_agents from the task function signatures and
all callers in this module (and update any docstrings/tests) so the API is
accurate. Ensure you update all occurrences of update_agents in this module
(including the other task wrappers that currently accept and forward it) to keep
the signatures and behavior consistent.
🧹 Nitpick comments (3)
rero_mef/agents/viaf/api.py (1)

478-478: Suppress unused action variable from create_or_update.

The static analysis correctly flags that action is unpacked but never used. Rename it to _action to indicate it's intentionally discarded.

🔧 Suggested fix
-            new_record, action = self.create_or_update(
+            new_record, _action = self.create_or_update(
                 data=target_data,
                 dbcommit=dbcommit,
                 reindex=reindex,
                 test_md5=True,
             )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rero_mef/agents/viaf/api.py` at line 478, The tuple unpacking from
create_or_update currently assigns an unused variable named action; change the
unpack target to _action to signal it's intentionally discarded (e.g., replace
"new_record, action = self.create_or_update(...)" with "new_record, _action =
self.create_or_update(...)") so static analysis no longer flags the unused
variable while leaving create_or_update and new_record usage unchanged.
tests/ui/agents/test_agents_viaf_api.py (1)

461-461: Suppress unused redirect_info variable.

The unpacked redirect_info is not used in this test. Rename it to _redirect_info to indicate it's intentionally discarded.

🔧 Suggested fix
-    new_record, action, redirect_info = agent_viaf_record.handle_redirect(
+    new_record, action, _redirect_info = agent_viaf_record.handle_redirect(
         redirect_to_pid=new_pid,
         dbcommit=True,
         reindex=True,
     )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/ui/agents/test_agents_viaf_api.py` at line 461, The test unpacks a
third value from agent_viaf_record.handle_redirect into redirect_info but never
uses it; update the unpacking to assign that value to a throwaway name (e.g.,
_redirect_info) so linters understand it's intentionally discarded, by changing
the tuple assignment where agent_viaf_record.handle_redirect is called.
rero_mef/cli.py (1)

1280-1288: Consider adding a guard for get_entity_class returning None.

While pid_type values come from get_multiple_missing_pids which should return valid types, adding a defensive check would prevent potential AttributeError if get_entity_class returns None.

🛡️ Suggested defensive check
             entity_cls = get_entity_class(pid_type)
+            if entity_cls is None:
+                if verbose:
+                    click.secho(f"Unknown entity type: {pid_type}", fg="yellow")
+                continue
             for entity_pid in entity_pids:
                 record = entity_cls.get_record_by_pid(entity_pid)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rero_mef/cli.py` around lines 1280 - 1288, Add a defensive guard after
calling get_entity_class(pid_type) to handle the case it returns None: check if
entity_cls is None, optionally log a clear warning (using click.secho when
verbose) that the pid_type is unknown, and continue to the next
pid_type/entity_pid instead of calling get_record_by_pid on None; this uses the
symbols get_entity_class, entity_cls, pid_type, get_record_by_pid, and relates
to pids sourced from get_multiple_missing_pids.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@rero_mef/agents/viaf/tasks.py`:
- Around line 115-126: The current logic treats batch_size=0 as "no limit"
because it uses truthiness checks; update the code that computes length and that
checks the loop limit to use explicit None checks instead (e.g., compute length
= min(batch_size, total) if batch_size is not None else total, and replace "if
batch_size and count >= batch_size" with "if batch_size is not None and count >=
batch_size") so that a caller passing 0 will result in zero items processed;
adjust references in the AgentViafRecord.count and AgentViafSearch() iteration
logic accordingly.

In `@rero_mef/cli.py`:
- Around line 1302-1306: The loop incorrectly invokes the instance method delete
on the class (mef_cls.delete(record, ...)); change the call to invoke the
instance method on the retrieved record object (record.delete(...)) and pass the
same keyword args (force=True, dbcommit=True, delindex=True); ensure you locate
the loop that iterates orphan_pids and uses mef_cls.get_record_by_pid(...) so
you update the delete call there and adjust any related counters
(deleted_orphan_count and orphan_details[group_name]["deleted"]) accordingly.

---

Duplicate comments:
In `@rero_mef/agents/viaf/tasks.py`:
- Around line 117-120: The code is passing preserve_order=True via
AgentViafSearch().params(...) while using point_in_time().iterate(), which is an
API mismatch; remove the preserve_order=True parameter from the
Search.params(...) calls that precede point_in_time().iterate() (e.g. the
AgentViafSearch().params(preserve_order=True).source("pid") usage around the
point_in_time().iterate() blocks and the similar occurrence at the later block
around lines 176-184) so the PIT-based iterate() is used without the scan-only
preserve_order parameter.
- Around line 27-34: The update_agents flag accepted by _refresh_viaf_record
(and the other task functions in this module that forward it) is a no-op: it is
never read or passed to lower-level functions like handle_redirect() or
create_or_update(), so update_agents True/False behave identically. Fix by
either (A) threading the flag through: add an update_agents parameter to
handle_redirect() and create_or_update(), pass the update_agents value from
_refresh_viaf_record into those calls and update any internal logic to act on
it; or (B) remove update_agents from the task function signatures and all
callers in this module (and update any docstrings/tests) so the API is accurate.
Ensure you update all occurrences of update_agents in this module (including the
other task wrappers that currently accept and forward it) to keep the signatures
and behavior consistent.

---

Nitpick comments:
In `@rero_mef/agents/viaf/api.py`:
- Line 478: The tuple unpacking from create_or_update currently assigns an
unused variable named action; change the unpack target to _action to signal it's
intentionally discarded (e.g., replace "new_record, action =
self.create_or_update(...)" with "new_record, _action =
self.create_or_update(...)") so static analysis no longer flags the unused
variable while leaving create_or_update and new_record usage unchanged.

In `@rero_mef/cli.py`:
- Around line 1280-1288: Add a defensive guard after calling
get_entity_class(pid_type) to handle the case it returns None: check if
entity_cls is None, optionally log a clear warning (using click.secho when
verbose) that the pid_type is unknown, and continue to the next
pid_type/entity_pid instead of calling get_record_by_pid on None; this uses the
symbols get_entity_class, entity_cls, pid_type, get_record_by_pid, and relates
to pids sourced from get_multiple_missing_pids.

In `@tests/ui/agents/test_agents_viaf_api.py`:
- Line 461: The test unpacks a third value from
agent_viaf_record.handle_redirect into redirect_info but never uses it; update
the unpacking to assign that value to a throwaway name (e.g., _redirect_info) so
linters understand it's intentionally discarded, by changing the tuple
assignment where agent_viaf_record.handle_redirect is called.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 34484d78-7cac-42f4-98a3-47836ea18a86

📥 Commits

Reviewing files that changed from the base of the PR and between 09a4cc1 and 788790b.

⛔ Files ignored due to path filters (12)
  • .github/AI_SETUP.md is excluded by none and included by none
  • .github/copilot-instructions.md is excluded by none and included by none
  • .github/workflows/continuous-integration-test.yml is excluded by none and included by none
  • CLAUDE.md is excluded by none and included by none
  • data/cidref.json is excluded by none and included by none
  • data/mef.json is excluded by none and included by none
  • data/mef_id.csv is excluded by !**/*.csv and included by none
  • data/mef_metadata.csv is excluded by !**/*.csv and included by none
  • data/mef_pidstore.csv is excluded by !**/*.csv and included by none
  • docker-services.yml is excluded by none and included by none
  • rero_mef/config.py is excluded by !rero_mef/config.py and included by rero_mef/**/*.py
  • uv.lock is excluded by !**/*.lock and included by none
📒 Files selected for processing (16)
  • rero_mef/agents/api.py
  • rero_mef/agents/cli.py
  • rero_mef/agents/viaf/api.py
  • rero_mef/agents/viaf/tasks.py
  • rero_mef/api.py
  • rero_mef/cli.py
  • rero_mef/monitoring/api.py
  • rero_mef/monitoring/cli.py
  • rero_mef/monitoring/views.py
  • rero_mef/places/idref/api.py
  • tests/ui/agents/test_agents_cli.py
  • tests/ui/agents/test_agents_viaf_api.py
  • tests/ui/agents/test_agents_viaf_tasks.py
  • tests/ui/places/test_places_api.py
  • tests/ui/places/test_places_gnd_tasks.py
  • tests/ui/test_cli.py
✅ Files skipped from review due to trivial changes (1)
  • tests/ui/agents/test_agents_cli.py
🚧 Files skipped from review as they are similar to previous changes (4)
  • rero_mef/places/idref/api.py
  • tests/ui/places/test_places_api.py
  • rero_mef/monitoring/cli.py
  • rero_mef/monitoring/api.py

@rerowep rerowep force-pushed the wep-viaf-updates branch 2 times, most recently from 1d48299 to b5937cd Compare March 24, 2026 17:19
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

♻️ Duplicate comments (2)
rero_mef/agents/viaf/tasks.py (2)

179-190: ⚠️ Potential issue | 🟠 Major

Rolling refresh is not guaranteed to be oldest-first with scan().

This query adds a sort on _updated, but Search.scan() does not preserve that ordering by default. That means the rolling job walks scan order rather than oldest-first order, so the intended ~6-month refresh cycle is not reliable. Prefer an ordered PIT/search-after iteration here.

In elasticsearch-dsl, does Search.scan() honor an explicit .sort() without preserve_order=True, and what is the recommended PIT/search_after pattern for guaranteed oldest-first iteration?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rero_mef/agents/viaf/tasks.py` around lines 179 - 190, The current loop uses
AgentViafSearch().sort({"_updated": {"order": "asc"}}).source("pid") and then
iterates query.scan(), but Search.scan() does not guarantee the sort order
unless preserve_order is used or you switch to a PIT + search_after pattern; as
a result the rolling refresh won't be strictly oldest-first. Fix by replacing
the scan-based iteration with a point-in-time (PIT) + paginated search_after
loop (open PIT, run sorted searches with size=batch_size and search_after
tokens, close PIT) or, if you must keep scan(), call scan(preserve_order=True)
to enforce the sort; update the loop that consumes progressbar(items=...) to use
the new iterator and ensure AgentViafSearch and progressbar still receive
.source("pid") and the sort on "_updated".

27-33: ⚠️ Potential issue | 🟠 Major

update_agents is still a no-op.

This task accepts the flag, but neither the redirect path nor the normal create/update path consults it before touching MEF/agent data. Both routes still end up in AgentViafRecord.create_or_update(), which unconditionally calls create_mef_and_agents(), so refreshes do the extra reconciliation even when update_agents=False.

Also applies to: 57-81

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rero_mef/agents/viaf/tasks.py` around lines 27 - 33, The task
_refresh_viaf_record currently ignores the update_agents flag because it always
calls AgentViafRecord.create_or_update(), which unconditionally invokes
create_mef_and_agents(); update the flow so the flag is respected by either (a)
adding an explicit parameter to
AgentViafRecord.create_or_update(update_agents=False) and having
create_or_update skip create_mef_and_agents() when False, or (b) keeping
create_or_update unchanged but branching in _refresh_viaf_record so that when
update_agents is False you do not call the path that triggers
create_mef_and_agents() (e.g., bypass the redirect/create/update route that
reconciles MEF/agents); ensure both the redirect-handling branch and the normal
create/update branch consult update_agents before touching MEF/agent data.
🧹 Nitpick comments (1)
tests/ui/agents/test_agents_viaf_api.py (1)

390-399: Redundant reassignment of old_pid.

Line 399 reassigns old_pid = agent_viaf_record.pid, but old_pid was already set to "12345678" on line 390 and the record was created with that value. This line is unnecessary.

     agent_viaf_record = AgentViafRecord.create(
         data=viaf_data, dbcommit=True, reindex=True
     )
-    old_pid = agent_viaf_record.pid
     new_pid = "999999999"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/ui/agents/test_agents_viaf_api.py` around lines 390 - 399, The variable
old_pid is redundantly reassigned after creating the record; remove the
unnecessary reassignment (old_pid = agent_viaf_record.pid) so the test uses the
originally set old_pid value and avoid duplicate assignment—update the test
around AgentViafRecord.create and agent_viaf_record.pid to simply rely on the
initial old_pid assignment.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@rero_mef/agents/api.py`:
- Around line 275-280: The current AgentMefSearch block uses .filter("terms",
sources=entity_names) which excludes stored MEF docs because MEF documents don't
have a sources array; update get_unlinked_agents to stop filtering on the
non-existent sources field and instead detect documents by their source-specific
keys (e.g. check existence of idref, gnd, rero) or build a boolean should that
matches the entity_names to the appropriate source field(s); specifically,
change the AgentMefSearch query to remove the terms on sources and add a boolean
should/exists clause for idref, gnd, rero (or otherwise map entity_names to the
correct document key) so AgentMefSearch returns the intended unlinked MEF
records.

In `@rero_mef/agents/cli.py`:
- Around line 96-99: The CLI currently ignores the --enqueue flag when
--unlinked is set in the create_from_viaf command: locate the validation and
command handler around the unlinked/missing/viaf_file logic (variables unlinked,
enqueue and the create_from_viaf command function) and either 1) explicitly
reject the combination by extending the UsageError to include enqueue
("--unlinked cannot be combined with --missing, --viaf-file, or --enqueue"), or
2) implement enqueue semantics for unlinked mode by branching on enqueue inside
the create_from_viaf handler so that when unlinked and enqueue are True you
enqueue jobs instead of running the synchronous lookup loop; update only that
validation/branching logic so behavior is consistent and documented in the error
message or code path.

In `@rero_mef/agents/viaf/api.py`:
- Around line 324-333: The try/except in the VIAF GET handler currently returns
a truthy payload {"status": "error", ...} which causes callers like
_refresh_viaf_record to treat request failures as valid records and proceed into
add_md5() and create_or_update(); change the except block to return an empty
payload (e.g. {}) together with the same msg so callers that check for falsy
records will correctly skip processing (keep the msg formatting with pid and url
and still return result as {} instead of the {"status":"error"} dict); reference
the requests_retry_session GET call, the except handling, and callers
_refresh_viaf_record, add_md5, create_or_update when making the change.

In `@rero_mef/cli.py`:
- Around line 41-46: The import of rero_mef.cli_logging
(ensure_single_stream_handler) is causing ModuleNotFoundError; either add the
missing module exporting ensure_single_stream_handler or make the import
optional: in rero_mef.cli replace the direct "from .cli_logging import
ensure_single_stream_handler" with a guarded import that catches ImportError and
provides a lightweight fallback/no-op ensure_single_stream_handler, or remove
the import and its usages where present (e.g., in any CLI setup functions) so
importing rero_mef.cli or rero_mef.agents.cli no longer fails at import time.
- Around line 1267-1272: The loop that populates details iterates all pid types
returned by get_multiple_missing_pids() even when their entity_map is empty,
causing details.setdefault(...) to create zero-count entries; update the loop in
rero_mef/cli.py (the for pid_type, entity_map in multiple_pids.items() block) to
skip empty buckets by adding a guard like "if not entity_map: continue" before
computing entity_pids/count and updating duplicates_count and info, so details
only gets populated for pid types that actually have duplicates.

In `@tests/ui/agents/test_agents_cli.py`:
- Around line 165-168: The test currently patches
rero_mef.agents.viaf.tasks._refresh_viaf_record, but harvest_viaf has already
imported that function into rero_mef.agents.cli, so the test must patch the
CLI's bound reference instead; update the mock.patch target to
"rero_mef.agents.cli._refresh_viaf_record" (or patch the harvest_viaf symbol
directly) so the call made by harvest_viaf uses the mock rather than the real
_refresh_viaf_record.

---

Duplicate comments:
In `@rero_mef/agents/viaf/tasks.py`:
- Around line 179-190: The current loop uses AgentViafSearch().sort({"_updated":
{"order": "asc"}}).source("pid") and then iterates query.scan(), but
Search.scan() does not guarantee the sort order unless preserve_order is used or
you switch to a PIT + search_after pattern; as a result the rolling refresh
won't be strictly oldest-first. Fix by replacing the scan-based iteration with a
point-in-time (PIT) + paginated search_after loop (open PIT, run sorted searches
with size=batch_size and search_after tokens, close PIT) or, if you must keep
scan(), call scan(preserve_order=True) to enforce the sort; update the loop that
consumes progressbar(items=...) to use the new iterator and ensure
AgentViafSearch and progressbar still receive .source("pid") and the sort on
"_updated".
- Around line 27-33: The task _refresh_viaf_record currently ignores the
update_agents flag because it always calls AgentViafRecord.create_or_update(),
which unconditionally invokes create_mef_and_agents(); update the flow so the
flag is respected by either (a) adding an explicit parameter to
AgentViafRecord.create_or_update(update_agents=False) and having
create_or_update skip create_mef_and_agents() when False, or (b) keeping
create_or_update unchanged but branching in _refresh_viaf_record so that when
update_agents is False you do not call the path that triggers
create_mef_and_agents() (e.g., bypass the redirect/create/update route that
reconciles MEF/agents); ensure both the redirect-handling branch and the normal
create/update branch consult update_agents before touching MEF/agent data.

---

Nitpick comments:
In `@tests/ui/agents/test_agents_viaf_api.py`:
- Around line 390-399: The variable old_pid is redundantly reassigned after
creating the record; remove the unnecessary reassignment (old_pid =
agent_viaf_record.pid) so the test uses the originally set old_pid value and
avoid duplicate assignment—update the test around AgentViafRecord.create and
agent_viaf_record.pid to simply rely on the initial old_pid assignment.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e21e1aa3-4dc7-4fda-9a4d-cc433008d250

📥 Commits

Reviewing files that changed from the base of the PR and between 788790b and 1d48299.

⛔ Files ignored due to path filters (12)
  • .github/AI_SETUP.md is excluded by none and included by none
  • .github/copilot-instructions.md is excluded by none and included by none
  • .github/workflows/continuous-integration-test.yml is excluded by none and included by none
  • CLAUDE.md is excluded by none and included by none
  • data/cidref.json is excluded by none and included by none
  • data/mef.json is excluded by none and included by none
  • data/mef_id.csv is excluded by !**/*.csv and included by none
  • data/mef_metadata.csv is excluded by !**/*.csv and included by none
  • data/mef_pidstore.csv is excluded by !**/*.csv and included by none
  • docker-services.yml is excluded by none and included by none
  • rero_mef/config.py is excluded by !rero_mef/config.py and included by rero_mef/**/*.py
  • uv.lock is excluded by !**/*.lock and included by none
📒 Files selected for processing (18)
  • rero_mef/agents/api.py
  • rero_mef/agents/cli.py
  • rero_mef/agents/mef/api.py
  • rero_mef/agents/viaf/api.py
  • rero_mef/agents/viaf/tasks.py
  • rero_mef/api.py
  • rero_mef/api_mef.py
  • rero_mef/cli.py
  • rero_mef/monitoring/api.py
  • rero_mef/monitoring/cli.py
  • rero_mef/monitoring/views.py
  • rero_mef/places/idref/api.py
  • tests/ui/agents/test_agents_cli.py
  • tests/ui/agents/test_agents_viaf_api.py
  • tests/ui/agents/test_agents_viaf_tasks.py
  • tests/ui/places/test_places_api.py
  • tests/ui/places/test_places_gnd_tasks.py
  • tests/ui/test_cli.py
💤 Files with no reviewable changes (1)
  • rero_mef/api_mef.py
✅ Files skipped from review due to trivial changes (3)
  • tests/ui/places/test_places_api.py
  • rero_mef/api.py
  • tests/ui/test_cli.py
🚧 Files skipped from review as they are similar to previous changes (4)
  • rero_mef/monitoring/api.py
  • tests/ui/places/test_places_gnd_tasks.py
  • rero_mef/monitoring/cli.py
  • rero_mef/places/idref/api.py

@rerowep rerowep force-pushed the wep-viaf-updates branch 6 times, most recently from 2f5aa41 to f2d45c2 Compare March 25, 2026 19:53
@rero rero deleted a comment from coderabbitai bot Mar 25, 2026
@rerowep rerowep force-pushed the wep-viaf-updates branch 2 times, most recently from 2cf0b58 to 37a79e3 Compare March 25, 2026 21:50
@coderabbitai
Copy link

coderabbitai bot commented Mar 25, 2026

📝 Walkthrough

Walkthrough

Adds VIAF harvest/refresh workflows (Celery tasks + CLI), robust VIAF HTTP handling (timeouts, retries, 429/backoff, redirects and redirect reconciliation), CLI logging helpers, MARC 686 → DDC extraction, identifier format tweak, numerous tests, and repository AI-assistant guideline docs.

Changes

Cohort / File(s) Summary
AI Assistant Guidelines
\.github/AI_SETUP.md, .github/copilot-instructions.md, CLAUDE.md
Add repository AI assistant configuration and comprehensive coding guidelines for Claude and Copilot.
VIAF HTTP, Parsing & Redirects
rero_mef/agents/viaf/api.py
Refactor online VIAF fetch: unified endpoint usage, percent-encoding for non-VIAF sources, configurable timeouts/retries/wallclock SIGALRM, 429 handling with Retry-After/backoff, hardened response parsing, RetryableVIAFError, and redirect detection + handle_redirect.
VIAF Tasks & Workers
rero_mef/agents/viaf/tasks.py
New Celery shared tasks and helper _refresh_viaf_record; batch and rolling refresh flows, redirect handling, action aggregation, single-PID entrypoint.
Agents CLI & API surface
rero_mef/agents/cli.py, rero_mef/agents/api.py, rero_mef/agents/mef/api.py
Add harvest_viaf CLI; modify create_from_viaf flags/validation and unlinked flow; add _clean_non_existing_viaf_links, get_unlinked_agents, get_all_missing_viaf_pids; remove legacy missing-pids method.
CLI Logging & Helpers
rero_mef/cli_logging.py, rero_mef/cli.py
Add ensure_single_stream_handler() and dedupe logic; integrate into CLI groups; RabbitMQ-aware queue inspection and combined queue-wait logic; new clean_multiple_mef command.
Core enums & config
rero_mef/api.py, rero_mef/config.py
Add Action.REDIRECT enum member; add RERO_MEF_VIAF_REFRESH_BATCH_SIZE and RERO_MEF_VIAF_REFRESH_DAYS_SPAN constants.
Monitoring & Views
rero_mef/monitoring/api.py, rero_mef/monitoring/cli.py, rero_mef/monitoring/views.py
Fix db-es null handling and time-window alignment; explicitly mark ES unavailable; improve error returns and raise Click exceptions on DB query failures; small CLI help/variable renames.
Identifier & places utils
rero_mef/places/utils.py, rero_mef/places/idref/api.py
Change no-source identifier delimiter from : to `
MARC → DDC classification
rero_mef/marctojson/do_gnd_concepts.py, .../do_gnd_places.py
Add trans_gnd_classification to extract MARC 686 → bf:ClassificationDdc entries and write when present.
Logging formatting standardization
rero_mef/marctojson/... (many files)
Convert many verbose logger calls to parameterized printf-style formatting.
CSV / utils behavior
rero_mef/utils.py
Stricter VIAF CSV generation: require pid (don't overwrite with viaf_pid) and raise on missing pid.
CI / workflow
.github/workflows/continuous-integration-test.yml
Add pull_request trigger for staging branch.
Tests
tests/ui/..., tests/unit/...
Extensive new and updated tests covering VIAF API (redirects, 429 retry), VIAF tasks, CLI harvest/cleanup, RabbitMQ queue handling, places utils, and GND classification transformations.
Minor removals / CLI cleanup
rero_mef/agents/mef/api.py, rero_mef/api_mef.py
Remove deprecated missing-pids helper and several CLI echo TODO messages.

Sequence Diagram(s)

sequenceDiagram
    actor CLI as CLI
    participant Tasks as VIAF Tasks
    participant VIAF as VIAF Online
    participant AgentsAPI as Agents API
    participant DB as Database
    participant ES as Elasticsearch

    CLI->>Tasks: process_viaf_refresh(batch_size, enqueue?)
    loop per PID
        Tasks->>VIAF: GET /viaf (pid or source|id) with headers/timeouts
        alt 200 OK
            VIAF-->>Tasks: viaf_data
            Tasks->>AgentsAPI: create_or_update_viaf(viaf_data)
            AgentsAPI->>DB: persist/update
            DB->>ES: reindex
        else 429 Rate Limit
            VIAF-->>Tasks: 429 + Retry-After?
            Tasks->>Tasks: _sleep_with_countdown()
            Tasks->>VIAF: retry GET
        else Redirect (cluster merge)
            VIAF-->>Tasks: REDIRECT -> new_pid
            Tasks->>AgentsAPI: handle_redirect(new_pid)
            AgentsAPI->>VIAF: GET new_pid
            AgentsAPI->>DB: create/update new record
            AgentsAPI->>DB: delete old record
            DB->>ES: reindex
        end
    end
    Tasks-->>CLI: (count, action_counts)
Loading
sequenceDiagram
    actor User as User CLI
    participant AgentsCLI as Agents CLI
    participant AgentsAPI as Agents API
    participant VIAF as VIAF Online
    participant MEF as MEF DB

    User->>AgentsCLI: create_from_viaf --unlinked
    AgentsCLI->>AgentsAPI: get_unlinked_agents(relink=False)
    AgentsAPI->>MEF: scan MEF for records without viaf_pid
    MEF-->>AgentsAPI: (mef_pid, source, entity_pid)
    loop per unlinked
        AgentsAPI->>VIAF: get_online_record(source, entity_pid)
        alt Found
            VIAF-->>AgentsAPI: viaf_data with viaf_pid
            AgentsAPI->>MEF: create_or_update_viaf_link(mef_pid, viaf_pid)
        else Not Found / Error
            VIAF-->>AgentsAPI: not found / error
        end
    end
    AgentsCLI-->>User: summary (processed, cleaned)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~65 minutes

Suggested reviewers

  • Garfield-fr
  • PascalRepond

Poem

🐰 Hop, hop — the links find home at last,

Old clusters merge and redirects pass,
Unlinked friends retrace their way,
Tasks refresh through night and day,
DDC whispers from MARC fields vast.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title clearly describes the main feature: adding a VIAF cluster refresh pipeline. It directly reflects the primary objective of implementing VIAF updates through a new CLI command.
Docstring Coverage ✅ Passed Docstring coverage is 87.79% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
rero_mef/places/utils.py (1)

30-34: ⚠️ Potential issue | 🟠 Major

This formatter change is likely not used by the production place-indexing path.

rero_mef/places/listener.py calls make_identifier from rero_mef/utils.py, so updating rero_mef/places/utils.py here does not change the runtime _identifier enrichment flow. This leaves duplicated identifier logic and makes future divergence likely. Please consolidate to a single source of truth (prefer rero_mef/utils.py) and update call sites/imports accordingly.

Proposed consolidation (example)
-from flask import current_app
+from flask import current_app
+from rero_mef.utils import make_identifier as shared_make_identifier
@@
 def make_identifier(identified_by):
-    """Make identifier `type|(source)value`."""
-    if source := identified_by.get("source"):
-        return f"{identified_by['type']}|({source}){identified_by['value']}"
-    return f"{identified_by['type']}|{identified_by['value']}"
+    """Proxy to shared identifier formatter to avoid drift."""
+    return shared_make_identifier(identified_by)

As per coding guidelines, for **/*.py we should focus on “correctness issues” and “logic errors and architectural improvements.”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rero_mef/places/utils.py` around lines 30 - 34, The make_identifier
implementation in rero_mef/places/utils.py is duplicating logic already used by
the runtime path (listener) which imports make_identifier from
rero_mef/utils.py; remove the duplicate function from rero_mef/places/utils.py
and consolidate to a single source of truth by using the implementation in
rero_mef/utils.py (prefer rero_mef/utils.py as canonical), update any call sites
and imports (e.g., rero_mef/places/listener.py and any other modules) to import
make_identifier from rero_mef.utils, and run/update related tests to ensure
there are no import or behavior regressions.
🧹 Nitpick comments (5)
rero_mef/agents/viaf/tasks.py (1)

208-237: progress parameter is accepted but not forwarded.

The viaf_get_record task accepts a progress parameter (line 214) but doesn't pass it to _refresh_viaf_record. Since _refresh_viaf_record doesn't accept a progress parameter, this is unused. Consider removing it from the signature for clarity.

♻️ Remove unused `progress` parameter
 `@shared_task`
 def viaf_get_record(
     viaf_pid,
     dbcommit=True,
     reindex=True,
     verbose=False,
-    progress=False,
     delete_if_not_found=False,
     update_agents=False,
 ):
     """Fetch and update a single VIAF record from the online API.

     :param viaf_pid: VIAF PID to fetch.
     :param dbcommit: Commit changes to DB.
     :param reindex: Reindex record.
     :param verbose: Print verbose messages.
-    :param progress: Display progress information.
     :param delete_if_not_found: Delete old record if redirect target not found.
     :param update_agents: Update MEF and agent records (GND/IdRef) after VIAF update.
     :returns: Action performed.
     """
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rero_mef/agents/viaf/tasks.py` around lines 208 - 237, The viaf_get_record
task declares a unused progress parameter that is not forwarded to
_refresh_viaf_record; remove the progress parameter from the viaf_get_record
signature, delete its mention from the docstring and default args, and update
any callers that pass progress to stop providing it (or adjust callers
accordingly); locate the viaf_get_record function and its docstring to make
these edits and ensure behavior remains unchanged.
rero_mef/agents/cli.py (1)

159-160: Consider catching more specific exceptions.

The broad except Exception can mask unexpected errors. Consider catching requests.RequestException or the specific exceptions that get_online_record might raise.

♻️ Suggested narrower exception handling
-                except Exception as err:
-                    return mef_pid, entity_pid, {}, f"Error: {err}"
+                except (requests.RequestException, TimeoutError) as err:
+                    return mef_pid, entity_pid, {}, f"Error: {err}"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rero_mef/agents/cli.py` around lines 159 - 160, The broad except in the block
returning (mef_pid, entity_pid, {}, f"Error: {err}") should be narrowed — catch
requests.RequestException (for network/HTTP errors from get_online_record) and
any specific parsing errors that get_online_record may raise (e.g., ValueError,
KeyError) instead of Exception; update the except clause(s) around
get_online_record in the function where mef_pid and entity_pid are handled to
handle these specific exceptions and preserve the existing return shape and
error message.
rero_mef/cli.py (1)

1102-1136: Acceptable defensive exception handling.

The broad except Exception catches in rabbitmq_queue_count are appropriate here since:

  1. The function is used for monitoring/waiting, not critical operations
  2. Returning 0 on failure is a safe fallback that allows processing to continue
  3. Various broker-related exceptions could occur (connection issues, auth failures, etc.)

However, consider logging these exceptions at debug level to aid troubleshooting.

♻️ Optional: add debug logging for exceptions
+import logging
+
+logger = logging.getLogger(__name__)
+
 def rabbitmq_queue_count():
     ...
                     except Exception:
+                        logger.debug("Failed to declare queue %s", queue_name, exc_info=True)
                         continue
             finally:
                 channel.close()
         return count
     except Exception:
+        logger.debug("Failed to get RabbitMQ queue count", exc_info=True)
         return 0
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rero_mef/cli.py` around lines 1102 - 1136, Add debug-level logging for
exceptions in rabbitmq_queue_count: create or use a module logger
(logging.getLogger(__name__)) and log the inner exceptions raised by
channel.queue_declare (the inner except Exception) with context (queue_name) and
exc_info=True, and also log the outer broad except Exception that returns 0 with
exc_info=True so broker/connection failures
(current_celery.connection_or_acquire / connection.channel) are recorded for
troubleshooting; keep behavior unchanged otherwise.
rero_mef/agents/viaf/api.py (1)

579-585: Suppress unused action variable.

Ruff flags the unpacked action variable as unused. Rename to _action to indicate it's intentionally discarded.

♻️ Prefix with underscore
         try:
-            new_record, action = self.create_or_update(
+            new_record, _action = self.create_or_update(
                 data=target_data,
                 dbcommit=dbcommit,
                 reindex=reindex,
                 test_md5=True,
             )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rero_mef/agents/viaf/api.py` around lines 579 - 585, The unpacked variable
action from the call to self.create_or_update in the viaf API is unused and
flagged by Ruff; update the unpacking to use a throwaway name (e.g., change
"new_record, action = self.create_or_update(...)" to "new_record, _action =
self.create_or_update(...)" or similar) in the same scope so the intent to
discard the second return value is clear and linter warnings are suppressed
while leaving create_or_update unchanged.
rero_mef/agents/mef/api.py (1)

25-25: Import cleanup correctly removes unused dependencies. The click and progressbar imports were only used by the removed get_all_missing_viaf_pids method, and their removal is appropriate.

The verification found that get_all_missing_viaf_pids now exists only as a module-level function in rero_mef/agents/api.py with no call sites in the codebase. Consider removing this dead code if it is no longer needed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rero_mef/agents/mef/api.py` at line 25, The module-level function
get_all_missing_viaf_pids in rero_mef/agents/api.py appears to be dead code (no
call sites) after removing click/progressbar usage; locate the
get_all_missing_viaf_pids definition in that file and remove it (or move it to a
utilities/archive location if you want to keep it for reference), and run a
project-wide search for get_all_missing_viaf_pids to confirm there are no
remaining callers before committing the removal.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@rero_mef/marctojson/do_gnd_concepts.py`:
- Around line 216-218: The logger.info call uses multiple positional arguments
without a format placeholder which causes a TypeError when handlers perform
string interpolation; for each occurrence (e.g., the call guarded by self.logger
and self.verbose in do_gnd_concepts.py around the trans_gnd_classification
invocation), change the call to either include a format placeholder (e.g., use
"%s" and pass the function name as the second argument) or build a single
formatted message string before calling self.logger.info, and apply the same fix
to the other instances across marctojson (do_idref_concepts.py,
do_rero_concepts.py, do_gnd_places.py, do_idref_places.py, do_gnd_agent.py,
do_idref_agent.py, do_rero_agent.py, do_skeleton.py) where self.logger.info is
called with multiple args but no formatting placeholder.

In `@rero_mef/utils.py`:
- Around line 963-964: The current check in utils.py that raises KeyError("pid")
immediately when record.get("pid") is falsy should be changed to first attempt
normalization from the legacy viaf_pid field: if record lacks "pid" but has
"viaf_pid", set/normalize record["pid"] from record["viaf_pid"] (using the same
normalization logic already present elsewhere in the function) and only raise
KeyError("pid") if both record.get("pid") and record.get("viaf_pid") are absent;
update the branch around the existing KeyError to perform this fallback
normalization before raising.

---

Outside diff comments:
In `@rero_mef/places/utils.py`:
- Around line 30-34: The make_identifier implementation in
rero_mef/places/utils.py is duplicating logic already used by the runtime path
(listener) which imports make_identifier from rero_mef/utils.py; remove the
duplicate function from rero_mef/places/utils.py and consolidate to a single
source of truth by using the implementation in rero_mef/utils.py (prefer
rero_mef/utils.py as canonical), update any call sites and imports (e.g.,
rero_mef/places/listener.py and any other modules) to import make_identifier
from rero_mef.utils, and run/update related tests to ensure there are no import
or behavior regressions.

---

Nitpick comments:
In `@rero_mef/agents/cli.py`:
- Around line 159-160: The broad except in the block returning (mef_pid,
entity_pid, {}, f"Error: {err}") should be narrowed — catch
requests.RequestException (for network/HTTP errors from get_online_record) and
any specific parsing errors that get_online_record may raise (e.g., ValueError,
KeyError) instead of Exception; update the except clause(s) around
get_online_record in the function where mef_pid and entity_pid are handled to
handle these specific exceptions and preserve the existing return shape and
error message.

In `@rero_mef/agents/mef/api.py`:
- Line 25: The module-level function get_all_missing_viaf_pids in
rero_mef/agents/api.py appears to be dead code (no call sites) after removing
click/progressbar usage; locate the get_all_missing_viaf_pids definition in that
file and remove it (or move it to a utilities/archive location if you want to
keep it for reference), and run a project-wide search for
get_all_missing_viaf_pids to confirm there are no remaining callers before
committing the removal.

In `@rero_mef/agents/viaf/api.py`:
- Around line 579-585: The unpacked variable action from the call to
self.create_or_update in the viaf API is unused and flagged by Ruff; update the
unpacking to use a throwaway name (e.g., change "new_record, action =
self.create_or_update(...)" to "new_record, _action =
self.create_or_update(...)" or similar) in the same scope so the intent to
discard the second return value is clear and linter warnings are suppressed
while leaving create_or_update unchanged.

In `@rero_mef/agents/viaf/tasks.py`:
- Around line 208-237: The viaf_get_record task declares a unused progress
parameter that is not forwarded to _refresh_viaf_record; remove the progress
parameter from the viaf_get_record signature, delete its mention from the
docstring and default args, and update any callers that pass progress to stop
providing it (or adjust callers accordingly); locate the viaf_get_record
function and its docstring to make these edits and ensure behavior remains
unchanged.

In `@rero_mef/cli.py`:
- Around line 1102-1136: Add debug-level logging for exceptions in
rabbitmq_queue_count: create or use a module logger
(logging.getLogger(__name__)) and log the inner exceptions raised by
channel.queue_declare (the inner except Exception) with context (queue_name) and
exc_info=True, and also log the outer broad except Exception that returns 0 with
exc_info=True so broker/connection failures
(current_celery.connection_or_acquire / connection.channel) are recorded for
troubleshooting; keep behavior unchanged otherwise.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 436e8084-d13d-406a-92b9-a9699193aae5

📥 Commits

Reviewing files that changed from the base of the PR and between 788790b and 37a79e3.

⛔ Files ignored due to path filters (29)
  • .github/AI_SETUP.md is excluded by none and included by none
  • .github/copilot-instructions.md is excluded by none and included by none
  • .github/workflows/continuous-integration-test.yml is excluded by none and included by none
  • CLAUDE.md is excluded by none and included by none
  • data/aggnd.json is excluded by none and included by none
  • data/aggnd_metadata.csv is excluded by !**/*.csv and included by none
  • data/aggnd_pidstore.csv is excluded by !**/*.csv and included by none
  • data/agrero.json is excluded by none and included by none
  • data/agrero_metadata.csv is excluded by !**/*.csv and included by none
  • data/agrero_pidstore.csv is excluded by !**/*.csv and included by none
  • data/aidref.json is excluded by none and included by none
  • data/aidref_metadata.csv is excluded by !**/*.csv and included by none
  • data/aidref_pidstore.csv is excluded by !**/*.csv and included by none
  • data/cidref.json is excluded by none and included by none
  • data/mef.json is excluded by none and included by none
  • data/mef_id.csv is excluded by !**/*.csv and included by none
  • data/mef_metadata.csv is excluded by !**/*.csv and included by none
  • data/mef_pidstore.csv is excluded by !**/*.csv and included by none
  • data/viaf.json is excluded by none and included by none
  • data/viaf_metadata.csv is excluded by !**/*.csv and included by none
  • data/viaf_pidstore.csv is excluded by !**/*.csv and included by none
  • docker-services.yml is excluded by none and included by none
  • docker/uwsgi/uwsgi_rest.ini is excluded by none and included by none
  • docker/uwsgi/uwsgi_ui.ini is excluded by none and included by none
  • rero_mef/config.py is excluded by !rero_mef/config.py and included by rero_mef/**/*.py
  • scripts/test is excluded by none and included by none
  • tests/unit/concepts/examples/xml_minimal_record.xml is excluded by none and included by none
  • tests/unit/places/examples/xml_minimal_record.xml is excluded by none and included by none
  • uv.lock is excluded by !**/*.lock and included by none
📒 Files selected for processing (27)
  • rero_mef/agents/api.py
  • rero_mef/agents/cli.py
  • rero_mef/agents/mef/api.py
  • rero_mef/agents/viaf/api.py
  • rero_mef/agents/viaf/tasks.py
  • rero_mef/api.py
  • rero_mef/api_mef.py
  • rero_mef/cli.py
  • rero_mef/cli_logging.py
  • rero_mef/marctojson/do_gnd_concepts.py
  • rero_mef/marctojson/do_gnd_places.py
  • rero_mef/monitoring/api.py
  • rero_mef/monitoring/cli.py
  • rero_mef/monitoring/views.py
  • rero_mef/places/idref/api.py
  • rero_mef/places/utils.py
  • rero_mef/utils.py
  • tests/ui/agents/test_agents_cli.py
  • tests/ui/agents/test_agents_viaf_api.py
  • tests/ui/agents/test_agents_viaf_tasks.py
  • tests/ui/places/test_places_api.py
  • tests/ui/places/test_places_gnd_tasks.py
  • tests/ui/test_api.py
  • tests/ui/test_cli.py
  • tests/ui/test_views.py
  • tests/unit/concepts/test_concepts_gnd_transformation.py
  • tests/unit/places/test_places_gnd_transformation.py
💤 Files with no reviewable changes (1)
  • rero_mef/api_mef.py
✅ Files skipped from review due to trivial changes (3)
  • tests/ui/test_views.py
  • tests/ui/places/test_places_api.py
  • tests/unit/places/test_places_gnd_transformation.py
🚧 Files skipped from review as they are similar to previous changes (6)
  • rero_mef/api.py
  • rero_mef/places/idref/api.py
  • rero_mef/monitoring/cli.py
  • tests/ui/test_cli.py
  • tests/ui/agents/test_agents_cli.py
  • rero_mef/agents/api.py

@rerowep rerowep force-pushed the wep-viaf-updates branch 3 times, most recently from 7ec3a80 to 49b4ccb Compare March 26, 2026 08:17
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

♻️ Duplicate comments (2)
rero_mef/agents/cli.py (1)

177-179: ⚠️ Potential issue | 🟠 Major

Compute the VIAF md5 before create_or_update() in --unlinked mode.

AgentViafRecord.create_or_update() does not add md5 itself. Records created here will miss md5, so the next refresh with test_md5=True cannot recognize them as up to date.

Suggested fix
+            data = add_md5(data)
             viaf_record, _ = AgentViafRecord.create_or_update(
                 data=data, dbcommit=True, reindex=True
             )

Also add add_md5 to the ..utils import list.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rero_mef/agents/cli.py` around lines 177 - 179, When running in --unlinked
mode ensure the VIAF md5 is computed and added to the record before calling
AgentViafRecord.create_or_update: call the add_md5 utility (import it from
..utils if not already imported) on the VIAF data payload to set the md5 field,
then pass that augmented data into AgentViafRecord.create_or_update(data=...,
dbcommit=True, reindex=True) so newly created records include md5 and can be
recognized by later test_md5 checks.
rero_mef/agents/viaf/tasks.py (1)

27-34: ⚠️ Potential issue | 🟠 Major

update_agents is still a no-op in the normal refresh path.

This helper exposes update_agents as optional work, but the standard branch always calls AgentViafRecord.create_or_update(), and that method unconditionally runs create_mef_and_agents(...). So update_agents=False still performs the expensive MEF/agent side effect, which makes the task/CLI flag misleading.

Also applies to: 75-81

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rero_mef/agents/viaf/tasks.py` around lines 27 - 34, The refresh helper
_refresh_viaf_record currently accepts update_agents but still triggers
agent/MEF creation because it always calls AgentViafRecord.create_or_update(),
which itself unconditionally calls create_mef_and_agents; change the flow so the
update_agents flag is respected—either pass update_agents through to
AgentViafRecord.create_or_update (and update that method signature to
conditionally call create_mef_and_agents) or, if altering create_or_update is
undesirable, add a conditional in _refresh_viaf_record to skip the
create_mef_and_agents path when update_agents is False; apply the same fix to
the other refresh call site noted (the similar create_or_update invocation at
the 75-81 region) so the CLI/task flag behavior is not misleading.
🧹 Nitpick comments (2)
tests/unit/concepts/test_concepts_gnd_transformation.py (1)

139-165: Add a negative-path assertion for non-DDC/empty classification behavior.

Please add a companion test to confirm non-DDC 686 values are ignored and that classification is not set when no valid entry exists. This guards the edge-case branch described by the transformation behavior.

Proposed test addition
+def test_gnd_classification_ignores_non_ddc():
+    """Test trans_gnd_classification ignores non-DDC entries."""
+    xml_part_to_add = """
+        <datafield tag="686" ind1=" " ind2=" ">
+            <subfield code="a">999</subfield>
+            <subfield code="2">OTHER</subfield>
+        </datafield>
+    """
+    trans = trans_prep(Transformation, "concepts", xml_part_to_add)
+    trans.trans_gnd_classification()
+    assert "classification" not in trans.json
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/concepts/test_concepts_gnd_transformation.py` around lines 139 -
165, Add a negative-path test that uses trans_prep(Transformation, "concepts",
xml_part) with 686 datafields that are non-DDC or empty (e.g., <subfield
code="2">LCC</subfield> or missing/empty <subfield code="2">) and then call
trans.trans_gnd_classification(); assert that trans.json does not contain a
"classification" key (or equals {}), and also add a separate case where mixed
686 entries include only non-DDC entries to confirm they are ignored. Use the
existing test structure and the same helper symbols (trans_prep, Transformation,
trans.trans_gnd_classification) so the new test mirrors the positive case but
asserts absence of classification for invalid/non-DDC inputs.
rero_mef/marctojson/do_idref_agent.py (1)

416-416: Align logged function name with actual method name.

Line 416 logs trans_authorized_access_point, but the method is trans_idref_authorized_access_point. This can mislead debugging/trace filtering.

Suggested patch
-            self.logger.info("Call Function: %s", "trans_authorized_access_point")
+            self.logger.info("Call Function: %s", "trans_idref_authorized_access_point")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rero_mef/marctojson/do_idref_agent.py` at line 416, The log message currently
prints "trans_authorized_access_point" but the actual method is
trans_idref_authorized_access_point; update the logger.info call in
trans_idref_authorized_access_point to log the correct function name (e.g.,
"trans_idref_authorized_access_point") so trace messages match the real method,
or derive the name dynamically (e.g., using the method's __name__ or f-string
with the function identifier) to avoid future drift.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@rero_mef/agents/viaf/api.py`:
- Around line 504-509: The current VIAF fetch returns a truthy payload for
merged clusters (result contains a different pid and redirect_from), which
causes update_online() to treat it as a normal replacement; change the behavior
so merged-cluster redirects are treated as falsy results: when result.get("pid")
is present (i.e., a redirect), set result["redirect_from"]=pid but return None
(or another falsy payload) instead of the truthy result so update_online() will
invoke handle_redirect() and perform deletion/reconciliation; update any callers
(e.g., update_online(), handle_redirect()) to read the redirect info from the
log/message or from a standardized redirect flag if you choose that approach.

In `@rero_mef/agents/viaf/tasks.py`:
- Around line 122-123: The code is materializing the entire scan by calling
list(query.scan()) which loads all VIAF results into memory; instead pass the
scan iterator directly to the progressbar (e.g., use
progressbar(items=query.scan()) or items=iter(query.scan())) or use a
streaming/batched iterator so items are consumed lazily; update the call that
constructs progress_bar (the progressbar(...) invocation that currently wraps
list(query.scan())) to use the iterator returned by query.scan() and ensure any
downstream code supports incremental consumption (respecting batch_size and not
assuming a pre-sized list).

In `@rero_mef/cli.py`:
- Around line 1124-1130: The code that inspects the broker currently swallows
all exceptions and returns 0 (the block around channel.close()/return count
inside the try/except), which makes wait_empty_tasks() treat errors as an empty
queue; change the error handling so exceptions are not collapsed to 0: in the
function that computes the queue count (the try/except that currently returns 0
on any Exception), catch only expected exceptions if needed, log the full
exception (use the existing logger) and either re-raise the exception or return
a sentinel (e.g. None) instead of 0; then update wait_empty_tasks() to detect
that sentinel (or handle exceptions) and retry with backoff or fail with a clear
warning instead of assuming the queue is empty. Ensure channel.close() remains
in the finally block and reference the functions/methods wait_empty_tasks() and
the broker-inspection function in your edits.

In `@rero_mef/marctojson/do_skeleton.py`:
- Line 69: The log call in Transformation.trans_example_2 incorrectly logs
"trans_example_1"; update the logger.info invocation inside the trans_example_2
method to reference the correct function name (e.g., "trans_example_2") so the
message accurately reflects the executing method (locate the logger.info("Call
Function: %s", "trans_example_1") line in trans_example_2 and change the literal
to "trans_example_2").

In `@tests/ui/agents/test_agents_viaf_tasks.py`:
- Around line 128-145: The tests patch AgentViafSearch.point_in_time but
production code now iterates using query.scan(), so update the mocks to
intercept the actual iterator: patch the scan() method on the
AgentViafSearch.query (e.g.,
mock.patch("rero_mef.agents.viaf.tasks.AgentViafSearch.query.scan",
return_value=iter([hit]), create=True) or create a fake Query object whose scan
returns the iterator) and adjust the ctx/point-in-time mocks accordingly; apply
the same change to the other test block referenced (the one around lines
241-252) so process_viaf_rolling_refresh() is exercised via the scan() iterator
rather than the old point_in_time path.

---

Duplicate comments:
In `@rero_mef/agents/cli.py`:
- Around line 177-179: When running in --unlinked mode ensure the VIAF md5 is
computed and added to the record before calling
AgentViafRecord.create_or_update: call the add_md5 utility (import it from
..utils if not already imported) on the VIAF data payload to set the md5 field,
then pass that augmented data into AgentViafRecord.create_or_update(data=...,
dbcommit=True, reindex=True) so newly created records include md5 and can be
recognized by later test_md5 checks.

In `@rero_mef/agents/viaf/tasks.py`:
- Around line 27-34: The refresh helper _refresh_viaf_record currently accepts
update_agents but still triggers agent/MEF creation because it always calls
AgentViafRecord.create_or_update(), which itself unconditionally calls
create_mef_and_agents; change the flow so the update_agents flag is
respected—either pass update_agents through to AgentViafRecord.create_or_update
(and update that method signature to conditionally call create_mef_and_agents)
or, if altering create_or_update is undesirable, add a conditional in
_refresh_viaf_record to skip the create_mef_and_agents path when update_agents
is False; apply the same fix to the other refresh call site noted (the similar
create_or_update invocation at the 75-81 region) so the CLI/task flag behavior
is not misleading.

---

Nitpick comments:
In `@rero_mef/marctojson/do_idref_agent.py`:
- Line 416: The log message currently prints "trans_authorized_access_point" but
the actual method is trans_idref_authorized_access_point; update the logger.info
call in trans_idref_authorized_access_point to log the correct function name
(e.g., "trans_idref_authorized_access_point") so trace messages match the real
method, or derive the name dynamically (e.g., using the method's __name__ or
f-string with the function identifier) to avoid future drift.

In `@tests/unit/concepts/test_concepts_gnd_transformation.py`:
- Around line 139-165: Add a negative-path test that uses
trans_prep(Transformation, "concepts", xml_part) with 686 datafields that are
non-DDC or empty (e.g., <subfield code="2">LCC</subfield> or missing/empty
<subfield code="2">) and then call trans.trans_gnd_classification(); assert that
trans.json does not contain a "classification" key (or equals {}), and also add
a separate case where mixed 686 entries include only non-DDC entries to confirm
they are ignored. Use the existing test structure and the same helper symbols
(trans_prep, Transformation, trans.trans_gnd_classification) so the new test
mirrors the positive case but asserts absence of classification for
invalid/non-DDC inputs.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 4ee2d590-cb2b-4d34-85aa-08c6e6486fab

📥 Commits

Reviewing files that changed from the base of the PR and between 37a79e3 and 49b4ccb.

⛔ Files ignored due to path filters (29)
  • .github/AI_SETUP.md is excluded by none and included by none
  • .github/copilot-instructions.md is excluded by none and included by none
  • .github/workflows/continuous-integration-test.yml is excluded by none and included by none
  • CLAUDE.md is excluded by none and included by none
  • data/aggnd.json is excluded by none and included by none
  • data/aggnd_metadata.csv is excluded by !**/*.csv and included by none
  • data/aggnd_pidstore.csv is excluded by !**/*.csv and included by none
  • data/agrero.json is excluded by none and included by none
  • data/agrero_metadata.csv is excluded by !**/*.csv and included by none
  • data/agrero_pidstore.csv is excluded by !**/*.csv and included by none
  • data/aidref.json is excluded by none and included by none
  • data/aidref_metadata.csv is excluded by !**/*.csv and included by none
  • data/aidref_pidstore.csv is excluded by !**/*.csv and included by none
  • data/cidref.json is excluded by none and included by none
  • data/mef.json is excluded by none and included by none
  • data/mef_id.csv is excluded by !**/*.csv and included by none
  • data/mef_metadata.csv is excluded by !**/*.csv and included by none
  • data/mef_pidstore.csv is excluded by !**/*.csv and included by none
  • data/viaf.json is excluded by none and included by none
  • data/viaf_metadata.csv is excluded by !**/*.csv and included by none
  • data/viaf_pidstore.csv is excluded by !**/*.csv and included by none
  • docker-services.yml is excluded by none and included by none
  • docker/uwsgi/uwsgi_rest.ini is excluded by none and included by none
  • docker/uwsgi/uwsgi_ui.ini is excluded by none and included by none
  • rero_mef/config.py is excluded by !rero_mef/config.py and included by rero_mef/**/*.py
  • scripts/test is excluded by none and included by none
  • tests/unit/concepts/examples/xml_minimal_record.xml is excluded by none and included by none
  • tests/unit/places/examples/xml_minimal_record.xml is excluded by none and included by none
  • uv.lock is excluded by !**/*.lock and included by none
📒 Files selected for processing (34)
  • rero_mef/agents/api.py
  • rero_mef/agents/cli.py
  • rero_mef/agents/mef/api.py
  • rero_mef/agents/viaf/api.py
  • rero_mef/agents/viaf/tasks.py
  • rero_mef/api.py
  • rero_mef/api_mef.py
  • rero_mef/cli.py
  • rero_mef/cli_logging.py
  • rero_mef/marctojson/do_gnd_agent.py
  • rero_mef/marctojson/do_gnd_concepts.py
  • rero_mef/marctojson/do_gnd_places.py
  • rero_mef/marctojson/do_idref_agent.py
  • rero_mef/marctojson/do_idref_concepts.py
  • rero_mef/marctojson/do_idref_places.py
  • rero_mef/marctojson/do_rero_agent.py
  • rero_mef/marctojson/do_rero_concepts.py
  • rero_mef/marctojson/do_skeleton.py
  • rero_mef/monitoring/api.py
  • rero_mef/monitoring/cli.py
  • rero_mef/monitoring/views.py
  • rero_mef/places/idref/api.py
  • rero_mef/places/utils.py
  • rero_mef/utils.py
  • tests/ui/agents/test_agents_cli.py
  • tests/ui/agents/test_agents_viaf_api.py
  • tests/ui/agents/test_agents_viaf_tasks.py
  • tests/ui/places/test_places_api.py
  • tests/ui/places/test_places_gnd_tasks.py
  • tests/ui/test_api.py
  • tests/ui/test_cli.py
  • tests/ui/test_views.py
  • tests/unit/concepts/test_concepts_gnd_transformation.py
  • tests/unit/places/test_places_gnd_transformation.py
💤 Files with no reviewable changes (1)
  • rero_mef/api_mef.py
✅ Files skipped from review due to trivial changes (9)
  • rero_mef/api.py
  • tests/ui/test_views.py
  • rero_mef/places/utils.py
  • rero_mef/marctojson/do_rero_concepts.py
  • rero_mef/marctojson/do_idref_places.py
  • tests/unit/places/test_places_gnd_transformation.py
  • rero_mef/marctojson/do_rero_agent.py
  • tests/ui/places/test_places_gnd_tasks.py
  • rero_mef/marctojson/do_gnd_agent.py
🚧 Files skipped from review as they are similar to previous changes (9)
  • tests/ui/test_api.py
  • tests/ui/places/test_places_api.py
  • rero_mef/marctojson/do_gnd_places.py
  • rero_mef/cli_logging.py
  • rero_mef/marctojson/do_gnd_concepts.py
  • rero_mef/places/idref/api.py
  • rero_mef/monitoring/cli.py
  • tests/ui/test_cli.py
  • rero_mef/agents/api.py

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

♻️ Duplicate comments (6)
rero_mef/cli.py (1)

1109-1130: ⚠️ Potential issue | 🟠 Major

Queue-inspection failures currently look like "all clear".

Line 1124 and Line 1129 collapse broker errors to 0, so wait_empty_tasks() can exit while RabbitMQ is unavailable or a passive declare failed. Please return a distinct failure state, or raise/log and retry, instead of reusing the empty-queue sentinel.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rero_mef/cli.py` around lines 1109 - 1130, The code that counts messages
(using current_celery, current_celery.connection_or_acquire,
channel.queue_declare) is swallowing broker errors and returning 0, which
conflates "empty" with "failure"; change the error handling so exceptions from
channel.queue_declare or the outer try are not mapped to the empty-queue
sentinel: log the exception via the existing logger and re-raise it (or return a
distinct failure value like -1) instead of returning 0; ensure channel.close()
still runs in the finally block and preserve the existing success path that
returns the positive count.
rero_mef/agents/viaf/api.py (1)

504-509: ⚠️ Potential issue | 🟠 Major

Merged-cluster redirects can still bypass handle_redirect().

This branch returns a normal truthy payload plus redirect_from. update_online() below only checks truthiness, so it will take the ordinary replace path instead of deleting the old VIAF PID and reconciling MEF links through the redirect flow.

Minimal fix
     online_data, _ = self.get_online_record(viaf_source_code="VIAF", pid=self.pid)
     if online_data:
+        if online_data.get("redirect_from"):
+            new_record, action, _ = self.handle_redirect(
+                online_data["pid"], dbcommit=dbcommit, reindex=reindex
+            )
+            return new_record, action
         online_data["$schema"] = self["$schema"]
         online_data = add_md5(online_data)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rero_mef/agents/viaf/api.py` around lines 504 - 509, The current branch
inside the VIAF fetch returns a normal truthy payload when a merged-cluster
redirect is detected (result.get("pid")), which causes update_online() to treat
it as a regular replacement instead of running the redirect flow; change this
branch to invoke the redirect handling path instead of returning the normal
result: call the existing handle_redirect(...) (or the equivalent redirect
helper used elsewhere) with the original pid and the target pid (result["pid"])
and return its result/tuple so update_online() executes the
redirect-delete/reconcile logic, or alternatively make the returned value falsy
and surface the redirect via a dedicated redirect result flag that
update_online() checks; locate this logic around the result, pid,
handle_redirect, and update_online symbols and update the branch to route
merged-cluster redirects through handle_redirect.
rero_mef/agents/viaf/tasks.py (3)

27-33: ⚠️ Potential issue | 🟠 Major

update_agents never affects this refresh path.

The flag is exposed by the CLI and both task entry points, but _refresh_viaf_record() never uses it. It is not forwarded to handle_redirect(...) or create_or_update(...), so callers get the same behavior regardless of True/False. Either pass it through to the underlying API or remove the option until that behavior exists.

Also applies to: 60-65, 76-81

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rero_mef/agents/viaf/tasks.py` around lines 27 - 33, The update_agents
parameter on _refresh_viaf_record currently has no effect; update_agents should
be forwarded to the downstream handlers (or removed). Modify
_refresh_viaf_record to pass update_agents into handle_redirect(...) and
create_or_update(...) calls (and any other internal call that applies) so the
flag controls whether linked local agents are updated; alternatively, if the
feature isn't implemented yet, remove the update_agents parameter from the
function signature and its callers (CLI/task entry points) to avoid a no-op
flag. Ensure you update all occurrences referenced in this file (including the
other two similar regions) so the behavior is consistent.

122-123: ⚠️ Potential issue | 🟠 Major

This still preloads the full VIAF scan before processing the first record.

list(query.scan()) makes small batches pay the full O(N) scan and memory cost up front. Even batch_size=1 will materialize every PID before Line 128 runs. Use a lazy, bounded iterator instead of buffering the whole scan.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rero_mef/agents/viaf/tasks.py` around lines 122 - 123, The code currently
materializes the entire VIAF scan by calling list(query.scan()) when creating
progress_bar; replace that with a lazy iterator to avoid preloading all PIDs —
e.g. pass query.scan() (or a generator that yields from query.scan() or an
itertools.islice wrapper for bounded consumption) directly into progressbar
instead of list(query.scan()), ensuring any required max/total param is set
separately so progressbar and the later loop in tasks.py use a streaming
iterator rather than buffering the whole result.

180-190: ⚠️ Potential issue | 🟠 Major

scan() is not a safe oldest-first iterator for the rolling batch.

process_viaf_rolling_refresh() promises oldest-first processing, but Search.scan() does not reliably preserve the _updated sort, and breaking after batch_size can leave the scroll open. Use PIT/iterate() (or another bounded ordered search) for this path instead of scan().

In elasticsearch-dsl for Python, does Search.scan() preserve an explicit .sort(...) order by default, and what is the recommended way to iterate a sorted batch without leaving an open scroll when stopping early?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rero_mef/agents/viaf/tasks.py` around lines 180 - 190, The current loop uses
AgentViafSearch().sort(...).scan() which does not guarantee preserving the
explicit _updated sort and will leave an open scroll when you break early;
change process_viaf_rolling_refresh to iterate a bounded, ordered search instead
(e.g. use Search.iterate(...) or open a PIT and use
Search.with_pit()/client.open_point_in_time + iterate) so the sort on "_updated"
is respected and the search context is closed when you stop at batch_size;
ensure the new flow uses AgentViafSearch().sort({"_updated": {"order":"asc"}})
with iterate()/PIT and explicitly close the PIT/iterator when breaking out of
the progressbar loop.
rero_mef/agents/cli.py (1)

351-352: ⚠️ Potential issue | 🟠 Major

--rolling --batch_size 0 still does not mean "all".

This normalization works for full refresh, but process_viaf_rolling_refresh() interprets None as "use RERO_MEF_VIAF_REFRESH_BATCH_SIZE". In rolling mode the CLI help says 0=all, but the code silently falls back to the default batch instead. Normalize per mode or teach the task to treat 0 as all.

Also applies to: 357-358, 366-367

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rero_mef/agents/cli.py` around lines 351 - 352, The CLI normalization of
batch_size wrongly converts 0 to None globally, which makes --rolling
--batch_size 0 fall back to the env default; change the normalization to be
mode-aware: when rolling mode is active (the flag used to call
process_viaf_rolling_refresh), preserve 0 as the explicit "all" indicator (set
batch_arg = 0 if batch_size == 0 else batch_size), while for full refresh keep
the existing behavior (batch_arg = None if batch_size == 0 else batch_size) so
process_viaf_rolling_refresh sees 0 as "all" and process_viaf_full_refresh still
uses None to mean env-default; update the code around batch_arg, batch_size, and
the call sites process_viaf_rolling_refresh/process_viaf_full_refresh
accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@rero_mef/agents/cli.py`:
- Line 123: The local dict non_existing_pids is never populated so
_clean_non_existing_viaf_links() is never invoked with real data; update the
synchronous and enqueue branches of the function using
task_create_mef_and_agents_from_viaf(...) to return or yield the missing-PID
info and merge it into non_existing_pids before the final cleanup block—i.e.,
change the synchronous path to capture the function's return (e.g., result =
task_create_mef_and_agents_from_viaf(...)) and merge result into
non_existing_pids, and in the enqueue/asynchronous path have the enqueued task
or its callback post back the missing-PID entries into non_existing_pids (or
collect them from the queue response) so that
_clean_non_existing_viaf_links(non_existing_pids) runs with actual data.
- Around line 150-160: The _fetch_viaf function currently catches all Exception
and returns an empty data dict so caller cannot tell lookup misses from real
errors; change _fetch_viaf to only catch expected lookup/miss exceptions (e.g.,
a custom LookupError/RecordNotFound or network-specific exceptions thrown by
AgentViafRecord.get_online_record) and let other exceptions propagate (or return
a distinct error indicator/message instead of {}), so the caller can
differentiate "no VIAF found" from real failures; apply the same change to the
analogous fetch block at lines 169-175 so both use specific exception types or a
distinct error tuple to surface API/logic failures (and preserve existing
verbose logging path when --online_verbose is used).

In `@rero_mef/agents/viaf/api.py`:
- Around line 404-428: The code currently returns the empty-record sentinel ({})
on transient failures (429, TimeoutError, RequestException) which conflates
retryable outages with genuine "no record" results; change the VIAF fetch logic
around _perform_request / the 429 handling and exception handlers so that after
a 429 you wait then retry the request (call _perform_request again once or in a
short retry loop) and on repeated 429/Timeout/RequestException propagate a
distinct retryable failure instead of returning {} (e.g., raise a new
RetryableVIAFError or re-raise the original exception) so callers (including
those using delete_if_not_found) can distinguish transient errors from true
not-found results; keep use of _sleep_with_countdown and ensure the unique
symbols _perform_request, _sleep_with_countdown and the 429 branch are the
places you modify.

In `@rero_mef/agents/viaf/tasks.py`:
- Around line 45-54: The current code treats any falsy online_data from
AgentViafRecord.get_online_record(...) as Action.NOT_FOUND; instead
differentiate transient fetch failures from true missing records by inspecting
the returned msg (and/or any additional status flag from get_online_record).
Update the branch that checks if not online_data to: if msg indicates a rate
limit/upstream error (e.g., contains "rate limit", "upstream", "error", or use
an explicit status returned by AgentViafRecord.get_online_record if available),
return Action.NOT_ONLINE; otherwise treat it as a confirmed missing VIAF and
return Action.NOT_FOUND. Ensure you still emit verbose click.echo(msg) when
verbose is true and reference the variables online_data, msg,
AgentViafRecord.get_online_record, Action.NOT_FOUND, and Action.NOT_ONLINE.

In `@rero_mef/cli.py`:
- Around line 1312-1323: After reconciling entity PIDs with
create_or_update_mef(reindex=True) in the loop using get_entity_class(), call
flush_indexes() (or the project’s ES flush helper) before invoking
mef_cls.get_all_pids_without_entities_and_viaf() so Elasticsearch reflects the
recent reindexing; place the flush immediately after the loop (after
reconciled_count updates) and before orphan_pids is computed to prevent deleting
just-relinked documents.

In `@tests/ui/agents/test_agents_viaf_tasks.py`:
- Around line 61-72: The test mutates Document["@about"] which is ignored by
AgentViafRecord.get_online_record(), so change the fixture to mutate a
normalized field that is persisted (e.g., modify modified_data["sources"] or
modified_data["xLinks"] or another value that maps to AgentViafRecord's stored
fields) so the MD5 differs and the refresh path triggers UPDATE/CREATE; update
the block around modified_data in tests/ui/agents/test_agents_viaf_tasks.py (the
modified_data variable used with _refresh_viaf_record and
agent_viaf_online_response) to change one of the persisted fields (sources or
xLinks) instead of Document["@about"] so the assertion becomes reliable.

---

Duplicate comments:
In `@rero_mef/agents/cli.py`:
- Around line 351-352: The CLI normalization of batch_size wrongly converts 0 to
None globally, which makes --rolling --batch_size 0 fall back to the env
default; change the normalization to be mode-aware: when rolling mode is active
(the flag used to call process_viaf_rolling_refresh), preserve 0 as the explicit
"all" indicator (set batch_arg = 0 if batch_size == 0 else batch_size), while
for full refresh keep the existing behavior (batch_arg = None if batch_size == 0
else batch_size) so process_viaf_rolling_refresh sees 0 as "all" and
process_viaf_full_refresh still uses None to mean env-default; update the code
around batch_arg, batch_size, and the call sites
process_viaf_rolling_refresh/process_viaf_full_refresh accordingly.

In `@rero_mef/agents/viaf/api.py`:
- Around line 504-509: The current branch inside the VIAF fetch returns a normal
truthy payload when a merged-cluster redirect is detected (result.get("pid")),
which causes update_online() to treat it as a regular replacement instead of
running the redirect flow; change this branch to invoke the redirect handling
path instead of returning the normal result: call the existing
handle_redirect(...) (or the equivalent redirect helper used elsewhere) with the
original pid and the target pid (result["pid"]) and return its result/tuple so
update_online() executes the redirect-delete/reconcile logic, or alternatively
make the returned value falsy and surface the redirect via a dedicated redirect
result flag that update_online() checks; locate this logic around the result,
pid, handle_redirect, and update_online symbols and update the branch to route
merged-cluster redirects through handle_redirect.

In `@rero_mef/agents/viaf/tasks.py`:
- Around line 27-33: The update_agents parameter on _refresh_viaf_record
currently has no effect; update_agents should be forwarded to the downstream
handlers (or removed). Modify _refresh_viaf_record to pass update_agents into
handle_redirect(...) and create_or_update(...) calls (and any other internal
call that applies) so the flag controls whether linked local agents are updated;
alternatively, if the feature isn't implemented yet, remove the update_agents
parameter from the function signature and its callers (CLI/task entry points) to
avoid a no-op flag. Ensure you update all occurrences referenced in this file
(including the other two similar regions) so the behavior is consistent.
- Around line 122-123: The code currently materializes the entire VIAF scan by
calling list(query.scan()) when creating progress_bar; replace that with a lazy
iterator to avoid preloading all PIDs — e.g. pass query.scan() (or a generator
that yields from query.scan() or an itertools.islice wrapper for bounded
consumption) directly into progressbar instead of list(query.scan()), ensuring
any required max/total param is set separately so progressbar and the later loop
in tasks.py use a streaming iterator rather than buffering the whole result.
- Around line 180-190: The current loop uses AgentViafSearch().sort(...).scan()
which does not guarantee preserving the explicit _updated sort and will leave an
open scroll when you break early; change process_viaf_rolling_refresh to iterate
a bounded, ordered search instead (e.g. use Search.iterate(...) or open a PIT
and use Search.with_pit()/client.open_point_in_time + iterate) so the sort on
"_updated" is respected and the search context is closed when you stop at
batch_size; ensure the new flow uses AgentViafSearch().sort({"_updated":
{"order":"asc"}}) with iterate()/PIT and explicitly close the PIT/iterator when
breaking out of the progressbar loop.

In `@rero_mef/cli.py`:
- Around line 1109-1130: The code that counts messages (using current_celery,
current_celery.connection_or_acquire, channel.queue_declare) is swallowing
broker errors and returning 0, which conflates "empty" with "failure"; change
the error handling so exceptions from channel.queue_declare or the outer try are
not mapped to the empty-queue sentinel: log the exception via the existing
logger and re-raise it (or return a distinct failure value like -1) instead of
returning 0; ensure channel.close() still runs in the finally block and preserve
the existing success path that returns the positive count.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e95e5e0c-1a25-4e29-aa20-896ad89ff765

📥 Commits

Reviewing files that changed from the base of the PR and between 49b4ccb and 1266764.

⛔ Files ignored due to path filters (27)
  • .github/AI_SETUP.md is excluded by none and included by none
  • .github/copilot-instructions.md is excluded by none and included by none
  • .github/workflows/continuous-integration-test.yml is excluded by none and included by none
  • CLAUDE.md is excluded by none and included by none
  • data/aggnd.json is excluded by none and included by none
  • data/aggnd_metadata.csv is excluded by !**/*.csv and included by none
  • data/aggnd_pidstore.csv is excluded by !**/*.csv and included by none
  • data/agrero.json is excluded by none and included by none
  • data/agrero_metadata.csv is excluded by !**/*.csv and included by none
  • data/agrero_pidstore.csv is excluded by !**/*.csv and included by none
  • data/aidref.json is excluded by none and included by none
  • data/aidref_metadata.csv is excluded by !**/*.csv and included by none
  • data/aidref_pidstore.csv is excluded by !**/*.csv and included by none
  • data/cidref.json is excluded by none and included by none
  • data/mef.json is excluded by none and included by none
  • data/mef_id.csv is excluded by !**/*.csv and included by none
  • data/mef_metadata.csv is excluded by !**/*.csv and included by none
  • data/mef_pidstore.csv is excluded by !**/*.csv and included by none
  • data/viaf.json is excluded by none and included by none
  • data/viaf_metadata.csv is excluded by !**/*.csv and included by none
  • data/viaf_pidstore.csv is excluded by !**/*.csv and included by none
  • docker-services.yml is excluded by none and included by none
  • rero_mef/config.py is excluded by !rero_mef/config.py and included by rero_mef/**/*.py
  • scripts/test is excluded by none and included by none
  • tests/unit/concepts/examples/xml_minimal_record.xml is excluded by none and included by none
  • tests/unit/places/examples/xml_minimal_record.xml is excluded by none and included by none
  • uv.lock is excluded by !**/*.lock and included by none
📒 Files selected for processing (34)
  • rero_mef/agents/api.py
  • rero_mef/agents/cli.py
  • rero_mef/agents/mef/api.py
  • rero_mef/agents/viaf/api.py
  • rero_mef/agents/viaf/tasks.py
  • rero_mef/api.py
  • rero_mef/api_mef.py
  • rero_mef/cli.py
  • rero_mef/cli_logging.py
  • rero_mef/marctojson/do_gnd_agent.py
  • rero_mef/marctojson/do_gnd_concepts.py
  • rero_mef/marctojson/do_gnd_places.py
  • rero_mef/marctojson/do_idref_agent.py
  • rero_mef/marctojson/do_idref_concepts.py
  • rero_mef/marctojson/do_idref_places.py
  • rero_mef/marctojson/do_rero_agent.py
  • rero_mef/marctojson/do_rero_concepts.py
  • rero_mef/marctojson/do_skeleton.py
  • rero_mef/monitoring/api.py
  • rero_mef/monitoring/cli.py
  • rero_mef/monitoring/views.py
  • rero_mef/places/idref/api.py
  • rero_mef/places/utils.py
  • rero_mef/utils.py
  • tests/ui/agents/test_agents_cli.py
  • tests/ui/agents/test_agents_viaf_api.py
  • tests/ui/agents/test_agents_viaf_tasks.py
  • tests/ui/places/test_places_api.py
  • tests/ui/places/test_places_gnd_tasks.py
  • tests/ui/test_api.py
  • tests/ui/test_cli.py
  • tests/ui/test_views.py
  • tests/unit/concepts/test_concepts_gnd_transformation.py
  • tests/unit/places/test_places_gnd_transformation.py
💤 Files with no reviewable changes (1)
  • rero_mef/api_mef.py
✅ Files skipped from review due to trivial changes (13)
  • tests/ui/test_views.py
  • rero_mef/marctojson/do_idref_places.py
  • rero_mef/marctojson/do_rero_concepts.py
  • rero_mef/places/utils.py
  • rero_mef/marctojson/do_idref_agent.py
  • rero_mef/api.py
  • tests/unit/places/test_places_gnd_transformation.py
  • rero_mef/marctojson/do_rero_agent.py
  • rero_mef/marctojson/do_idref_concepts.py
  • rero_mef/places/idref/api.py
  • tests/ui/places/test_places_api.py
  • rero_mef/marctojson/do_gnd_agent.py
  • rero_mef/agents/api.py
🚧 Files skipped from review as they are similar to previous changes (6)
  • rero_mef/marctojson/do_skeleton.py
  • tests/ui/test_api.py
  • rero_mef/marctojson/do_gnd_places.py
  • rero_mef/marctojson/do_gnd_concepts.py
  • tests/ui/agents/test_agents_cli.py
  • rero_mef/monitoring/cli.py

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
rero_mef/agents/viaf/api.py (2)

637-658: ⚠️ Potential issue | 🔴 Critical

Validate the redirect target before deleting the source record.

Line 656 is only protected against exceptions. If create_or_update() returns no record on a handled failure, the old VIAF record is still deleted and Line 658 reports Action.REDIRECT.

🛠️ Suggested fix
         try:
             new_record, action = self.create_or_update(
                 data=target_data,
                 dbcommit=dbcommit,
                 reindex=reindex,
                 test_md5=True,
             )
         except Exception as e:
             current_app.logger.exception(
                 f"Failed to create/update target VIAF {redirect_to_pid}: {e}"
             )
             if delete_if_not_found:
                 current_app.logger.info(
                     f"Deleting old VIAF record {old_pid} due to target creation error"
                 )
                 self.delete(force=True, dbcommit=dbcommit, delindex=reindex)
             return None, Action.ERROR, redirect_info
+
+        if not new_record:
+            current_app.logger.error(
+                f"Failed to create/update target VIAF {redirect_to_pid}: {action}"
+            )
+            return None, Action.ERROR, redirect_info
 
         # Delete old VIAF record (handles MEF cleanup) only after successful create/update
         self.delete(force=True, dbcommit=dbcommit, delindex=reindex)

As per coding guidelines, focus on “Security vulnerabilities and correctness issues” and “Logic errors and architectural improvements.”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rero_mef/agents/viaf/api.py` around lines 637 - 658, After calling
create_or_update (in the block that sets new_record, action), validate that
new_record is not None and that action indicates a successful creation/update
(not Action.ERROR) before calling self.delete or returning Action.REDIRECT; if
create_or_update returned no record or an error, respect delete_if_not_found
(log and delete only when configured) and return Action.ERROR or appropriate
redirect_info instead of deleting the source unconditionally. Update the logic
around new_record/action checks in the same function to only perform
self.delete(force=True, dbcommit=dbcommit, delindex=reindex) and return
Action.REDIRECT when the target creation actually succeeded.

460-545: ⚠️ Potential issue | 🟠 Major

Keep non-404 responses out of the “not found” path.

Line 545 currently collapses any non-200 response that reaches this block into the same {} / NO RECORD sentinel used for a genuine miss. update_online() and handle_redirect() both branch on falsy payloads, so a backend failure can be treated as a real disappearance, and with delete_if_not_found=True the redirect path can even remove a valid VIAF record. Reserve the empty sentinel for 404 / empty-success cases only, and raise RetryableVIAFError for the rest.

As per coding guidelines, focus on “Security vulnerabilities and correctness issues” and “Logic errors and architectural improvements.”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rero_mef/agents/viaf/api.py` around lines 460 - 545, The code currently
treats any non-200 response as a NO RECORD (empty dict) which can cause backend
failures to be handled like genuine misses; modify the response handling in the
VIAF fetch routine (the block using response, pid, url and setting msg) so that:
if response.status_code == requests.codes.ok continue parsing as now; if
response.status_code == requests.codes.not_found (404) return {} and the NO
RECORD msg as before; otherwise raise RetryableVIAFError with a clear message
(include pid/url/status) so callers like update_online() and handle_redirect()
will retry instead of deleting/redirecting a valid record. Ensure the raised
exception uses the existing RetryableVIAFError symbol and do not change
downstream logic that expects an empty dict only for true 404/empty-success
cases.
🧹 Nitpick comments (3)
tests/unit/places/test_places_gnd_transformation.py (1)

133-149: Add a non-DDC case to prevent false DDC classification coverage.

Line 136 includes $2=DDC, but the test only validates the happy path. Given rero_mef/marctojson/do_gnd_places.py currently builds bf:ClassificationDdc from 686$a without checking 686$2, this test won’t catch non-DDC misclassification regressions.

Suggested test extension
+def test_gnd_classification_ignores_non_ddc():
+    """Test trans_gnd_classification ignores non-DDC 686."""
+    xml_part_to_add = """
+        <datafield tag="686" ind1=" " ind2=" ">
+            <subfield code="a">360</subfield>
+            <subfield code="c">Geografie. Anthrogeografie</subfield>
+            <subfield code="2">LCC</subfield>
+        </datafield>
+    """
+    trans = trans_prep(Transformation, "places", xml_part_to_add)
+    trans.trans_gnd_classification()
+    assert "classification" not in trans.json
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/places/test_places_gnd_transformation.py` around lines 133 - 149,
Update the unit test to include a non-DDC case so
trans.trans_gnd_classification() does not falsely create a bf:ClassificationDdc
when $2 is not "DDC": in test_places_gnd_transformation.py, add another xml_part
(or modify the existing one) where the datafield tag="686" contains subfield
code="a" but subfield code="2" is a value other than "DDC" and assert that
Transformation (created via trans_prep) either does not add a classification
entry or adds the appropriate non-DDC classification; also run the test to
ensure the logic in rero_mef/marctojson/do_gnd_places.py (the code that builds
bf:ClassificationDdc from 686$a) will be caught if it ignores 686$2.
tests/ui/test_cli.py (2)

212-234: Tighten queue inspection assertions to verify passive declarations are used.

The return-value assertion is good, but adding assertions on queue_declare call semantics would better lock in intended broker interaction.

Proposed assertion strengthening
     with mock.patch("rero_mef.cli.current_celery", fake_celery):
         assert rabbitmq_queue_count() == 5
+    assert channel.queue_declare.call_count == len(queues)
+    assert all(
+        c.kwargs.get("passive") is True for c in channel.queue_declare.call_args_list
+    )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/ui/test_cli.py` around lines 212 - 234, The test_rabbitmq_queue_count
currently only asserts the returned total; enhance it to assert that
channel.queue_declare was called for each queue using passive declarations so
the broker is only inspected and not modified: after setting up fake_celery and
calling rabbitmq_queue_count(), assert channel.queue_declare.call_count ==
len(queues) and assert that channel.queue_declare was called with the expected
queue names (from fake_celery.amqp.queues keys) and with passive=True (or the
passive argument used by rabbitmq_queue_count), and verify the call order if
needed; locate the assertions around test_rabbitmq_queue_count,
channel.queue_declare, fake_celery.connection_or_acquire and
rabbitmq_queue_count to implement these checks.

236-246: Prefer exact sleep-sequence assertion over count-only check.

call_count == 3 can pass despite wrong delays/order; asserting exact calls makes this retry/wait contract much more robust.

Proposed assertion strengthening
-        assert mocked_sleep.call_count == 3
+        assert mocked_sleep.call_args_list == [mock.call(5), mock.call(0), mock.call(5)]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/ui/test_cli.py` around lines 236 - 246, Replace the loose sleep-count
assertion in test_wait_empty_tasks_checks_rabbitmq with an exact sequence
assertion so the retry/wait order and arguments are validated: instead of
asserting mocked_sleep.call_count == 3, assert that mocked_sleep was called
three times with the expected delay argument in the correct order (matching the
side effects on queue_count and rabbitmq_queue_count); use
mocked_sleep.call_args_list and compare to the expected list of call entries
(e.g., three calls with delay=0) to lock down behavior of wait_empty_tasks and
its interactions with rabbitmq_queue_count and queue_count.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@rero_mef/monitoring/cli.py`:
- Around line 168-170: In both db_connection_counts and db_connections replace
the broad except blocks that call click.secho(...) and return with raising a
ClickException so the process exits non‑zero; specifically, catch Exception as
error and replace the click.secho + return sequence with raise
click.ClickException(str(error)) so database errors print to stderr and set a
non‑zero exit code.

---

Duplicate comments:
In `@rero_mef/agents/viaf/api.py`:
- Around line 637-658: After calling create_or_update (in the block that sets
new_record, action), validate that new_record is not None and that action
indicates a successful creation/update (not Action.ERROR) before calling
self.delete or returning Action.REDIRECT; if create_or_update returned no record
or an error, respect delete_if_not_found (log and delete only when configured)
and return Action.ERROR or appropriate redirect_info instead of deleting the
source unconditionally. Update the logic around new_record/action checks in the
same function to only perform self.delete(force=True, dbcommit=dbcommit,
delindex=reindex) and return Action.REDIRECT when the target creation actually
succeeded.
- Around line 460-545: The code currently treats any non-200 response as a NO
RECORD (empty dict) which can cause backend failures to be handled like genuine
misses; modify the response handling in the VIAF fetch routine (the block using
response, pid, url and setting msg) so that: if response.status_code ==
requests.codes.ok continue parsing as now; if response.status_code ==
requests.codes.not_found (404) return {} and the NO RECORD msg as before;
otherwise raise RetryableVIAFError with a clear message (include pid/url/status)
so callers like update_online() and handle_redirect() will retry instead of
deleting/redirecting a valid record. Ensure the raised exception uses the
existing RetryableVIAFError symbol and do not change downstream logic that
expects an empty dict only for true 404/empty-success cases.

---

Nitpick comments:
In `@tests/ui/test_cli.py`:
- Around line 212-234: The test_rabbitmq_queue_count currently only asserts the
returned total; enhance it to assert that channel.queue_declare was called for
each queue using passive declarations so the broker is only inspected and not
modified: after setting up fake_celery and calling rabbitmq_queue_count(),
assert channel.queue_declare.call_count == len(queues) and assert that
channel.queue_declare was called with the expected queue names (from
fake_celery.amqp.queues keys) and with passive=True (or the passive argument
used by rabbitmq_queue_count), and verify the call order if needed; locate the
assertions around test_rabbitmq_queue_count, channel.queue_declare,
fake_celery.connection_or_acquire and rabbitmq_queue_count to implement these
checks.
- Around line 236-246: Replace the loose sleep-count assertion in
test_wait_empty_tasks_checks_rabbitmq with an exact sequence assertion so the
retry/wait order and arguments are validated: instead of asserting
mocked_sleep.call_count == 3, assert that mocked_sleep was called three times
with the expected delay argument in the correct order (matching the side effects
on queue_count and rabbitmq_queue_count); use mocked_sleep.call_args_list and
compare to the expected list of call entries (e.g., three calls with delay=0) to
lock down behavior of wait_empty_tasks and its interactions with
rabbitmq_queue_count and queue_count.

In `@tests/unit/places/test_places_gnd_transformation.py`:
- Around line 133-149: Update the unit test to include a non-DDC case so
trans.trans_gnd_classification() does not falsely create a bf:ClassificationDdc
when $2 is not "DDC": in test_places_gnd_transformation.py, add another xml_part
(or modify the existing one) where the datafield tag="686" contains subfield
code="a" but subfield code="2" is a value other than "DDC" and assert that
Transformation (created via trans_prep) either does not add a classification
entry or adds the appropriate non-DDC classification; also run the test to
ensure the logic in rero_mef/marctojson/do_gnd_places.py (the code that builds
bf:ClassificationDdc from 686$a) will be caught if it ignores 686$2.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 06e302c3-ec57-45c7-88ba-2fdd1862e4c1

📥 Commits

Reviewing files that changed from the base of the PR and between 1266764 and 386c6eb.

⛔ Files ignored due to path filters (25)
  • .github/AI_SETUP.md is excluded by none and included by none
  • .github/copilot-instructions.md is excluded by none and included by none
  • .github/workflows/continuous-integration-test.yml is excluded by none and included by none
  • CLAUDE.md is excluded by none and included by none
  • data/aggnd.json is excluded by none and included by none
  • data/aggnd_metadata.csv is excluded by !**/*.csv and included by none
  • data/aggnd_pidstore.csv is excluded by !**/*.csv and included by none
  • data/agrero.json is excluded by none and included by none
  • data/agrero_metadata.csv is excluded by !**/*.csv and included by none
  • data/agrero_pidstore.csv is excluded by !**/*.csv and included by none
  • data/aidref.json is excluded by none and included by none
  • data/aidref_metadata.csv is excluded by !**/*.csv and included by none
  • data/aidref_pidstore.csv is excluded by !**/*.csv and included by none
  • data/cidref.json is excluded by none and included by none
  • data/mef.json is excluded by none and included by none
  • data/mef_id.csv is excluded by !**/*.csv and included by none
  • data/mef_metadata.csv is excluded by !**/*.csv and included by none
  • data/mef_pidstore.csv is excluded by !**/*.csv and included by none
  • data/viaf.json is excluded by none and included by none
  • data/viaf_metadata.csv is excluded by !**/*.csv and included by none
  • data/viaf_pidstore.csv is excluded by !**/*.csv and included by none
  • docker-services.yml is excluded by none and included by none
  • rero_mef/config.py is excluded by !rero_mef/config.py and included by rero_mef/**/*.py
  • scripts/test is excluded by none and included by none
  • uv.lock is excluded by !**/*.lock and included by none
📒 Files selected for processing (35)
  • rero_mef/agents/api.py
  • rero_mef/agents/cli.py
  • rero_mef/agents/mef/api.py
  • rero_mef/agents/viaf/api.py
  • rero_mef/agents/viaf/tasks.py
  • rero_mef/api.py
  • rero_mef/api_mef.py
  • rero_mef/cli.py
  • rero_mef/cli_logging.py
  • rero_mef/marctojson/do_gnd_agent.py
  • rero_mef/marctojson/do_gnd_concepts.py
  • rero_mef/marctojson/do_gnd_places.py
  • rero_mef/marctojson/do_idref_agent.py
  • rero_mef/marctojson/do_idref_concepts.py
  • rero_mef/marctojson/do_idref_places.py
  • rero_mef/marctojson/do_rero_agent.py
  • rero_mef/marctojson/do_rero_concepts.py
  • rero_mef/marctojson/do_skeleton.py
  • rero_mef/monitoring/api.py
  • rero_mef/monitoring/cli.py
  • rero_mef/monitoring/views.py
  • rero_mef/places/idref/api.py
  • rero_mef/places/utils.py
  • rero_mef/utils.py
  • tests/ui/agents/test_agents_api.py
  • tests/ui/agents/test_agents_cli.py
  • tests/ui/agents/test_agents_viaf_api.py
  • tests/ui/agents/test_agents_viaf_tasks.py
  • tests/ui/places/test_places_api.py
  • tests/ui/places/test_places_gnd_tasks.py
  • tests/ui/test_api.py
  • tests/ui/test_cli.py
  • tests/ui/test_views.py
  • tests/unit/concepts/test_concepts_gnd_transformation.py
  • tests/unit/places/test_places_gnd_transformation.py
💤 Files with no reviewable changes (1)
  • rero_mef/api_mef.py
✅ Files skipped from review due to trivial changes (13)
  • tests/ui/test_views.py
  • tests/ui/places/test_places_api.py
  • rero_mef/marctojson/do_idref_places.py
  • rero_mef/places/utils.py
  • rero_mef/marctojson/do_idref_concepts.py
  • rero_mef/marctojson/do_idref_agent.py
  • rero_mef/marctojson/do_skeleton.py
  • rero_mef/marctojson/do_rero_concepts.py
  • rero_mef/marctojson/do_rero_agent.py
  • rero_mef/marctojson/do_gnd_agent.py
  • rero_mef/api.py
  • tests/ui/places/test_places_gnd_tasks.py
  • tests/ui/agents/test_agents_cli.py
🚧 Files skipped from review as they are similar to previous changes (12)
  • tests/ui/test_api.py
  • rero_mef/cli_logging.py
  • tests/unit/concepts/test_concepts_gnd_transformation.py
  • rero_mef/agents/mef/api.py
  • rero_mef/marctojson/do_gnd_places.py
  • rero_mef/places/idref/api.py
  • rero_mef/marctojson/do_gnd_concepts.py
  • rero_mef/monitoring/views.py
  • tests/ui/agents/test_agents_viaf_tasks.py
  • rero_mef/cli.py
  • rero_mef/agents/viaf/tasks.py
  • rero_mef/agents/cli.py

Pull-based VIAF update using the JSON API since
the OAI-PMH endpoint is non-functional. Adds CLI
command harvest-viaf with rolling, missing and full
refresh modes. Detects and handles VIAF cluster
merges (redirects) with automatic MEF cleanup.

Co-Authored-by: Peter Weber <peter.weber@rero.ch>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (6)
rero_mef/marctojson/do_gnd_concepts.py (1)

164-164: Correct the logged function name in trans_gnd_relation_pid.

Line 164 logs "trans_gnd_relation", which is misleading when tracing execution of trans_gnd_relation_pid.

Suggested patch
-            self.logger.info("Call Function: %s", "trans_gnd_relation")
+            self.logger.info("Call Function: %s", "trans_gnd_relation_pid")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rero_mef/marctojson/do_gnd_concepts.py` at line 164, The logger call inside
trans_gnd_relation_pid currently logs "trans_gnd_relation" which is misleading;
update the logging statement in trans_gnd_relation_pid to call
self.logger.info("Call Function: %s", "trans_gnd_relation_pid") so the function
name matches the actual function (look for the existing self.logger.info line
within trans_gnd_relation_pid and replace the string literal).
rero_mef/agents/viaf/api.py (1)

637-643: Unused action variable (Ruff RUF059).

The action returned by create_or_update is intentionally discarded (since handle_redirect returns Action.REDIRECT on success). Prefix with underscore to indicate intentional discard.

🔧 Suggested fix
-        new_record, action = self.create_or_update(
+        new_record, _action = self.create_or_update(
             data=target_data,
             dbcommit=dbcommit,
             reindex=reindex,
             test_md5=True,
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rero_mef/agents/viaf/api.py` around lines 637 - 643, The local variable
"action" from the create_or_update call in handle_redirect is unused and
triggers Ruff RUF059; rename it to "_action" (or prefix with underscore) to mark
it intentionally discarded where create_or_update(...) is called so the returned
tuple is still unpacked but the linter knows the value is ignored (reference
create_or_update and handle_redirect/Action.REDIRECT).
rero_mef/agents/viaf/tasks.py (1)

38-48: The update_agents parameter is declared but never used.

The parameter is documented and passed through to _refresh_viaf_record, but it's never actually used within the function. Since AgentViafRecord.create_or_update already calls create_mef_and_agents internally, this parameter appears to be leftover from previous iterations.

Consider removing the update_agents parameter from _refresh_viaf_record and the Celery tasks if it's not needed, or document why it's reserved for future use.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rero_mef/agents/viaf/tasks.py` around lines 38 - 48, The update_agents
parameter on _refresh_viaf_record (and the Celery task wrappers) is declared and
passed but unused—remove it to avoid dead API surface: drop update_agents from
the _refresh_viaf_record function signature and from any Celery task
signatures/calls that forward it, update the docstring to remove the parameter,
and remove any corresponding arguments in callers; if you prefer to keep it for
backward compatibility, instead mark it as deprecated/unused in the docstring
and add a noop comment that references AgentViafRecord.create_or_update and its
create_mef_and_agents behavior so future readers know why it’s unused. Ensure
tests and any call sites are updated accordingly.
rero_mef/cli.py (1)

1354-1369: Verify flush placement relative to orphan scan.

The flush_indexes() call at line 1355 occurs inside the for pid_type, entity_map loop, so it flushes after each entity type's reconciliation. Line 1356 (orphan_pids = ...) is outside this inner loop but still inside the for group_name, mef_cls, group_types in plans: loop. This means orphans are scanned per-group after that group's reconciliation, which appears correct.

However, ensure that mef_cls.flush_indexes() at line 1355 is only called when there were actual reconciliations (when entity_map was non-empty). Currently it's guarded by if not dry_run: at line 1354, which would still run even if no reconciliations happened for that pid_type.

🔧 Suggested fix to only flush when needed
-                if not dry_run:
-                    mef_cls.flush_indexes()
+            if not dry_run and reconciled_count > 0:
+                mef_cls.flush_indexes()

Move the flush outside the for pid_type loop and gate it on whether any reconciliations occurred.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rero_mef/cli.py` around lines 1354 - 1369, The flush of search/indexes
(mef_cls.flush_indexes()) is being called unconditionally for each pid_type when
not dry_run, even if no reconciliations happened; change the logic to track
whether any reconciliations occurred for the current group (e.g. set a boolean
like reconciled=True when any entity_map is non-empty or any record was
processed inside the for pid_type, entity_map loop) and move the call to
mef_cls.flush_indexes() so it runs once per group (after the pid_type loop) only
if reconciled is True and not dry_run; this ensures flush_indexes() is only
invoked when there were actual changes before you call
get_all_pids_without_entities_and_viaf() and scan for orphans.
tests/ui/agents/test_agents_viaf_api.py (1)

519-524: Unused redirect_info variable (Ruff RUF059).

The unpacked variable redirect_info is never used in this test. Prefix it with an underscore to indicate intentional discard.

🔧 Suggested fix
     # Call handle_redirect - should return ERROR for chained redirect
-    new_record, action, redirect_info = agent_viaf_record.handle_redirect(
+    new_record, action, _redirect_info = agent_viaf_record.handle_redirect(
         redirect_to_pid=new_pid,
         dbcommit=True,
         reindex=True,
     )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/ui/agents/test_agents_viaf_api.py` around lines 519 - 524, The test
unpacks three values from agent_viaf_record.handle_redirect but never uses the
third (redirect_info), triggering Ruff RUF059; change the unpacked name from
redirect_info to _redirect_info (or simply _) in the assignment where
handle_redirect is called to mark it as intentionally unused (the call to
agent_viaf_record.handle_redirect with redirect_to_pid=new_pid, dbcommit=True,
reindex=True).
rero_mef/monitoring/cli.py (1)

30-33: Consider adding @with_appcontext to the group for complete logging normalization.

monitoring() is a plain group callback without @with_appcontext, so it executes without app context. While ensure_single_stream_handler() is designed to handle this gracefully (it always normalizes the named logger and catches RuntimeError when accessing current_app), the app logger normalization and propagate=False setting in lines 34-37 of rero_mef/cli_logging.py will be skipped. Adding @with_appcontext to the group would ensure complete normalization at the group level.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rero_mef/monitoring/cli.py` around lines 30 - 33, The monitoring click group
is defined without Flask's application context, so calling
ensure_single_stream_handler() inside monitoring() may skip the app logger
normalization (including propagate=False) implemented in cli_logging.py; add the
`@with_appcontext` decorator to the monitoring() group so it runs with an app
context and ensure_single_stream_handler() can perform full normalization of the
app logger (refer to the monitoring() function and
ensure_single_stream_handler() call).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@rero_mef/agents/cli.py`:
- Around line 317-336: The CLI defines the short flag "-p" twice (for the
progress option and the viaf_pid option), so change one of the short flags to a
unique letter to avoid collision; for example, update the click.option for
"viaf_pid" (symbol viaf_pid) to use a different short flag such as "-i" or
remove the short flag entirely, or alternatively change the progress option's
short flag, ensuring the click.option declarations for "progress" and "viaf_pid"
no longer share "-p".

---

Nitpick comments:
In `@rero_mef/agents/viaf/api.py`:
- Around line 637-643: The local variable "action" from the create_or_update
call in handle_redirect is unused and triggers Ruff RUF059; rename it to
"_action" (or prefix with underscore) to mark it intentionally discarded where
create_or_update(...) is called so the returned tuple is still unpacked but the
linter knows the value is ignored (reference create_or_update and
handle_redirect/Action.REDIRECT).

In `@rero_mef/agents/viaf/tasks.py`:
- Around line 38-48: The update_agents parameter on _refresh_viaf_record (and
the Celery task wrappers) is declared and passed but unused—remove it to avoid
dead API surface: drop update_agents from the _refresh_viaf_record function
signature and from any Celery task signatures/calls that forward it, update the
docstring to remove the parameter, and remove any corresponding arguments in
callers; if you prefer to keep it for backward compatibility, instead mark it as
deprecated/unused in the docstring and add a noop comment that references
AgentViafRecord.create_or_update and its create_mef_and_agents behavior so
future readers know why it’s unused. Ensure tests and any call sites are updated
accordingly.

In `@rero_mef/cli.py`:
- Around line 1354-1369: The flush of search/indexes (mef_cls.flush_indexes())
is being called unconditionally for each pid_type when not dry_run, even if no
reconciliations happened; change the logic to track whether any reconciliations
occurred for the current group (e.g. set a boolean like reconciled=True when any
entity_map is non-empty or any record was processed inside the for pid_type,
entity_map loop) and move the call to mef_cls.flush_indexes() so it runs once
per group (after the pid_type loop) only if reconciled is True and not dry_run;
this ensures flush_indexes() is only invoked when there were actual changes
before you call get_all_pids_without_entities_and_viaf() and scan for orphans.

In `@rero_mef/marctojson/do_gnd_concepts.py`:
- Line 164: The logger call inside trans_gnd_relation_pid currently logs
"trans_gnd_relation" which is misleading; update the logging statement in
trans_gnd_relation_pid to call self.logger.info("Call Function: %s",
"trans_gnd_relation_pid") so the function name matches the actual function (look
for the existing self.logger.info line within trans_gnd_relation_pid and replace
the string literal).

In `@rero_mef/monitoring/cli.py`:
- Around line 30-33: The monitoring click group is defined without Flask's
application context, so calling ensure_single_stream_handler() inside
monitoring() may skip the app logger normalization (including propagate=False)
implemented in cli_logging.py; add the `@with_appcontext` decorator to the
monitoring() group so it runs with an app context and
ensure_single_stream_handler() can perform full normalization of the app logger
(refer to the monitoring() function and ensure_single_stream_handler() call).

In `@tests/ui/agents/test_agents_viaf_api.py`:
- Around line 519-524: The test unpacks three values from
agent_viaf_record.handle_redirect but never uses the third (redirect_info),
triggering Ruff RUF059; change the unpacked name from redirect_info to
_redirect_info (or simply _) in the assignment where handle_redirect is called
to mark it as intentionally unused (the call to
agent_viaf_record.handle_redirect with redirect_to_pid=new_pid, dbcommit=True,
reindex=True).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 72a17809-c3c8-41c6-97bb-755ebc5d4d77

📥 Commits

Reviewing files that changed from the base of the PR and between 386c6eb and afd7155.

⛔ Files ignored due to path filters (25)
  • .github/AI_SETUP.md is excluded by none and included by none
  • .github/copilot-instructions.md is excluded by none and included by none
  • .github/workflows/continuous-integration-test.yml is excluded by none and included by none
  • CLAUDE.md is excluded by none and included by none
  • data/aggnd.json is excluded by none and included by none
  • data/aggnd_metadata.csv is excluded by !**/*.csv and included by none
  • data/aggnd_pidstore.csv is excluded by !**/*.csv and included by none
  • data/agrero.json is excluded by none and included by none
  • data/agrero_metadata.csv is excluded by !**/*.csv and included by none
  • data/agrero_pidstore.csv is excluded by !**/*.csv and included by none
  • data/aidref.json is excluded by none and included by none
  • data/aidref_metadata.csv is excluded by !**/*.csv and included by none
  • data/aidref_pidstore.csv is excluded by !**/*.csv and included by none
  • data/cidref.json is excluded by none and included by none
  • data/mef.json is excluded by none and included by none
  • data/mef_id.csv is excluded by !**/*.csv and included by none
  • data/mef_metadata.csv is excluded by !**/*.csv and included by none
  • data/mef_pidstore.csv is excluded by !**/*.csv and included by none
  • data/viaf.json is excluded by none and included by none
  • data/viaf_metadata.csv is excluded by !**/*.csv and included by none
  • data/viaf_pidstore.csv is excluded by !**/*.csv and included by none
  • docker-services.yml is excluded by none and included by none
  • rero_mef/config.py is excluded by !rero_mef/config.py and included by rero_mef/**/*.py
  • scripts/test is excluded by none and included by none
  • uv.lock is excluded by !**/*.lock and included by none
📒 Files selected for processing (36)
  • rero_mef/agents/api.py
  • rero_mef/agents/cli.py
  • rero_mef/agents/mef/api.py
  • rero_mef/agents/viaf/api.py
  • rero_mef/agents/viaf/tasks.py
  • rero_mef/api.py
  • rero_mef/api_mef.py
  • rero_mef/cli.py
  • rero_mef/cli_logging.py
  • rero_mef/marctojson/do_gnd_agent.py
  • rero_mef/marctojson/do_gnd_concepts.py
  • rero_mef/marctojson/do_gnd_places.py
  • rero_mef/marctojson/do_idref_agent.py
  • rero_mef/marctojson/do_idref_concepts.py
  • rero_mef/marctojson/do_idref_places.py
  • rero_mef/marctojson/do_rero_agent.py
  • rero_mef/marctojson/do_rero_concepts.py
  • rero_mef/marctojson/do_skeleton.py
  • rero_mef/monitoring/api.py
  • rero_mef/monitoring/cli.py
  • rero_mef/monitoring/views.py
  • rero_mef/places/idref/api.py
  • rero_mef/places/utils.py
  • rero_mef/utils.py
  • tests/ui/agents/test_agents_api.py
  • tests/ui/agents/test_agents_cli.py
  • tests/ui/agents/test_agents_viaf_api.py
  • tests/ui/agents/test_agents_viaf_tasks.py
  • tests/ui/places/test_places_api.py
  • tests/ui/places/test_places_gnd_tasks.py
  • tests/ui/test_api.py
  • tests/ui/test_cli.py
  • tests/ui/test_monitoring.py
  • tests/ui/test_views.py
  • tests/unit/concepts/test_concepts_gnd_transformation.py
  • tests/unit/places/test_places_gnd_transformation.py
💤 Files with no reviewable changes (1)
  • rero_mef/api_mef.py
✅ Files skipped from review due to trivial changes (10)
  • rero_mef/api.py
  • tests/ui/test_views.py
  • rero_mef/marctojson/do_skeleton.py
  • rero_mef/places/utils.py
  • rero_mef/marctojson/do_idref_places.py
  • rero_mef/marctojson/do_idref_concepts.py
  • rero_mef/marctojson/do_rero_concepts.py
  • rero_mef/marctojson/do_gnd_agent.py
  • tests/ui/places/test_places_gnd_tasks.py
  • rero_mef/marctojson/do_rero_agent.py
🚧 Files skipped from review as they are similar to previous changes (14)
  • tests/ui/test_api.py
  • rero_mef/marctojson/do_idref_agent.py
  • tests/unit/concepts/test_concepts_gnd_transformation.py
  • tests/unit/places/test_places_gnd_transformation.py
  • rero_mef/cli_logging.py
  • rero_mef/agents/mef/api.py
  • rero_mef/marctojson/do_gnd_places.py
  • rero_mef/monitoring/views.py
  • tests/ui/agents/test_agents_api.py
  • rero_mef/places/idref/api.py
  • rero_mef/agents/api.py
  • tests/ui/agents/test_agents_cli.py
  • tests/ui/agents/test_agents_viaf_tasks.py
  • tests/ui/test_cli.py

Comment on lines +317 to +336
@click.option("-p", "--progress", "progress", is_flag=True, default=False)
@click.option(
"-r",
"--rolling",
"rolling",
is_flag=True,
default=False,
help="Rolling refresh (oldest-first batch for ~6-month cycle).",
)
@click.option(
"-d",
"--delete-if-not-found",
"delete_if_not_found",
is_flag=True,
default=False,
help="Delete old VIAF record if redirect target not found or fails.",
)
@click.option(
"-p", "--pid", "viaf_pid", default=None, help="Harvest a single VIAF PID."
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Short flag -p is defined twice: --progress and --pid.

Both options use -p as their short flag (line 317 for --progress and line 335 for --pid). Click will only recognize the last definition, making --progress inaccessible via its short form.

🔧 Suggested fix
 `@click.option`("-v", "--verbose", "verbose", is_flag=True, default=False)
-@click.option("-p", "--progress", "progress", is_flag=True, default=False)
+@click.option("-P", "--progress", "progress", is_flag=True, default=False)
 `@click.option`(
     "-r",
     "--rolling",
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rero_mef/agents/cli.py` around lines 317 - 336, The CLI defines the short
flag "-p" twice (for the progress option and the viaf_pid option), so change one
of the short flags to a unique letter to avoid collision; for example, update
the click.option for "viaf_pid" (symbol viaf_pid) to use a different short flag
such as "-i" or remove the short flag entirely, or alternatively change the
progress option's short flag, ensuring the click.option declarations for
"progress" and "viaf_pid" no longer share "-p".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants