feat(VIAF): add cluster refresh pipeline#203
Conversation
e9b031c to
3d53455
Compare
c89accf to
af4511b
Compare
There was a problem hiding this comment.
Actionable comments posted: 9
🧹 Nitpick comments (12)
tests/ui/places/test_places_api.py (1)
261-261:appfixture is not needed intest_places_utils_make_identifier.
make_identifieroperates only on its argument and never touchescurrent_app, so theappfixture serves no purpose here and Ruff's ARG001 warning is valid.♻️ Proposed fix
-def test_places_utils_make_identifier(app): +def test_places_utils_make_identifier():🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/ui/places/test_places_api.py` at line 261, The test function test_places_utils_make_identifier includes an unused app fixture; remove the unused parameter from the test signature so the test only accepts the input it needs (i.e., drop the app argument) because make_identifier is pure and does not use current_app; update any calls or references to test_places_utils_make_identifier accordingly and run tests to confirm Ruff ARG001 warning is resolved..github/copilot-instructions.md (2)
97-102:mock_responsehelper is undefined in the example — add an import or show its construction.The snippet references
mock_response(json_data=...)but neither imports it nor shows how it's constructed. AI assistants following this example literally will produce broken test code. Show the helper's origin (e.g., aconftest.pyfixture) or replace it with a standardMagicMock/unittest.mock.Mockinline setup.✏️ Suggested clarification
`@mock.patch`("requests.Session.get") def test_api_call_success(mock_get, app): """Test API call with successful response.""" - mock_get.return_value = mock_response(json_data={"result": "ok"}) - # Test implementation + mock_response = mock.MagicMock() + mock_response.json.return_value = {"result": "ok"} + mock_response.status_code = 200 + mock_get.return_value = mock_response + # Test implementation🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/copilot-instructions.md around lines 97 - 102, The example uses an undefined helper mock_response in the test_api_call_success function (and the mock.patch on requests.Session.get); fix it by either importing/defining mock_response (e.g., add a conftest.py fixture or helper function named mock_response that returns a Mock with .json() and .status_code) or replace the call inline with unittest.mock.MagicMock/Mock configured with the expected .json return value and status_code; update the test_api_call_success reference to use that defined helper or inline mock so the example is runnable.
170-180:resultis referenced but never assigned in the task pattern snippet.The placeholder
return resultwill cause aNameErrorif copy-pasted verbatim. Replace with a comment clarifying the return shape.✏️ Suggested clarification
- # Task implementation - return result + # Task implementation + return {"count": 0} # Return a summary dict or similar🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/copilot-instructions.md around lines 170 - 180, The snippet defines my_task(param, dbcommit=True, reindex=True, verbose=False) but returns an unassigned variable result; replace the literal return with a descriptive comment describing the expected return shape (e.g., dict with keys 'status' and 'data', or specific type) so copy-pasting won’t raise NameError—update the docstring or add an inline comment near the end of my_task to specify the return value contract instead of return result.CLAUDE.md (1)
28-28: Minor sync drift with.github/copilot-instructions.md— the equivalent line there uses bold (**Never use…**) while this line doesn't. PerAI_SETUP.md, both files should stay synchronized.✏️ Suggested fix
-Never use `pip`, `python -m pytest`, or bare `pytest` — always go through `uv`. +**Never use `pip`, `python -m pytest`, or bare `pytest`** — always go through `uv`.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@CLAUDE.md` at line 28, Update CLAUDE.md so the sentence "Never use `pip`, `python -m pytest`, or bare `pytest` — always go through `uv`." matches the styling in .github/copilot-instructions.md by making it bold (wrap the entire sentence in ** **); locate the exact text in CLAUDE.md and replace it with the bolded version to keep both files synchronized per AI_SETUP.md.rero_mef/agents/viaf/api.py (1)
391-433:handle_redirectlooks solid; minor: prefix unused unpacked variables with_.The redirect handling logic is well-structured — chained-redirect protection (line 415) and the delete-then-create/update flow are correct. Two small observations:
- Line 427:
actionfromcreate_or_updateis never used since line 433 always returnsAction.REDIRECT. Prefix with_to signal intent.- Line 426:
target_data.pop("redirect_from", None)is a no-op here (ifredirect_fromwere present, we'd have returnedAction.ERRORon line 420). Harmless defensive code, but a comment would clarify.Suggested fix
- new_record, action = self.create_or_update( + new_record, _action = self.create_or_update(🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rero_mef/agents/viaf/api.py` around lines 391 - 433, In handle_redirect, mark the unused unpacked variable from create_or_update by renaming action to _action (or prefix it with an underscore) to indicate it’s intentionally unused, and add a brief inline comment next to the target_data.pop("redirect_from", None) call in handle_redirect explaining it’s defensive (kept to ensure no stray redirect_from key is persisted even though earlier code returns on chained redirects).tests/ui/agents/test_agents_viaf_api.py (2)
437-469: Chained redirect test is correct; minor nit on unused variable.The test correctly verifies that a chained redirect (target itself redirects again) returns
Action.ERROR. Line 461:redirect_infois unused — prefix with_per convention.Suggested fix
- new_record, action, redirect_info = agent_viaf_record.handle_redirect( + new_record, action, _redirect_info = agent_viaf_record.handle_redirect(🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/ui/agents/test_agents_viaf_api.py` around lines 437 - 469, In test_handle_redirect_chained, the local variable redirect_info returned from AgentViafRecord.handle_redirect is unused; update the call to capture it into a discard variable (e.g., _redirect_info) so the test follows naming conventions: change the tuple assignment from "new_record, action, redirect_info = agent_viaf_record.handle_redirect(...)" to use "_redirect_info" instead; leave all assertions and other variable names unchanged.
452-458: The"redirect_from": new_pidin mock data is a red herring.
get_online_recordbuilds the result dict from scratch — it never copiesredirect_fromfrom the raw API JSON. The chained redirect is detected becauseviafID("33333333") ≠ requested pid ("22222222"), causingget_online_recordto set its ownredirect_from. The extra field in the mock is harmless but potentially misleading for future readers.Cleaner mock data
chained_data = { "viafID": "33333333", - "redirect_from": new_pid, "sources": {}, }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/ui/agents/test_agents_viaf_api.py` around lines 452 - 458, The mock response is misleading because get_online_record detects redirects by comparing viafID to the requested pid rather than reading redirect_from; update the test by removing "redirect_from": new_pid from the chained_data mock so chained_data contains only the differing "viafID" ("33333333") and any minimal required fields (e.g., "sources"), leaving mock_get.return_value = mock_response(json_data=chained_data) and keeping the comparison behavior in get_online_record unchanged.rero_mef/agents/cli.py (2)
277-292: Move the import of_refresh_viaf_recordoutside the loop, and reconsider importing a private function.The
from .viaf.tasks import _refresh_viaf_recordon line 285 is executed on every iteration when not enqueueing. Python caches module imports so the performance hit is negligible, but it's unconventional and hurts readability. Additionally,_refresh_viaf_recordis a private function (leading underscore) — consider either making it public or using the already-importedviaf_get_recordwrapper instead.Suggested fix — use the public `viaf_get_record` directly
for pid in progress_bar: if batch_size and count >= batch_size: break if enqueue: task = viaf_get_record.delay(viaf_pid=pid, verbose=verbose) if verbose: click.echo(f" viaf pid: {pid} task: {task}") else: - from .viaf.tasks import _refresh_viaf_record - - action = _refresh_viaf_record( - pid=pid, dbcommit=True, reindex=True, verbose=verbose - ) - action_counts.setdefault(action.value, 0) - action_counts[action.value] += 1 + action_value = viaf_get_record( + viaf_pid=pid, dbcommit=True, reindex=True, verbose=verbose + ) + action_counts.setdefault(action_value, 0) + action_counts[action_value] += 1 count += 1🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rero_mef/agents/cli.py` around lines 277 - 292, The loop currently imports the private function _refresh_viaf_record on every iteration and uses a leading-underscore symbol; move the import out of the loop (e.g., import _refresh_viaf_record once before iterating over progress_bar) and replace the private call with the public wrapper viaf_get_record if possible (or promote _refresh_viaf_record to a public name) so readability and convention are preserved; update usages of action/action_counts and enqueue logic to call the chosen public function consistently.
221-232:--rollingand--missingare silently mutually exclusive.If a user passes both
--rollingand--missing, only rolling runs (due toif/elif). Consider using aclick.Choicefor mode selection, or add validation to warn/error when both are specified.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rero_mef/agents/cli.py` around lines 221 - 232, The two boolean click options "rolling" and "missing" are currently silently mutually exclusive (the handler uses if/elif), so update the CLI to enforce a single mode: either replace the two flags with a single choice option (e.g., "--mode" with click.Choice values like "rolling" and "missing") or add explicit validation at the start of the command handler that checks the boolean flags rolling and missing and raises click.UsageError (or prints/warns) if both are True; ensure the error message clearly describes the conflict and tells the user to choose one mode.rero_mef/agents/viaf/tasks.py (3)
51-51: Prefix unused unpacked variables with_.
new_recordandredirect_infofromhandle_redirectare unused here.Suggested fix
- new_record, action, redirect_info = existing_record.handle_redirect( + _new_record, action, _redirect_info = existing_record.handle_redirect(🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rero_mef/agents/viaf/tasks.py` at line 51, The tuple returned by existing_record.handle_redirect is unpacked into new_record, action, redirect_info but new_record and redirect_info are unused; update the unpacking to prefix unused variables with an underscore (e.g., _new_record, action, _redirect_info) so only the used symbol action remains non-underscored, keeping existing_record.handle_redirect and the assignment structure unchanged.
36-74: No error handling for network failures in_refresh_viaf_record.If
get_online_recordraises an unhandled exception (e.g.,ConnectionError,Timeoutafter retries are exhausted), the entire task crashes. For a batch pipeline processing potentially thousands of records, consider catching and logging transient errors to avoid aborting the entire run.Sketch
+ try: online_data, msg = AgentViafRecord.get_online_record( viaf_source_code="VIAF", pid=pid ) + except Exception as err: + current_app.logger.error(f"VIAF refresh error for {pid}: {err}") + return Action.ERROR🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rero_mef/agents/viaf/tasks.py` around lines 36 - 74, Wrap the call to AgentViafRecord.get_online_record in a try/except inside _refresh_viaf_record to catch transient network/errors (e.g., requests.exceptions.RequestException or a broad Exception), log the exception via click.echo when verbose (including the pid and exception text) and return a non-fatal Action (e.g., Action.SKIP) instead of letting the exception bubble and crash the task; keep existing behavior for not-found and redirect flows and only alter the control flow when get_online_record raises.
64-71: Redundantadd_md5call on the UPDATE path — consider consolidating MD5 computation logic.Line 65 pre-computes
add_md5(online_data), but for the UPDATE path (whentest_md5=True),update_md5_changed()callsadd_md5()again at line 296. However, for the CREATE path, this pre-computation is necessary sincecreate_or_update()callscls.create()without passingmd5=True, andEntityRecord.create()defaults tomd5=False.Either:
- Remove the pre-computation and modify
create_or_update()to passmd5=Truewhen callingcreate(), or- Keep the pre-computation but add a comment clarifying it ensures MD5 is present for new records (CREATE path), acknowledging the redundancy on UPDATE.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rero_mef/agents/viaf/tasks.py` around lines 64 - 71, Pre-computation of MD5 via add_md5(online_data) is redundant for the UPDATE path because update_md5_changed() calls add_md5() again; change the logic so MD5 is computed only where needed by removing the add_md5(online_data) call here and instead update AgentViafRecord.create_or_update to pass md5=True when it invokes create() (EntityRecord.create), ensuring new records still get an MD5 while UPDATE paths keep update_md5_changed() and its add_md5() call.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/continuous-integration-test.yml:
- Line 77: The workflow sets carryforward: true which is invalid because
carryforward expects comma-separated flag names and only applies with parallel:
true and a parallel-finished job; remove the carryforward entry from the job
configuration (or alternatively implement a proper parallel matrix with per-leg
flag-name values and a parallel-finished job with parallel-finished: true) —
locate the carryforward key in the Coveralls/upload or coverage job and delete
it unless you intentionally convert the workflow to use parallel +
parallel-finished and explicit flag names.
In `@rero_mef/agents/viaf/tasks.py`:
- Around line 47-62: The initial use of the walrus operator removes
redirect_from from online_data early, making the later
online_data.pop("redirect_from", None) dead; change the first occurrence to
retrieve (not remove) the value — e.g., use online_data.get("redirect_from") in
the conditional that assigns redirect_from so AgentViafRecord.get_record_by_pid
and existing_record.handle_redirect can still access/modify online_data, and
preserve the final online_data.pop("redirect_from", None) to remove the key only
when treating it as a new record.
- Around line 117-163: Replace the unclosed scroll by using a PIT context: in
process_viaf_rolling_refresh wrap the iteration over AgentViafSearch() in a with
query.point_in_time(): block (keep the existing .params(preserve_order=True) and
.source("pid")), then iterate the search (using .scan() or the iterator
provided) inside that context and perform the _refresh_viaf_record calls as
before; this ensures the point-in-time is cleaned up even when you break early
(reference: process_viaf_rolling_refresh, AgentViafSearch, scan(),
point_in_time(), _refresh_viaf_record).
In `@rero_mef/config.py`:
- Around line 222-223: Remove the dead configuration constant
RERO_MEF_VIAF_REFRESH_DAYS_SPAN by deleting its definition (the
RERO_MEF_VIAF_REFRESH_DAYS_SPAN = 180 line) from the config module and ensure
nothing else depends on it (keep RERO_MEF_VIAF_VIAF_REFRESH_BATCH_SIZE intact);
run tests/grep to confirm there are no remaining references and delete any
related unused imports or documentation comments if found.
In `@tests/ui/agents/test_agents_cli.py`:
- Around line 131-133: The combined assertion masks whether the task was
actually enqueued; replace the single assertion `assert mock_delay.call_count >=
1 or "task" in res.output.lower()` with two assertions: one that directly checks
the mock was invoked (`assert mock_delay.call_count >= 1`) and a separate one
that validates the CLI output (`assert "task" in res.output.lower()` or a more
specific substring), updating the test in tests/ui/agents/test_agents_cli.py
around where `mock_delay` and `res.output` are asserted (refer to
`mock_delay.call_count` and `res.output`) so each behavior is verified
independently.
- Around line 162-164: The current assertion uses a weak "or" with res.output;
replace it with a strict check that the task was called exactly twice by
asserting mock_task.call_count == 2 (i.e., change the line "assert
mock_task.call_count == 2 or 'VIAF file' in res.output" to "assert
mock_task.call_count == 2") so the test verifies the file with two records
triggers two task invocations (referencing mock_task.call_count and res.output
in the failing assertion).
- Around line 106-109: The assertion using mock_task.call_count >= 0 is
meaningless; change it to assert that the mock was actually called (e.g., assert
mock_task.call_count >= 1 or assert mock_task.called) so the test verifies tasks
were invoked for missing VIAF PIDs; update the assertion in
tests/ui/agents/test_agents_cli.py where mock_task is checked (refer to the
mock_task variable and the surrounding assertions checking res.output) to
require a positive call count or use
mock_task.assert_called()/assert_called_once() as appropriate.
In `@tests/ui/places/test_places_api.py`:
- Around line 275-277: The test's expected value uses a colon but
make_identifier (in rero_mef/places/utils.py) returns a pipe when no source is
present; update the assertion in the test (the identified_by_without_source /
identifier check in test_places_api.py) to expect "bf:Isbn|978-3-16-148410-0"
(i.e., use '|' as the separator) so the test matches the make_identifier
behavior, or alternatively change make_identifier to use ':' consistently if you
want the colon—adjust only one side to make them consistent.
- Line 258: The assertion is tautological because "len(endpoints) >= 0" is
always true; update the test in tests/ui/places/test_places_api.py to perform a
meaningful check on the endpoints variable: either assert "plgnd" in endpoints
or allow an empty mapping by replacing the second operand with "len(endpoints)
== 0" (if empty is acceptable), or if the intent is only to verify type, replace
the whole assertion with "assert isinstance(endpoints, dict)"; locate the
failing line that currently reads "assert 'plgnd' in endpoints or len(endpoints)
>= 0" and change it to one of these correct alternatives depending on desired
behavior.
---
Duplicate comments:
In `@CLAUDE.md`:
- Around line 97-102: Update test_api_call_success to use the same mock_response
helper implementation and explicit return-value assertions as used in the other
file: ensure the mock_response helper referenced by mock_response(...) returns
an object with a .json() method and status_code (so mock_get.return_value =
mock_response(json_data={"result":"ok"}) behaves the same), import or define the
mock_response helper in the test module if missing, and replace any vague
"return result" checks with explicit assertions on the API call's return value
inside test_api_call_success to keep behavior synchronized with the other tests
and the shared helper.
---
Nitpick comments:
In @.github/copilot-instructions.md:
- Around line 97-102: The example uses an undefined helper mock_response in the
test_api_call_success function (and the mock.patch on requests.Session.get); fix
it by either importing/defining mock_response (e.g., add a conftest.py fixture
or helper function named mock_response that returns a Mock with .json() and
.status_code) or replace the call inline with unittest.mock.MagicMock/Mock
configured with the expected .json return value and status_code; update the
test_api_call_success reference to use that defined helper or inline mock so the
example is runnable.
- Around line 170-180: The snippet defines my_task(param, dbcommit=True,
reindex=True, verbose=False) but returns an unassigned variable result; replace
the literal return with a descriptive comment describing the expected return
shape (e.g., dict with keys 'status' and 'data', or specific type) so
copy-pasting won’t raise NameError—update the docstring or add an inline comment
near the end of my_task to specify the return value contract instead of return
result.
In `@CLAUDE.md`:
- Line 28: Update CLAUDE.md so the sentence "Never use `pip`, `python -m
pytest`, or bare `pytest` — always go through `uv`." matches the styling in
.github/copilot-instructions.md by making it bold (wrap the entire sentence in
** **); locate the exact text in CLAUDE.md and replace it with the bolded
version to keep both files synchronized per AI_SETUP.md.
In `@rero_mef/agents/cli.py`:
- Around line 277-292: The loop currently imports the private function
_refresh_viaf_record on every iteration and uses a leading-underscore symbol;
move the import out of the loop (e.g., import _refresh_viaf_record once before
iterating over progress_bar) and replace the private call with the public
wrapper viaf_get_record if possible (or promote _refresh_viaf_record to a public
name) so readability and convention are preserved; update usages of
action/action_counts and enqueue logic to call the chosen public function
consistently.
- Around line 221-232: The two boolean click options "rolling" and "missing" are
currently silently mutually exclusive (the handler uses if/elif), so update the
CLI to enforce a single mode: either replace the two flags with a single choice
option (e.g., "--mode" with click.Choice values like "rolling" and "missing") or
add explicit validation at the start of the command handler that checks the
boolean flags rolling and missing and raises click.UsageError (or prints/warns)
if both are True; ensure the error message clearly describes the conflict and
tells the user to choose one mode.
In `@rero_mef/agents/viaf/api.py`:
- Around line 391-433: In handle_redirect, mark the unused unpacked variable
from create_or_update by renaming action to _action (or prefix it with an
underscore) to indicate it’s intentionally unused, and add a brief inline
comment next to the target_data.pop("redirect_from", None) call in
handle_redirect explaining it’s defensive (kept to ensure no stray redirect_from
key is persisted even though earlier code returns on chained redirects).
In `@rero_mef/agents/viaf/tasks.py`:
- Line 51: The tuple returned by existing_record.handle_redirect is unpacked
into new_record, action, redirect_info but new_record and redirect_info are
unused; update the unpacking to prefix unused variables with an underscore
(e.g., _new_record, action, _redirect_info) so only the used symbol action
remains non-underscored, keeping existing_record.handle_redirect and the
assignment structure unchanged.
- Around line 36-74: Wrap the call to AgentViafRecord.get_online_record in a
try/except inside _refresh_viaf_record to catch transient network/errors (e.g.,
requests.exceptions.RequestException or a broad Exception), log the exception
via click.echo when verbose (including the pid and exception text) and return a
non-fatal Action (e.g., Action.SKIP) instead of letting the exception bubble and
crash the task; keep existing behavior for not-found and redirect flows and only
alter the control flow when get_online_record raises.
- Around line 64-71: Pre-computation of MD5 via add_md5(online_data) is
redundant for the UPDATE path because update_md5_changed() calls add_md5()
again; change the logic so MD5 is computed only where needed by removing the
add_md5(online_data) call here and instead update
AgentViafRecord.create_or_update to pass md5=True when it invokes create()
(EntityRecord.create), ensuring new records still get an MD5 while UPDATE paths
keep update_md5_changed() and its add_md5() call.
In `@tests/ui/agents/test_agents_viaf_api.py`:
- Around line 437-469: In test_handle_redirect_chained, the local variable
redirect_info returned from AgentViafRecord.handle_redirect is unused; update
the call to capture it into a discard variable (e.g., _redirect_info) so the
test follows naming conventions: change the tuple assignment from "new_record,
action, redirect_info = agent_viaf_record.handle_redirect(...)" to use
"_redirect_info" instead; leave all assertions and other variable names
unchanged.
- Around line 452-458: The mock response is misleading because get_online_record
detects redirects by comparing viafID to the requested pid rather than reading
redirect_from; update the test by removing "redirect_from": new_pid from the
chained_data mock so chained_data contains only the differing "viafID"
("33333333") and any minimal required fields (e.g., "sources"), leaving
mock_get.return_value = mock_response(json_data=chained_data) and keeping the
comparison behavior in get_online_record unchanged.
In `@tests/ui/places/test_places_api.py`:
- Line 261: The test function test_places_utils_make_identifier includes an
unused app fixture; remove the unused parameter from the test signature so the
test only accepts the input it needs (i.e., drop the app argument) because
make_identifier is pure and does not use current_app; update any calls or
references to test_places_utils_make_identifier accordingly and run tests to
confirm Ruff ARG001 warning is resolved.
af4511b to
d4cdb54
Compare
There was a problem hiding this comment.
Actionable comments posted: 7
♻️ Duplicate comments (1)
rero_mef/agents/viaf/tasks.py (1)
142-163: Prefer PIT/iterate to avoid lingering scrolls when breaking early.
scan()uses the Scroll API; breaking at Line 154 may leave scroll contexts until TTL. Considerpoint_in_time()+iterate()to ensure cleanup.elasticsearch-dsl Search.point_in_time context manager iterate vs scan preserve_order♻️ Possible refactor
- hits = query.source("pid").scan() + with query.point_in_time(keep_alive="1m") as pit: + hits = pit.source("pid").iterate(keep_alive="1m")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rero_mef/agents/viaf/tasks.py` around lines 142 - 163, The current use of AgentViafSearch().params(...).sort(...).scan() can leave scroll contexts open when you break early; replace scan() with the point_in_time() context manager and use its iterate() method (keeping .source("pid") and the same params/sort) so the PIT is automatically closed on exit; iterate() yields the same hits for your progressbar loop and you can keep calling _refresh_viaf_record(pid=hit.pid, ...) and breaking when count >= batch_size without leaving a lingering scroll.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@rero_mef/agents/cli.py`:
- Around line 202-313: The harvest_viaf command currently ignores --progress
unless --verbose is set and treats batch_size=0 inconsistently (rolling uses
batch_size or None but full refresh passes raw 0), so normalize batch_size at
the start (e.g., batch_arg = None if batch_size == 0 else batch_size) and pass
that to process_viaf_rolling_refresh and process_viaf_refresh (both delay and
sync calls) to make "0=all" consistent; also honor the progress flag by passing
a progress parameter into the synchronous refresh calls
(process_viaf_rolling_refresh and process_viaf_refresh) or by enabling
progressbar output even when verbose is False so --progress alone shows progress
feedback. Ensure you update calls in harvest_viaf and use the normalized
batch_arg and progress flag everywhere.
In `@rero_mef/agents/viaf/api.py`:
- Around line 426-433: The local variable returned from create_or_update is
assigned to action but never used; update the tuple assignment in the block that
calls create_or_update (around target_data.pop("redirect_from", None) and the
subsequent call) to rename action to a throwaway variable (e.g., _action or _)
so the linter stops flagging the unused symbol, leaving new_record and the
returned Action.REDIRECT and redirect_info unchanged.
In `@rero_mef/agents/viaf/tasks.py`:
- Around line 47-71: The tuple unpackings return unused values flagged by Ruff
RUF059; when calling existing_record.handle_redirect(...) only use the needed
variable(s) (keep action) and discard the others by renaming them to _new_record
and _redirect_info (or prefix with an underscore), and when calling
AgentViafRecord.create_or_update(...) if only action is used rename the unused
record to _record (or _), so replace new_record, redirect_info and record with
underscore-prefixed names to indicate they are intentionally unused while
keeping action as-is.
In `@tests/ui/agents/test_agents_viaf_tasks.py`:
- Around line 48-80: In test_refresh_viaf_record_update remove the dead
"nonexistent" setup: delete the lines that set pid = "nonexistent" and the
subsequent mock_get.return_value = mock_response(json_data={"sources": {}})
since no call to _refresh_viaf_record or assertions follow; this cleans up
unused test scaffolding around agent_viaf_record and mock_get in
test_refresh_viaf_record_update.
In `@tests/ui/places/test_places_api.py`:
- Line 250: The test function test_places_utils_get_places_endpoints (and the
other test at line ~261 that also only needs the fixture side-effects) should
stop declaring the unused parameter `app` to silence Ruff ARG001; either remove
the `app` parameter and add `@pytest.mark.usefixtures`("app") above the function,
or keep the parameter but add a local per-line suppression (e.g., `# noqa:
ARG001`) immediately after the function signature—apply the same change to the
second test that currently only uses `app` for side-effects so Ruff no longer
reports ARG001.
In `@tests/ui/places/test_places_gnd_tasks.py`:
- Around line 24-25: The test functions (e.g., test_gnd_get_record_success)
declare an unused fixture parameter named app which triggers Ruff ARG001; rename
the parameter to _app (or _app: fixture-type) in each affected test
(test_gnd_get_record_success and the other tests on lines referenced) or append
a per-parameter noqa comment (# noqa: ARG001) to silence the linter; update the
function signatures for test_gnd_get_record_success, and the similar tests at
the other locations to use the underscore-prefixed name consistently so the
fixtures still run but Ruff no longer flags them.
- Around line 113-115: Replace the manual try/except/assert False pattern in the
test that calls gnd_get_record("040754766", debug=True) with pytest.raises to
reliably assert the ValueError; specifically, update the test in
tests/ui/places/test_places_gnd_tasks.py so the call to gnd_get_record is
wrapped in a pytest.raises(ValueError) context manager (referencing
gnd_get_record and the test that currently uses assert False) to avoid
assertions being stripped with python -O.
---
Duplicate comments:
In `@rero_mef/agents/viaf/tasks.py`:
- Around line 142-163: The current use of
AgentViafSearch().params(...).sort(...).scan() can leave scroll contexts open
when you break early; replace scan() with the point_in_time() context manager
and use its iterate() method (keeping .source("pid") and the same params/sort)
so the PIT is automatically closed on exit; iterate() yields the same hits for
your progressbar loop and you can keep calling _refresh_viaf_record(pid=hit.pid,
...) and breaking when count >= batch_size without leaving a lingering scroll.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (14)
.github/AI_SETUP.md.github/copilot-instructions.md.github/workflows/continuous-integration-test.ymlCLAUDE.mdrero_mef/agents/cli.pyrero_mef/agents/viaf/api.pyrero_mef/agents/viaf/tasks.pyrero_mef/api.pyrero_mef/config.pytests/ui/agents/test_agents_cli.pytests/ui/agents/test_agents_viaf_api.pytests/ui/agents/test_agents_viaf_tasks.pytests/ui/places/test_places_api.pytests/ui/places/test_places_gnd_tasks.py
🚧 Files skipped from review as they are similar to previous changes (1)
- rero_mef/config.py
d4cdb54 to
8f396e4
Compare
There was a problem hiding this comment.
Actionable comments posted: 5
♻️ Duplicate comments (9)
tests/ui/places/test_places_gnd_tasks.py (2)
23-24: Multiple unusedappfixtures trigger Ruff ARG001.All six test functions declare
appsolely for setup side-effects. Rename to_appor use@pytest.mark.usefixtures("app").Also applies to: 47-49, 60-62, 74-76, 85-87, 108-110
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/ui/places/test_places_gnd_tasks.py` around lines 23 - 24, Several tests (starting with the test function test_gnd_get_record_success) declare the fixture parameter app but never use it, causing Ruff ARG001; rename the unused parameter to _app (e.g., change test_gnd_get_record_success(app) to test_gnd_get_record_success(_app)) or alternatively apply `@pytest.mark.usefixtures`("app") to the test functions to keep the side-effect without a parameter; update the other affected test functions the same way (the ones around lines referenced in the review) so no tests declare an unused app parameter.
113-117:assert Falseis stripped withpython -O— usepytest.raisesinstead.🛠️ Proposed fix
- try: - gnd_get_record("040754766", debug=True) - assert False, "Should have raised ValueError" - except ValueError as e: - assert str(e) == "Test error" + with pytest.raises(ValueError, match="Test error"): + gnd_get_record("040754766", debug=True)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/ui/places/test_places_gnd_tasks.py` around lines 113 - 117, The test currently uses an assert False pattern that is removed under python -O; replace the try/except with pytest's context manager: import pytest if needed, then use with pytest.raises(ValueError) as excinfo: call gnd_get_record("040754766", debug=True) and afterwards assert str(excinfo.value) == "Test error". Update the test in tests/ui/places/test_places_gnd_tasks.py and reference the gnd_get_record call and the ValueError expectation when making the change.tests/ui/places/test_places_api.py (1)
250-258: Unusedappfixture still flagged by Ruff (ARG001).Both
test_places_utils_get_places_endpointsandtest_places_utils_make_identifierdeclareapponly for its fixture side-effects; Ruff flags these. Additionally, the"bf:Isbn:978-3-16-148410-0"assertion at line 277 is now correct —make_identifierinrero_mef/places/utils.pyuses:as the separator for the no-source case (return f"{identified_by['type']}:{identified_by['value']}").Also applies to: 261-277
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/ui/places/test_places_api.py` around lines 250 - 258, Ruff flags the `app` fixture as unused in tests test_places_utils_get_places_endpoints and test_places_utils_make_identifier; to fix, remove the `app` parameter from the function signatures and instead apply the fixture for side-effects either by adding a pytest.mark.usefixtures("app") decorator above each test or by renaming the parameter to `_app` to indicate intentional unusedness; update the tests referencing get_places_endpoints and make_identifier in rero_mef.places.utils accordingly so they still run with the fixture side-effects but no longer trigger ARG001.tests/ui/agents/test_agents_viaf_tasks.py (2)
34-34: Unused fixture parameters throughout — Ruff ARG001.All test functions in this file that declare
appand/oragent_viaf_recordpurely for fixture side-effects trigger Ruff ARG001. Rename unused parameters with a leading underscore or move them to@pytest.mark.usefixtures(...).Also applies to: 50-50, 83-83, 107-107, 126-126, 145-145, 160-160, 181-181, 198-198, 216-216, 235-235, 252-252
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/ui/agents/test_agents_viaf_tasks.py` at line 34, Several test functions (e.g., test_refresh_viaf_record_create and other test_* functions in tests/ui/agents/test_agents_viaf_tasks.py) declare fixtures like app and agent_viaf_record only for side-effects and trigger Ruff ARG001; rename unused parameters by prefixing them with an underscore (e.g., app -> _app, agent_viaf_record -> _agent_viaf_record) or remove them from the signature and add them to `@pytest.mark.usefixtures`(...) so the fixtures still run but are not treated as unused arguments.
73-78: Dead code — setup block at lines 75–78 has no assertions and is never exercised.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/ui/agents/test_agents_viaf_tasks.py` around lines 73 - 78, The test contains dead setup: the local variable pid = "nonexistent" and the assignment mock_get.return_value = mock_response(json_data={"sources": {}}) are never used or asserted; either remove these unused lines or convert them into meaningful assertions exercising the code path (e.g., call the function under test with pid and assert the expected result or that mock_get was called). Locate and update the test in tests/ui/agents/test_agents_viaf_tasks.py by removing the unused pid and mock_get.return_value setup or by adding an invocation and assertions that reference pid and mock_response so the mock is actually exercised.rero_mef/agents/viaf/api.py (1)
427-427: Ruff RUF059 — unpackedactionvariable is never used.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rero_mef/agents/viaf/api.py` at line 427, The tuple unpacking from self.create_or_update assigns an unused variable action; change the unpack to ignore the unused value by replacing "new_record, action = self.create_or_update(" with "new_record, _ = self.create_or_update(" (or alternatively only assign the first element via new_record = self.create_or_update(...)[0]) so the unused variable warning is resolved while keeping the call to create_or_update in place.rero_mef/agents/viaf/tasks.py (2)
51-60: Ruff RUF059 —new_record,redirect_info, andrecordare unpacked but never used.🛠️ Proposed fix
- new_record, action, redirect_info = existing_record.handle_redirect( + _new_record, action, _redirect_info = existing_record.handle_redirect( redirect_to_pid=online_data["pid"], dbcommit=dbcommit, reindex=reindex, ) ... - record, action = AgentViafRecord.create_or_update( + _record, action = AgentViafRecord.create_or_update( data=online_data, dbcommit=dbcommit, reindex=reindex, test_md5=True, )Also applies to: 64-71
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rero_mef/agents/viaf/tasks.py` around lines 51 - 60, The code unpacks new_record and redirect_info (and elsewhere record) from existing_record.handle_redirect(...) but never uses them; change the unpacking to ignore unused values—e.g., use _, action, _ = existing_record.handle_redirect(redirect_to_pid=online_data["pid"], dbcommit=dbcommit, reindex=reindex) or fetch the action by index (action = existing_record.handle_redirect(...)[1]) so only action is kept; apply the same change to the other occurrence (the block around lines 64-71) to remove the unused new_record/redirect_info/record variables.
143-163:scan()scroll context is not closed on earlybreak— Elasticsearch resource leak.
query.source("pid").scan()opens a Scroll API context; breaking at line 155 before the iterator is exhausted leaves that context open until the TTL expires (typically 5 min). Under concurrent or frequent scheduling this accumulates open scroll contexts. Consider using the PIT context manager (point_in_time()) or calling.scan(clear_scroll=True)(note:clear_scroll=Trueonly fires cleanup when the iterator finishes naturally, not on early break in all DSL versions).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rero_mef/agents/viaf/tasks.py` around lines 143 - 163, The scan() call on AgentViafSearch opens an Elasticsearch scroll that is not closed when the loop breaks early; wrap the iteration in a point-in-time context so the PIT/scroll is always closed on early exit. Replace the direct query.source("pid").scan() usage with AgentViafSearch().point_in_time() as pit: and iterate using the query bound to that PIT (or use the search API with the PIT id), and ensure the context manager is exited before returning from the function that calls _refresh_viaf_record so the PIT/scroll is cleaned up even when breaking early.rero_mef/agents/cli.py (1)
246-313:--progressflag is not propagated toprocess_viaf_rolling_refresh/process_viaf_refresh, andbatch_size=0has different semantics in rolling vs full mode.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rero_mef/agents/cli.py` around lines 246 - 313, The CLI doesn’t forward the --progress flag into process_viaf_rolling_refresh and process_viaf_refresh and treats batch_size=0 inconsistently between rolling and full modes; update the calls to process_viaf_rolling_refresh(...) and process_viaf_refresh(...) to include verbose/progress (e.g. pass progress=progress or progress=(progress or verbose) as the function expects) and normalize batch_size semantics (convert batch_size==0 to None if 0 should mean “no limit” or vice versa) so both rolling and full paths use the same interpretation of batch_size and receive the progress flag.
🧹 Nitpick comments (1)
rero_mef/agents/cli.py (1)
284-289: Late import of_refresh_viaf_recordinside the loop body.
from .viaf.tasks import _refresh_viaf_recordis re-executed on every non-enqueue iteration. Python caches the import, so it is cheap, but the import statement still executes inside the hot path and is already available via the module. Hoist it to the top of the function (or module level) alongside the other task imports.♻️ Proposed fix
from .viaf.tasks import ( process_viaf_refresh, process_viaf_rolling_refresh, viaf_get_record, + _refresh_viaf_record, )Then remove the inline import at line 285.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rero_mef/agents/cli.py` around lines 284 - 289, The inline import "from .viaf.tasks import _refresh_viaf_record" inside the loop should be hoisted out of the hot path: add the import at the top of the function (or module level) with the other task imports so _refresh_viaf_record is resolved once, then remove the inline import from the non-enqueue branch where action = _refresh_viaf_record(pid=pid, dbcommit=True, reindex=True, verbose=verbose) is called.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@rero_mef/agents/viaf/api.py`:
- Around line 391-433: handle_redirect currently deletes the old VIAF record
before creating the target, drops the real action returned by create_or_update
(always returns Action.REDIRECT), and never computes an md5 for the new record;
to fix: move the deletion (self.delete) to occur only after a successful
create_or_update, call add_md5(target_data) before calling create_or_update,
capture the returned action from create_or_update and propagate/return it (or
return Action.ERROR/VALIDATION_ERROR when create_or_update fails) instead of
unconditionally returning Action.REDIRECT, and ensure callers get the correct
redirect_info; refer to handle_redirect, create_or_update, add_md5, self.delete
and Action for the exact places to change.
In `@tests/ui/agents/test_agents_cli.py`:
- Line 88: The tests test_create_from_viaf and
test_create_from_viaf_with_enqueue declare the agent_viaf_record fixture only
for DB side-effects but don't use the parameter so Ruff flags ARG001; fix by
either renaming the parameter to _agent_viaf_record in the function signatures
of test_create_from_viaf and test_create_from_viaf_with_enqueue or remove the
parameter and annotate the functions with
`@pytest.mark.usefixtures`("agent_viaf_record") so the fixture still runs for
setup without an unused-argument error.
In `@tests/ui/agents/test_agents_viaf_api.py`:
- Line 347: The tests test_handle_redirect,
test_handle_redirect_target_not_found, test_get_online_with_redirect,
test_handle_redirect_chained, and test_get_online_other_source declare an unused
app fixture causing Ruff ARG001; fix by either renaming the parameter to _app in
each function signature (e.g., def test_handle_redirect(_app, mock_get,
agent_viaf_online_response):) or removing the parameter and annotating the test
with `@pytest.mark.usefixtures`("app") to keep the side-effect-only fixture active
while eliminating the unused-argument lint error.
- Line 461: The test unpacks a three-tuple from
agent_viaf_record.handle_redirect into new_record, action, redirect_info but
never uses redirect_info (Ruff RUF059); update the unpack to new_record, action,
_redirect_info (or simply _ ) in the test to mark the unused value and silence
the linter, keeping the call to agent_viaf_record.handle_redirect unchanged.
In `@tests/ui/agents/test_agents_viaf_tasks.py`:
- Around line 247-248: The current assertion uses "or" which makes the
verbose-output check vacuous; change it to two independent assertions so both
are enforced: after calling capsys.readouterr() (captured), assert that action
== Action.CREATE and separately assert that the captured.out contains "VIAF
get:". Refer to captured, capsys.readouterr(), action, Action.CREATE and the
literal "VIAF get:" when locating the test lines to update.
---
Duplicate comments:
In `@rero_mef/agents/cli.py`:
- Around line 246-313: The CLI doesn’t forward the --progress flag into
process_viaf_rolling_refresh and process_viaf_refresh and treats batch_size=0
inconsistently between rolling and full modes; update the calls to
process_viaf_rolling_refresh(...) and process_viaf_refresh(...) to include
verbose/progress (e.g. pass progress=progress or progress=(progress or verbose)
as the function expects) and normalize batch_size semantics (convert
batch_size==0 to None if 0 should mean “no limit” or vice versa) so both rolling
and full paths use the same interpretation of batch_size and receive the
progress flag.
In `@rero_mef/agents/viaf/api.py`:
- Line 427: The tuple unpacking from self.create_or_update assigns an unused
variable action; change the unpack to ignore the unused value by replacing
"new_record, action = self.create_or_update(" with "new_record, _ =
self.create_or_update(" (or alternatively only assign the first element via
new_record = self.create_or_update(...)[0]) so the unused variable warning is
resolved while keeping the call to create_or_update in place.
In `@rero_mef/agents/viaf/tasks.py`:
- Around line 51-60: The code unpacks new_record and redirect_info (and
elsewhere record) from existing_record.handle_redirect(...) but never uses them;
change the unpacking to ignore unused values—e.g., use _, action, _ =
existing_record.handle_redirect(redirect_to_pid=online_data["pid"],
dbcommit=dbcommit, reindex=reindex) or fetch the action by index (action =
existing_record.handle_redirect(...)[1]) so only action is kept; apply the same
change to the other occurrence (the block around lines 64-71) to remove the
unused new_record/redirect_info/record variables.
- Around line 143-163: The scan() call on AgentViafSearch opens an Elasticsearch
scroll that is not closed when the loop breaks early; wrap the iteration in a
point-in-time context so the PIT/scroll is always closed on early exit. Replace
the direct query.source("pid").scan() usage with
AgentViafSearch().point_in_time() as pit: and iterate using the query bound to
that PIT (or use the search API with the PIT id), and ensure the context manager
is exited before returning from the function that calls _refresh_viaf_record so
the PIT/scroll is cleaned up even when breaking early.
In `@tests/ui/agents/test_agents_viaf_tasks.py`:
- Line 34: Several test functions (e.g., test_refresh_viaf_record_create and
other test_* functions in tests/ui/agents/test_agents_viaf_tasks.py) declare
fixtures like app and agent_viaf_record only for side-effects and trigger Ruff
ARG001; rename unused parameters by prefixing them with an underscore (e.g., app
-> _app, agent_viaf_record -> _agent_viaf_record) or remove them from the
signature and add them to `@pytest.mark.usefixtures`(...) so the fixtures still
run but are not treated as unused arguments.
- Around line 73-78: The test contains dead setup: the local variable pid =
"nonexistent" and the assignment mock_get.return_value =
mock_response(json_data={"sources": {}}) are never used or asserted; either
remove these unused lines or convert them into meaningful assertions exercising
the code path (e.g., call the function under test with pid and assert the
expected result or that mock_get was called). Locate and update the test in
tests/ui/agents/test_agents_viaf_tasks.py by removing the unused pid and
mock_get.return_value setup or by adding an invocation and assertions that
reference pid and mock_response so the mock is actually exercised.
In `@tests/ui/places/test_places_api.py`:
- Around line 250-258: Ruff flags the `app` fixture as unused in tests
test_places_utils_get_places_endpoints and test_places_utils_make_identifier; to
fix, remove the `app` parameter from the function signatures and instead apply
the fixture for side-effects either by adding a pytest.mark.usefixtures("app")
decorator above each test or by renaming the parameter to `_app` to indicate
intentional unusedness; update the tests referencing get_places_endpoints and
make_identifier in rero_mef.places.utils accordingly so they still run with the
fixture side-effects but no longer trigger ARG001.
In `@tests/ui/places/test_places_gnd_tasks.py`:
- Around line 23-24: Several tests (starting with the test function
test_gnd_get_record_success) declare the fixture parameter app but never use it,
causing Ruff ARG001; rename the unused parameter to _app (e.g., change
test_gnd_get_record_success(app) to test_gnd_get_record_success(_app)) or
alternatively apply `@pytest.mark.usefixtures`("app") to the test functions to
keep the side-effect without a parameter; update the other affected test
functions the same way (the ones around lines referenced in the review) so no
tests declare an unused app parameter.
- Around line 113-117: The test currently uses an assert False pattern that is
removed under python -O; replace the try/except with pytest's context manager:
import pytest if needed, then use with pytest.raises(ValueError) as excinfo:
call gnd_get_record("040754766", debug=True) and afterwards assert
str(excinfo.value) == "Test error". Update the test in
tests/ui/places/test_places_gnd_tasks.py and reference the gnd_get_record call
and the ValueError expectation when making the change.
---
Nitpick comments:
In `@rero_mef/agents/cli.py`:
- Around line 284-289: The inline import "from .viaf.tasks import
_refresh_viaf_record" inside the loop should be hoisted out of the hot path: add
the import at the top of the function (or module level) with the other task
imports so _refresh_viaf_record is resolved once, then remove the inline import
from the non-enqueue branch where action = _refresh_viaf_record(pid=pid,
dbcommit=True, reindex=True, verbose=verbose) is called.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (5)
.github/AI_SETUP.mdis excluded by none and included by none.github/copilot-instructions.mdis excluded by none and included by none.github/workflows/continuous-integration-test.ymlis excluded by none and included by noneCLAUDE.mdis excluded by none and included by nonerero_mef/config.pyis excluded by!rero_mef/config.pyand included byrero_mef/**/*.py
📒 Files selected for processing (9)
rero_mef/agents/cli.pyrero_mef/agents/viaf/api.pyrero_mef/agents/viaf/tasks.pyrero_mef/api.pytests/ui/agents/test_agents_cli.pytests/ui/agents/test_agents_viaf_api.pytests/ui/agents/test_agents_viaf_tasks.pytests/ui/places/test_places_api.pytests/ui/places/test_places_gnd_tasks.py
🚧 Files skipped from review as they are similar to previous changes (1)
- rero_mef/api.py
8f396e4 to
1204e52
Compare
There was a problem hiding this comment.
Actionable comments posted: 5
♻️ Duplicate comments (4)
tests/ui/places/test_places_gnd_tasks.py (1)
26-26: Silence unusedappfixtures to avoid ARG001 lint failures.These parameters are unused and will trigger Ruff ARG001. Consider renaming to
_app(or adding# noqa: ARG001) in each affected test.🔧 Example fix (apply similarly to all listed tests)
-@mock.patch("rero_mef.places.gnd.tasks.requests_retry_session") -def test_gnd_get_record_success(mock_session, app): +@mock.patch("rero_mef.places.gnd.tasks.requests_retry_session") +def test_gnd_get_record_success(mock_session, _app):Also applies to: 50-50, 63-63, 77-77, 88-88, 111-111
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/ui/places/test_places_gnd_tasks.py` at line 26, The tests declare an unused fixture parameter named "app" (e.g., in function test_gnd_get_record_success) which triggers Ruff ARG001; rename the parameter from app to _app (or add a per-test "# noqa: ARG001") in test_gnd_get_record_success and the other affected test functions in this file so the linter recognizes them as intentionally unused and stops reporting ARG001.tests/ui/agents/test_agents_viaf_tasks.py (1)
229-244: Weakorassertion masks verbose-output verification.
assert "VIAF get:" in captured.out or action == Action.CREATEis trivially true when the record is created (which is always the case here). Assert both independently.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/ui/agents/test_agents_viaf_tasks.py` around lines 229 - 244, The test test_refresh_viaf_record_verbose_output uses a weak combined assertion that masks checking verbose output; update the test so it asserts the verbose message and the action separately: call _refresh_viaf_record(pid=pid, dbcommit=True, reindex=True, verbose=True) into action, capture output via capsys.readouterr(), then assert "VIAF get:" is in captured.out and also assert action == Action.CREATE; refer to the test function name and the _refresh_viaf_record call to locate where to change the assertions.tests/ui/agents/test_agents_viaf_api.py (1)
462-462: Unusedredirect_infounpacking — already flagged in prior review.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/ui/agents/test_agents_viaf_api.py` at line 462, The test unpacks three values from agent_viaf_record.handle_redirect but never uses redirect_info; update the unpacking in tests/ui/agents/test_agents_viaf_api.py to avoid the unused variable by either discarding it (e.g., use _ or capture only the two used values) or use the returned redirect_info in assertions if relevant; locate the call to agent_viaf_record.handle_redirect and change the left-hand side to match the actual usage (e.g., new_record, action = agent_viaf_record.handle_redirect(...) or new_record, action, _ = ...).tests/ui/agents/test_agents_cli.py (1)
88-88: Unusedagent_viaf_recordfixture parameter — use@pytest.mark.usefixtures.The
agent_viaf_recordparameter is only needed for its DB-setup side effect. Use the decorator to avoid Ruff ARG001 warnings on bothtest_create_from_viaf(line 88) andtest_create_from_viaf_with_enqueue(line 108).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/ui/agents/test_agents_cli.py` at line 88, The tests test_create_from_viaf and test_create_from_viaf_with_enqueue currently accept the agent_viaf_record fixture only for its DB side-effects and trigger an unused-argument warning; remove the agent_viaf_record parameter from both function signatures and add `@pytest.mark.usefixtures`("agent_viaf_record") above each test (keeping existing fixtures like script_info intact) so the fixture runs for setup without being declared as an unused argument.
🧹 Nitpick comments (3)
rero_mef/agents/cli.py (1)
289-294: Move the lazy import outside theforloop.The
from .viaf.tasks import _refresh_viaf_recordon line 290 is inside the loop body. Although Python's import caching prevents repeated I/O, placing it before the loop (or at function scope within theelseblock) is more conventional and readable.Proposed fix
+ from .viaf.tasks import _refresh_viaf_record + for pid in progress_bar: if batch_arg and count >= batch_arg: break if enqueue: task = viaf_get_record.delay(viaf_pid=pid, verbose=show_progress) if verbose: click.echo(f" viaf pid: {pid} task: {task}") else: - from .viaf.tasks import _refresh_viaf_record - action = _refresh_viaf_record( pid=pid, dbcommit=True, reindex=True, verbose=show_progress )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rero_mef/agents/cli.py` around lines 289 - 294, The import statement "from .viaf.tasks import _refresh_viaf_record" is inside a loop; move that import to the containing scope (e.g., just before the for-loop or at the top of the enclosing function/else block) so it runs once, then call _refresh_viaf_record(pid=pid, dbcommit=True, reindex=True, verbose=show_progress) inside the loop; keep the function name _refresh_viaf_record to locate the code.rero_mef/agents/viaf/api.py (1)
433-437: Uselogging.exceptioninstead oflogging.errorto capture the traceback.Per Ruff TRY400,
logging.exceptionautomatically includes the exception traceback, which is more useful for debugging create/update failures during redirect handling.Proposed fix
except Exception as e: - current_app.logger.error( + current_app.logger.exception( f"Failed to create/update target VIAF {redirect_to_pid}: {e}" ) return None, Action.ERROR, redirect_info🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rero_mef/agents/viaf/api.py` around lines 433 - 437, The except block catching Exception as e should use current_app.logger.exception (not current_app.logger.error) so the traceback is included; update the except handler around the create/update redirect logic (the except Exception as e that logs "Failed to create/update target VIAF {redirect_to_pid}: {e}") to call current_app.logger.exception with the same descriptive message and keep the subsequent return None, Action.ERROR, redirect_info unchanged.rero_mef/agents/viaf/tasks.py (1)
95-114:batch_sizecheck is falsy-based — passing0andNoneboth mean "all", but the docstring saysNone.Line 105:
if batch_size and count >= batch_size— bothNoneand0are falsy, so both mean "process all." This works correctly because the CLI normalizes0toNone, but if someone calls this function directly withbatch_size=0, it silently processes all records. Consider documenting this in the docstring or using an explicitis not Nonecheck for clarity.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rero_mef/agents/viaf/tasks.py` around lines 95 - 114, The batch_size falsy check currently uses "if batch_size and count >= batch_size" which treats 0 the same as None; change this to an explicit None check so callers can pass 0 intentionally: replace that condition with "if batch_size is not None and count >= batch_size" in the loop that iterates over pids (referenced variables: batch_size, count, pids, progress) and update the docstring to state that None means "no limit" (or alternatively document that 0 is treated as unlimited) so behavior is explicit for callers of the function that calls _refresh_viaf_record.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@rero_mef/agents/viaf/api.py`:
- Around line 426-442: The test assertion is incompatible with handle_redirect
which propagates the inner create_or_update action; update
test_refresh_viaf_record_redirect to assert that action is one of Action.CREATE
or Action.UPDATE (e.g., replace assert action == Action.REDIRECT with assert
action in (Action.CREATE, Action.UPDATE)) so it matches the behavior of
create_or_update returned by handle_redirect (see create_or_update and delete
usages in the method).
In `@tests/ui/agents/test_agents_viaf_api.py`:
- Around line 346-384: test_handle_redirect asserts the returned action from
AgentViafRecord.handle_redirect is either Action.CREATE or Action.UPDATE, but
the task-layer test test_refresh_viaf_record_redirect expects Action.REDIRECT
for the same code path, so make the behavior consistent: update
AgentViafRecord.handle_redirect (and/or the wrapper create_or_update call) so it
returns Action.REDIRECT when the operation is the result of a VIAF cluster merge
redirect (the case exercised by test_handle_redirect), or alternatively change
test_handle_redirect to expect Action.REDIRECT to match
test_refresh_viaf_record_redirect; locate the logic in handle_redirect and the
create_or_update call (see viaf/api.py around the redirect handling block) and
ensure the function returns the canonical Action.REDIRECT for redirects instead
of passing through the raw create_or_update result.
In `@tests/ui/agents/test_agents_viaf_tasks.py`:
- Around line 91-99: The test expects _refresh_viaf_record to return
Action.REDIRECT but handle_redirect currently returns the raw create_or_update
action (CREATE/UPDATE); modify handle_redirect in viaf/api.py so that on
successful redirect processing it returns Action.REDIRECT (instead of returning
the create_or_update result), while preserving error handling and logging and
ensuring callers like _refresh_viaf_record still receive the REDIRECT sentinel;
alternatively, if you prefer the current behavior, update the test assertion to
check for the create_or_update action returned by handle_redirect (CREATE or
UPDATE) rather than Action.REDIRECT.
In `@tests/ui/places/test_places_api.py`:
- Around line 250-258: The test test_places_utils_get_places_endpoints relies on
a hard-coded key "plgnd" which couples it to the app fixture's RERO_PLACES
config; update the test to explicitly assert the fixture provides the expected
key before checking membership (or skip the assertion if absent) by reading the
app/config or RERO_PLACES used by get_places_endpoints, e.g. assert the
fixture/config contains the expected endpoint key or call pytest.skip when it
does not, so the membership check on endpoints is only performed when the
fixture guarantees "plgnd" exists.
- Around line 274-277: The two make_identifier implementations diverge
(places/utils.make_identifier uses "type:value" while
rero_mef/utils.make_identifier at the symbol make_identifier uses "type|value"),
causing silent lookup mismatches; fix by making the implementation in
rero_mef/utils.py produce the same no-source format as
places/utils.make_identifier (use ':' between type and value when source is
absent), update any callers found by running the suggested rg search to ensure
they expect the unified format, and add a brief compatibility note or adapter in
lookup code (where identifiers are consumed) to accept both ":" and "|" during a
transitional period.
---
Duplicate comments:
In `@tests/ui/agents/test_agents_cli.py`:
- Line 88: The tests test_create_from_viaf and
test_create_from_viaf_with_enqueue currently accept the agent_viaf_record
fixture only for its DB side-effects and trigger an unused-argument warning;
remove the agent_viaf_record parameter from both function signatures and add
`@pytest.mark.usefixtures`("agent_viaf_record") above each test (keeping existing
fixtures like script_info intact) so the fixture runs for setup without being
declared as an unused argument.
In `@tests/ui/agents/test_agents_viaf_api.py`:
- Line 462: The test unpacks three values from agent_viaf_record.handle_redirect
but never uses redirect_info; update the unpacking in
tests/ui/agents/test_agents_viaf_api.py to avoid the unused variable by either
discarding it (e.g., use _ or capture only the two used values) or use the
returned redirect_info in assertions if relevant; locate the call to
agent_viaf_record.handle_redirect and change the left-hand side to match the
actual usage (e.g., new_record, action = agent_viaf_record.handle_redirect(...)
or new_record, action, _ = ...).
In `@tests/ui/agents/test_agents_viaf_tasks.py`:
- Around line 229-244: The test test_refresh_viaf_record_verbose_output uses a
weak combined assertion that masks checking verbose output; update the test so
it asserts the verbose message and the action separately: call
_refresh_viaf_record(pid=pid, dbcommit=True, reindex=True, verbose=True) into
action, capture output via capsys.readouterr(), then assert "VIAF get:" is in
captured.out and also assert action == Action.CREATE; refer to the test function
name and the _refresh_viaf_record call to locate where to change the assertions.
In `@tests/ui/places/test_places_gnd_tasks.py`:
- Line 26: The tests declare an unused fixture parameter named "app" (e.g., in
function test_gnd_get_record_success) which triggers Ruff ARG001; rename the
parameter from app to _app (or add a per-test "# noqa: ARG001") in
test_gnd_get_record_success and the other affected test functions in this file
so the linter recognizes them as intentionally unused and stops reporting
ARG001.
---
Nitpick comments:
In `@rero_mef/agents/cli.py`:
- Around line 289-294: The import statement "from .viaf.tasks import
_refresh_viaf_record" is inside a loop; move that import to the containing scope
(e.g., just before the for-loop or at the top of the enclosing function/else
block) so it runs once, then call _refresh_viaf_record(pid=pid, dbcommit=True,
reindex=True, verbose=show_progress) inside the loop; keep the function name
_refresh_viaf_record to locate the code.
In `@rero_mef/agents/viaf/api.py`:
- Around line 433-437: The except block catching Exception as e should use
current_app.logger.exception (not current_app.logger.error) so the traceback is
included; update the except handler around the create/update redirect logic (the
except Exception as e that logs "Failed to create/update target VIAF
{redirect_to_pid}: {e}") to call current_app.logger.exception with the same
descriptive message and keep the subsequent return None, Action.ERROR,
redirect_info unchanged.
In `@rero_mef/agents/viaf/tasks.py`:
- Around line 95-114: The batch_size falsy check currently uses "if batch_size
and count >= batch_size" which treats 0 the same as None; change this to an
explicit None check so callers can pass 0 intentionally: replace that condition
with "if batch_size is not None and count >= batch_size" in the loop that
iterates over pids (referenced variables: batch_size, count, pids, progress) and
update the docstring to state that None means "no limit" (or alternatively
document that 0 is treated as unlimited) so behavior is explicit for callers of
the function that calls _refresh_viaf_record.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (5)
.github/AI_SETUP.mdis excluded by none and included by none.github/copilot-instructions.mdis excluded by none and included by none.github/workflows/continuous-integration-test.ymlis excluded by none and included by noneCLAUDE.mdis excluded by none and included by nonerero_mef/config.pyis excluded by!rero_mef/config.pyand included byrero_mef/**/*.py
📒 Files selected for processing (9)
rero_mef/agents/cli.pyrero_mef/agents/viaf/api.pyrero_mef/agents/viaf/tasks.pyrero_mef/api.pytests/ui/agents/test_agents_cli.pytests/ui/agents/test_agents_viaf_api.pytests/ui/agents/test_agents_viaf_tasks.pytests/ui/places/test_places_api.pytests/ui/places/test_places_gnd_tasks.py
1204e52 to
cba4052
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (5)
tests/ui/agents/test_agents_viaf_tasks.py (2)
243-244: Weakorassertion —action == Action.CREATEmakes the verbose check vacuous.Since the test always creates a new record,
action == Action.CREATEis virtually guaranteed, making the entire assertion triviallyTrueregardless of verbose output. Assert both independently.🛠️ Proposed fix
- assert "VIAF get:" in captured.out or action == Action.CREATE + assert action == Action.CREATE + assert "VIAF get:" in captured.out🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/ui/agents/test_agents_viaf_tasks.py` around lines 243 - 244, The test currently uses a weak assertion combining verbose output and action with an `or`, making it vacuous; change it to assert both conditions separately: call `captured = capsys.readouterr()` and then assert that ` "VIAF get:" in captured.out` and separately assert `action == Action.CREATE` (i.e., replace the single `assert "VIAF get:" in captured.out or action == Action.CREATE` with two independent asserts referencing `captured.out` and `Action.CREATE`).
91-99:⚠️ Potential issue | 🔴 Critical
assert action == Action.REDIRECTis inconsistent withhandle_redirectreturn value.
handle_redirect(inviaf/api.pyline 489) returns the action fromcreate_or_update, which will beAction.CREATEorAction.UPDATE— notAction.REDIRECT. This test assertion will fail.Either fix
handle_redirectto returnAction.REDIRECTon success, or update this assertion:- assert action == Action.REDIRECT + assert action in (Action.CREATE, Action.UPDATE)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/ui/agents/test_agents_viaf_tasks.py` around lines 91 - 99, The test asserts that _refresh_viaf_record returns Action.REDIRECT but handle_redirect (viaf.api.handle_redirect) currently returns whatever create_or_update returns (Action.CREATE or Action.UPDATE), so update either the function or the test: preferably change the test assertion to expect Action.CREATE or Action.UPDATE (or check for not None/new record existence) instead of Action.REDIRECT; alternatively, modify handle_redirect to explicitly return Action.REDIRECT on successful redirect handling after calling create_or_update. Reference symbols: _refresh_viaf_record (test), handle_redirect (viaf.api), create_or_update, and Action enum.tests/ui/agents/test_agents_viaf_api.py (3)
374-376:assert action in (Action.CREATE, Action.UPDATE)is inconsistent with the task-layer test that expectsAction.REDIRECTfor the same code path.
test_refresh_viaf_record_redirectintests/ui/agents/test_agents_viaf_tasks.py(line ~95) assertsAction.REDIRECTfor a redirect operation, while this test accepts the rawcreate_or_updateresult. One of the two tests must be wrong. The recommended fix is to havehandle_redirectinrero_mef/agents/viaf/api.pyreturn the canonicalAction.REDIRECTsentinel rather than forwarding the innercreate_or_updateaction, then update this assertion toassert action == Action.REDIRECT.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/ui/agents/test_agents_viaf_api.py` around lines 374 - 376, The test is asserting that create_or_update's raw result is returned from the redirect handler, but the task-layer test expects the canonical Action.REDIRECT sentinel; update handle_redirect in rero_mef/agents/viaf/api.py to return Action.REDIRECT (not forward the inner create_or_update return value) when a redirect is handled, and then change this test's assertion to assert action == Action.REDIRECT so both unit and task-layer tests align; reference the handle_redirect function and create_or_update call to locate where to replace the forwarded return with the Action.REDIRECT sentinel.
347-347: All five new test functions still carry an unusedappfixture argument (ARG001).
appis used only for its side-effects (application context setup). Either prefix it with_or apply@pytest.mark.usefixtures("app")to suppress the warning cleanly.🔧 Proposed fix (applied to all five functions)
-def test_handle_redirect(mock_get, app, agent_viaf_online_response): +@pytest.mark.usefixtures("app") +def test_handle_redirect(mock_get, agent_viaf_online_response): -def test_handle_redirect_target_not_found(mock_get, app): +@pytest.mark.usefixtures("app") +def test_handle_redirect_target_not_found(mock_get): -def test_get_online_with_redirect(mock_get, app, agent_viaf_online_response): +@pytest.mark.usefixtures("app") +def test_get_online_with_redirect(mock_get, agent_viaf_online_response): -def test_handle_redirect_chained(mock_get, app): +@pytest.mark.usefixtures("app") +def test_handle_redirect_chained(mock_get): -def test_get_online_other_source(mock_get, app, agent_viaf_online_response): +@pytest.mark.usefixtures("app") +def test_get_online_other_source(mock_get, agent_viaf_online_response):Also applies to: 388-388, 419-419, 439-439, 474-474
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/ui/agents/test_agents_viaf_api.py` at line 347, The five new tests (including test_handle_redirect and the other four test functions added around the same area) take an unused app fixture which triggers ARG001; either rename the parameter to _app to indicate intentional unused usage or remove the parameter and add `@pytest.mark.usefixtures`("app") above each test so the app fixture still runs for side-effects; update the signatures for test_handle_redirect and the four sibling tests (the other test_* functions added in the same diff) accordingly to suppress the unused-fixture warning.
461-466: Unpacked variableredirect_infois never referenced in the test body (RUF059).🔧 Proposed fix
- new_record, action, redirect_info = agent_viaf_record.handle_redirect( + new_record, action, _redirect_info = agent_viaf_record.handle_redirect( redirect_to_pid=new_pid, dbcommit=True, reindex=True, )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/ui/agents/test_agents_viaf_api.py` around lines 461 - 466, The test unpacks three return values from agent_viaf_record.handle_redirect into new_record, action, redirect_info but never uses redirect_info; update the unpack to avoid the unused variable by either capturing it as a deliberately ignored name (e.g. _redirect_info) or by only unpacking the two used values (new_record, action) and discarding the third, ensuring you still call agent_viaf_record.handle_redirect with the same args; reference symbols: agent_viaf_record.handle_redirect, new_record, action, redirect_info.
🧹 Nitpick comments (5)
rero_mef/agents/cli.py (2)
326-337: Move the deferred import outside the loop.The
from .viaf.tasks import _refresh_viaf_recordimport on line 327 is inside thefor pid in progress_barloop. While Python caches modules after the first import, this is an unnecessary re-evaluation on each iteration. Move it before the loop.♻️ Suggested fix
if not enqueue: + from .viaf.tasks import _refresh_viaf_record + for pid in progress_bar: if batch_arg and count >= batch_arg: break if enqueue: task = viaf_get_record.delay( viaf_pid=pid, verbose=verbose, progress=progress, delete_if_not_found=delete_if_not_found, update_agents=update_agents, ) if verbose: click.echo(f" viaf pid: {pid} task: {task}") else: - from .viaf.tasks import _refresh_viaf_record - action = _refresh_viaf_record(🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rero_mef/agents/cli.py` around lines 326 - 337, The import for _refresh_viaf_record is inside the loop causing repeated import evaluation; move "from .viaf.tasks import _refresh_viaf_record" to just before the "for pid in progress_bar" loop so the module is imported once, then call _refresh_viaf_record(...) inside the loop exactly as currently done; ensure the imported name matches the existing calls and there are no local name shadowing issues.
301-344: Missing mode:_refresh_viaf_recordis called directly instead of through task layer.In the
missingmode withenqueue=False(lines 326-339), the code calls_refresh_viaf_recorddirectly (a private function) rather than using the public task function like the other modes useprocess_viaf_refresh/process_viaf_rolling_refresh. This works but bypasses theset_timestamptracking that the other task wrappers provide.This is a design choice — the per-PID approach gives more granular control and progress reporting. Just noting the difference in behavior: no
set_timestampis called for missing mode.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rero_mef/agents/cli.py` around lines 301 - 344, The missing-mode loop currently calls the private helper _refresh_viaf_record(...) directly (bypassing timestamp tracking); change it to call the public task wrapper used by other modes (e.g., process_viaf_refresh or the synchronous equivalent of viaf_get_record) so set_timestamp is executed consistently; replace the direct call in the else branch with the public function (importing it if needed), pass the same parameters (pid, dbcommit=True, reindex=True, verbose, progress, delete_if_not_found, update_agents), and ensure the returned action value is handled the same way in action_counts.rero_mef/agents/viaf/api.py (2)
468-484: Same recommendation: uselogger.exceptionfor the redirect error path.This catch block handles failures during create/update of the redirect target. Including the traceback via
logger.exceptionwill help diagnose issues like DB errors, schema validation failures, etc.♻️ Suggested fix
except Exception as e: - current_app.logger.error( + current_app.logger.exception( f"Failed to create/update target VIAF {redirect_to_pid}: {e}" )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rero_mef/agents/viaf/api.py` around lines 468 - 484, In the except block around the call to self.create_or_update (handling redirect target creation), replace current_app.logger.error(...) with current_app.logger.exception(...) so the traceback is logged along with the message that includes redirect_to_pid; keep the existing info-log and delete call (self.delete) and the return of (None, Action.ERROR, redirect_info) intact—this change should be made in the try/except that surrounds create_or_update in the VIAF agent API so errors from create_or_update (and any underlying DB/schema issues) include full stack traces for debugging.
385-387: Uselogger.exceptionto preserve traceback in parse error logging.When catching a broad
Exceptionduring external API response parsing (which is reasonable here given unpredictable API formats),logger.exceptionwill include the full traceback, making it much easier to diagnose parsing failures in production.♻️ Suggested fix
except Exception as e: - current_app.logger.error(f"Error parsing VIAF response for {pid}: {e}") + current_app.logger.exception(f"Error parsing VIAF response for {pid}: {e}") return {}, f"VIAF get: {pid:<15} {url} | PARSE ERROR: {e}"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rero_mef/agents/viaf/api.py` around lines 385 - 387, Replace the current generic error log call in the except block that catches parse errors with a traceback-preserving call: change current_app.logger.error(f"Error parsing VIAF response for {pid}: {e}") to current_app.logger.exception(...) so the full traceback is recorded while keeping the same contextual message (including pid and url) and returning the same tuple; update the except block in the VIAF response parsing code where pid and url are referenced to use current_app.logger.exception to preserve diagnostics.rero_mef/agents/viaf/tasks.py (1)
100-151:batch_size=0bypasses the early-termination check — document or normalize.When
batch_size=0is passed (e.g., from CLI where0means "all"), the guardif batch_size and count >= batch_size(line 134) isFalsebecause0is falsy, so all records are processed. This works correctly but relies on Python truthiness of0. The docstring saysNone=all, but0also means "all" in practice. Consider normalizing at the call site (the CLI already doesbatch_arg = None if batch_size == 0 else batch_size) or documenting this in the docstring.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rero_mef/agents/viaf/tasks.py` around lines 100 - 151, The function process_viaf_refresh relies on Python truthiness for batch_size so passing 0 (falsy) behaves like None; normalize or make the check explicit by either converting batch_size==0 to None at the start of process_viaf_refresh or change the guard to an explicit None check (e.g., use "if batch_size is not None and count >= batch_size") so batch_size=0 is handled deterministically; update the docstring to state that 0 is treated as "all" if you choose normalization.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/ui/agents/test_agents_viaf_tasks.py`:
- Around line 140-152: The test test_viaf_get_record_task asserts an exact call
to _refresh_viaf_record but only lists four kwargs; viaf_get_record forwards all
seven kwargs (including progress, delete_if_not_found, update_agents with their
defaults) so assert_called_once_with will fail — update the assertion in
test_viaf_get_record_task to include the full set of keyword arguments passed by
viaf_get_record (pid=pid, dbcommit=True, reindex=True, verbose=False,
progress=False, delete_if_not_found=False, update_agents=True) to match the
actual call to _refresh_viaf_record.
---
Duplicate comments:
In `@tests/ui/agents/test_agents_viaf_api.py`:
- Around line 374-376: The test is asserting that create_or_update's raw result
is returned from the redirect handler, but the task-layer test expects the
canonical Action.REDIRECT sentinel; update handle_redirect in
rero_mef/agents/viaf/api.py to return Action.REDIRECT (not forward the inner
create_or_update return value) when a redirect is handled, and then change this
test's assertion to assert action == Action.REDIRECT so both unit and task-layer
tests align; reference the handle_redirect function and create_or_update call to
locate where to replace the forwarded return with the Action.REDIRECT sentinel.
- Line 347: The five new tests (including test_handle_redirect and the other
four test functions added around the same area) take an unused app fixture which
triggers ARG001; either rename the parameter to _app to indicate intentional
unused usage or remove the parameter and add `@pytest.mark.usefixtures`("app")
above each test so the app fixture still runs for side-effects; update the
signatures for test_handle_redirect and the four sibling tests (the other test_*
functions added in the same diff) accordingly to suppress the unused-fixture
warning.
- Around line 461-466: The test unpacks three return values from
agent_viaf_record.handle_redirect into new_record, action, redirect_info but
never uses redirect_info; update the unpack to avoid the unused variable by
either capturing it as a deliberately ignored name (e.g. _redirect_info) or by
only unpacking the two used values (new_record, action) and discarding the
third, ensuring you still call agent_viaf_record.handle_redirect with the same
args; reference symbols: agent_viaf_record.handle_redirect, new_record, action,
redirect_info.
In `@tests/ui/agents/test_agents_viaf_tasks.py`:
- Around line 243-244: The test currently uses a weak assertion combining
verbose output and action with an `or`, making it vacuous; change it to assert
both conditions separately: call `captured = capsys.readouterr()` and then
assert that ` "VIAF get:" in captured.out` and separately assert `action ==
Action.CREATE` (i.e., replace the single `assert "VIAF get:" in captured.out or
action == Action.CREATE` with two independent asserts referencing `captured.out`
and `Action.CREATE`).
- Around line 91-99: The test asserts that _refresh_viaf_record returns
Action.REDIRECT but handle_redirect (viaf.api.handle_redirect) currently returns
whatever create_or_update returns (Action.CREATE or Action.UPDATE), so update
either the function or the test: preferably change the test assertion to expect
Action.CREATE or Action.UPDATE (or check for not None/new record existence)
instead of Action.REDIRECT; alternatively, modify handle_redirect to explicitly
return Action.REDIRECT on successful redirect handling after calling
create_or_update. Reference symbols: _refresh_viaf_record (test),
handle_redirect (viaf.api), create_or_update, and Action enum.
---
Nitpick comments:
In `@rero_mef/agents/cli.py`:
- Around line 326-337: The import for _refresh_viaf_record is inside the loop
causing repeated import evaluation; move "from .viaf.tasks import
_refresh_viaf_record" to just before the "for pid in progress_bar" loop so the
module is imported once, then call _refresh_viaf_record(...) inside the loop
exactly as currently done; ensure the imported name matches the existing calls
and there are no local name shadowing issues.
- Around line 301-344: The missing-mode loop currently calls the private helper
_refresh_viaf_record(...) directly (bypassing timestamp tracking); change it to
call the public task wrapper used by other modes (e.g., process_viaf_refresh or
the synchronous equivalent of viaf_get_record) so set_timestamp is executed
consistently; replace the direct call in the else branch with the public
function (importing it if needed), pass the same parameters (pid, dbcommit=True,
reindex=True, verbose, progress, delete_if_not_found, update_agents), and ensure
the returned action value is handled the same way in action_counts.
In `@rero_mef/agents/viaf/api.py`:
- Around line 468-484: In the except block around the call to
self.create_or_update (handling redirect target creation), replace
current_app.logger.error(...) with current_app.logger.exception(...) so the
traceback is logged along with the message that includes redirect_to_pid; keep
the existing info-log and delete call (self.delete) and the return of (None,
Action.ERROR, redirect_info) intact—this change should be made in the try/except
that surrounds create_or_update in the VIAF agent API so errors from
create_or_update (and any underlying DB/schema issues) include full stack traces
for debugging.
- Around line 385-387: Replace the current generic error log call in the except
block that catches parse errors with a traceback-preserving call: change
current_app.logger.error(f"Error parsing VIAF response for {pid}: {e}") to
current_app.logger.exception(...) so the full traceback is recorded while
keeping the same contextual message (including pid and url) and returning the
same tuple; update the except block in the VIAF response parsing code where pid
and url are referenced to use current_app.logger.exception to preserve
diagnostics.
In `@rero_mef/agents/viaf/tasks.py`:
- Around line 100-151: The function process_viaf_refresh relies on Python
truthiness for batch_size so passing 0 (falsy) behaves like None; normalize or
make the check explicit by either converting batch_size==0 to None at the start
of process_viaf_refresh or change the guard to an explicit None check (e.g., use
"if batch_size is not None and count >= batch_size") so batch_size=0 is handled
deterministically; update the docstring to state that 0 is treated as "all" if
you choose normalization.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (5)
.github/AI_SETUP.mdis excluded by none and included by none.github/copilot-instructions.mdis excluded by none and included by none.github/workflows/continuous-integration-test.ymlis excluded by none and included by noneCLAUDE.mdis excluded by none and included by nonerero_mef/config.pyis excluded by!rero_mef/config.pyand included byrero_mef/**/*.py
📒 Files selected for processing (9)
rero_mef/agents/cli.pyrero_mef/agents/viaf/api.pyrero_mef/agents/viaf/tasks.pyrero_mef/api.pytests/ui/agents/test_agents_cli.pytests/ui/agents/test_agents_viaf_api.pytests/ui/agents/test_agents_viaf_tasks.pytests/ui/places/test_places_api.pytests/ui/places/test_places_gnd_tasks.py
c4ecc0f to
40d6ca1
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (4)
tests/ui/places/test_places_gnd_tasks.py (1)
26-26:⚠️ Potential issue | 🟡 MinorRename unused
appfixture args to_appin all tests.Lines 26, 50, 63, 77, 88, and 111 declare
appbut never use it, triggering ARG001. Keep fixture setup behavior but silence lint by renaming to_app.Suggested patch
`@mock.patch`("rero_mef.places.gnd.tasks.requests_retry_session") -def test_gnd_get_record_success(mock_session, app): +def test_gnd_get_record_success(mock_session, _app): @@ `@mock.patch`("rero_mef.places.gnd.tasks.requests_retry_session") -def test_gnd_get_record_http_error(mock_session, app): +def test_gnd_get_record_http_error(mock_session, _app): @@ `@mock.patch`("rero_mef.places.gnd.tasks.requests_retry_session") -def test_gnd_get_record_no_records(mock_session, app): +def test_gnd_get_record_no_records(mock_session, _app): @@ `@mock.patch`("rero_mef.places.gnd.tasks.requests_retry_session") -def test_gnd_get_record_exception(mock_session, app): +def test_gnd_get_record_exception(mock_session, _app): @@ `@mock.patch`("rero_mef.places.gnd.tasks.requests_retry_session") -def test_gnd_get_record_pid_mismatch(mock_session, app): +def test_gnd_get_record_pid_mismatch(mock_session, _app): @@ `@mock.patch`("rero_mef.places.gnd.tasks.requests_retry_session") -def test_gnd_get_record_debug_reraises(mock_session, app): +def test_gnd_get_record_debug_reraises(mock_session, _app):Also applies to: 50-50, 63-63, 77-77, 88-88, 111-111
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/ui/places/test_places_gnd_tasks.py` at line 26, Several test functions (e.g., test_gnd_get_record_success) declare the fixture parameter named app but never use it, triggering ARG001; rename that parameter to _app in each test that accepts the unused fixture to silence the lint warning while keeping fixture behavior (search for functions in this file with a parameter named app and change the parameter to _app, e.g., def test_gnd_get_record_success(mock_session, app) -> def test_gnd_get_record_success(mock_session, _app)).tests/ui/places/test_places_api.py (1)
258-258: Hard-coded"plgnd"key — existing open concern.The coupling between this assertion and the test
appfixture'sRERO_PLACESconfig was already flagged in a prior review. Flagging again to avoid losing track.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/ui/places/test_places_api.py` at line 258, Replace the hard-coded "plgnd" assertion in tests/ui/places/test_places_api.py with a dynamic check against the app/test fixture configuration (use the RERO_PLACES value or derive the expected key from the same fixture that builds endpoints) so the test asserts that the configured place key exists rather than the literal "plgnd"; update the assertion that currently reads assert "plgnd" in endpoints to reference the configuration (e.g., app.config['RERO_PLACES'] or the fixture-provided expected_key) so the test no longer depends on a magic string.tests/ui/agents/test_agents_cli.py (1)
88-88:⚠️ Potential issue | 🟡 MinorUnused side-effect fixture arguments still trigger ARG001.
agent_viaf_recordis used only for setup side-effects in both tests and never referenced directly. Rename to_agent_viaf_record(or use@pytest.mark.usefixtures) to clear lint warnings.🧹 Proposed fix
-def test_create_from_viaf(script_info, agent_viaf_record): +def test_create_from_viaf(script_info, _agent_viaf_record): ... -def test_create_from_viaf_with_enqueue(script_info, agent_viaf_record): +def test_create_from_viaf_with_enqueue(script_info, _agent_viaf_record):Also applies to: 108-108
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/ui/agents/test_agents_cli.py` at line 88, The test function test_create_from_viaf accepts the fixture agent_viaf_record only for side-effects, which triggers ARG001; either rename the parameter to _agent_viaf_record or remove the parameter and add `@pytest.mark.usefixtures`("agent_viaf_record") above test_create_from_viaf to silence the lint warning; apply the same change for the other test that also takes agent_viaf_record at the second occurrence.rero_mef/agents/cli.py (1)
275-283:⚠️ Potential issue | 🟠 Major
--batch_size 0is still inconsistent in rolling mode.
Line 276maps0toNone, butprocess_viaf_rolling_refreshtreatsNoneas “use configured default batch size”, not “all”. So rolling mode silently processes only a default subset while help text says0=all.🔧 Proposed fix
- # Normalize batch_size: 0 means "all records" - batch_arg = None if batch_size == 0 else batch_size + # Normalize batch_size: + # - rolling + 0 => process all VIAF records + # - other modes + 0 => no limit (None) + if rolling and batch_size == 0: + batch_arg = AgentViafRecord.count() + else: + batch_arg = None if batch_size == 0 else batch_sizeAlso applies to: 290-291
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rero_mef/agents/cli.py` around lines 275 - 283, The CLI currently maps batch_size==0 to None (batch_arg) which makes process_viaf_rolling_refresh interpret it as "use default" instead of "all"; change the call sites so rolling mode passes an explicit 0 (or a dedicated sentinel meaning "all") to process_viaf_rolling_refresh.delay and the non-enqueue direct call instead of None—e.g., compute a rolling_batch_arg = 0 if batch_size == 0 else batch_size and use that in the process_viaf_rolling_refresh.delay and the direct process_viaf_rolling_refresh(...) invocation so rolling mode truly means "0=all".
🧹 Nitpick comments (1)
rero_mef/agents/cli.py (1)
203-210: Guardbatch_sizeat the CLI boundary.
type=intallows negative values, which leads to confusing early-termination behavior in refresh loops. Prefer non-negative validation in Click option parsing.🛡️ Proposed fix
`@click.option`( "-b", "--batch_size", "batch_size", default=0, - type=int, + type=click.IntRange(min=0), help="Limit number of records to process (0=all).", )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rero_mef/agents/cli.py` around lines 203 - 210, The CLI option for batch_size currently uses type=int which permits negatives; update the Click option declaration (the click.option for "-b/--batch_size" in rero_mef.agents.cli) to enforce non-negative values by using click.IntRange(min=0) as the type or by adding a callback that raises click.BadParameter for batch_size < 0; ensure the option name "batch_size" and any code reading it (e.g., the refresh loop) keep the same identifier and behavior for zero meaning "all".
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@rero_mef/agents/viaf/tasks.py`:
- Around line 76-93: AgentViafRecord.create_or_update currently triggers
record.create_mef_and_agents internally, so remove the external duplicate block
that calls record.create_mef_and_agents when update_agents is True; instead
extend AgentViafRecord.create_or_update to accept update_agents and verbose
(alongside existing params like dbcommit, reindex, test_md5) and gate the
side-effect there (call record.create_mef_and_agents only when update_agents is
True and honor verbose), then update callers to pass update_agents/verbose into
AgentViafRecord.create_or_update and delete the downstream conditional that
re-invokes record.create_mef_and_agents.
In `@tests/ui/agents/test_agents_cli.py`:
- Around line 132-157: The test test_create_from_viaf_with_viaf_file fails
because the CLI is invoked without the test application fixture; update the test
function signature to include the app fixture (change def
test_create_from_viaf_with_viaf_file(script_info, tmpdir): to def
test_create_from_viaf_with_viaf_file(app, script_info, tmpdir):) so
create_from_viaf runs with the test app context; keep the rest of the test
(mocking task_create_mef_and_agents_from_viaf and using CliRunner) unchanged.
---
Duplicate comments:
In `@rero_mef/agents/cli.py`:
- Around line 275-283: The CLI currently maps batch_size==0 to None (batch_arg)
which makes process_viaf_rolling_refresh interpret it as "use default" instead
of "all"; change the call sites so rolling mode passes an explicit 0 (or a
dedicated sentinel meaning "all") to process_viaf_rolling_refresh.delay and the
non-enqueue direct call instead of None—e.g., compute a rolling_batch_arg = 0 if
batch_size == 0 else batch_size and use that in the
process_viaf_rolling_refresh.delay and the direct
process_viaf_rolling_refresh(...) invocation so rolling mode truly means
"0=all".
In `@tests/ui/agents/test_agents_cli.py`:
- Line 88: The test function test_create_from_viaf accepts the fixture
agent_viaf_record only for side-effects, which triggers ARG001; either rename
the parameter to _agent_viaf_record or remove the parameter and add
`@pytest.mark.usefixtures`("agent_viaf_record") above test_create_from_viaf to
silence the lint warning; apply the same change for the other test that also
takes agent_viaf_record at the second occurrence.
In `@tests/ui/places/test_places_api.py`:
- Line 258: Replace the hard-coded "plgnd" assertion in
tests/ui/places/test_places_api.py with a dynamic check against the app/test
fixture configuration (use the RERO_PLACES value or derive the expected key from
the same fixture that builds endpoints) so the test asserts that the configured
place key exists rather than the literal "plgnd"; update the assertion that
currently reads assert "plgnd" in endpoints to reference the configuration
(e.g., app.config['RERO_PLACES'] or the fixture-provided expected_key) so the
test no longer depends on a magic string.
In `@tests/ui/places/test_places_gnd_tasks.py`:
- Line 26: Several test functions (e.g., test_gnd_get_record_success) declare
the fixture parameter named app but never use it, triggering ARG001; rename that
parameter to _app in each test that accepts the unused fixture to silence the
lint warning while keeping fixture behavior (search for functions in this file
with a parameter named app and change the parameter to _app, e.g., def
test_gnd_get_record_success(mock_session, app) -> def
test_gnd_get_record_success(mock_session, _app)).
---
Nitpick comments:
In `@rero_mef/agents/cli.py`:
- Around line 203-210: The CLI option for batch_size currently uses type=int
which permits negatives; update the Click option declaration (the click.option
for "-b/--batch_size" in rero_mef.agents.cli) to enforce non-negative values by
using click.IntRange(min=0) as the type or by adding a callback that raises
click.BadParameter for batch_size < 0; ensure the option name "batch_size" and
any code reading it (e.g., the refresh loop) keep the same identifier and
behavior for zero meaning "all".
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (6)
.github/AI_SETUP.mdis excluded by none and included by none.github/copilot-instructions.mdis excluded by none and included by none.github/workflows/continuous-integration-test.ymlis excluded by none and included by noneCLAUDE.mdis excluded by none and included by nonerero_mef/config.pyis excluded by!rero_mef/config.pyand included byrero_mef/**/*.pyuv.lockis excluded by!**/*.lockand included by none
📒 Files selected for processing (9)
rero_mef/agents/cli.pyrero_mef/agents/viaf/api.pyrero_mef/agents/viaf/tasks.pyrero_mef/api.pytests/ui/agents/test_agents_cli.pytests/ui/agents/test_agents_viaf_api.pytests/ui/agents/test_agents_viaf_tasks.pytests/ui/places/test_places_api.pytests/ui/places/test_places_gnd_tasks.py
40d6ca1 to
3d57629
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (3)
rero_mef/agents/viaf/api.py (1)
471-477:⚠️ Potential issue | 🟡 MinorRename the unused
actionbinding.
handle_redirect()always returnsAction.REDIRECTfrom this branch, so the localactionfromcreate_or_update()is dead and Ruff will flag it. Rename it to_actionor_.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rero_mef/agents/viaf/api.py` around lines 471 - 477, The local variable "action" assigned from create_or_update(...) inside handle_redirect() is unused because this branch always returns Action.REDIRECT; rename the binding to a throwaway name (e.g., _action or _) to silence the linter. Update the tuple assignment in the handle_redirect call site where create_or_update(...) is invoked so it captures (new_record, _action) instead of (new_record, action) and leave the rest of the logic unchanged.tests/ui/agents/test_agents_viaf_api.py (1)
461-465:⚠️ Potential issue | 🟡 MinorMark
redirect_infoas intentionally unused.This test never asserts the third return value, so Ruff will still report RUF059 here. Rename it to
_redirect_infoor_.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/ui/agents/test_agents_viaf_api.py` around lines 461 - 465, The third return value from agent_viaf_record.handle_redirect is unused and triggers RUF059; rename redirect_info to a deliberately unused variable (e.g., _redirect_info or _) in the assignment new_record, action, _redirect_info = agent_viaf_record.handle_redirect(...) to silence the linter while keeping the call to agent_viaf_record.handle_redirect intact.rero_mef/agents/viaf/tasks.py (1)
56-93:⚠️ Potential issue | 🟠 Major
update_agentsdoes not actually gate downstream updates.
AgentViafRecord.create_or_update()already callsrecord.create_mef_and_agents()inrero_mef/agents/viaf/api.py, and the redirect branch reaches the same side effect throughexisting_record.handle_redirect(). Soupdate_agents=Falsestill rebuilds MEF/agent records, while the extra block here makes non-redirect CREATE/UPDATE paths do the work twice. That defeats the new flag and also wipes out most of the MD5-based savings on no-op refreshes.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rero_mef/agents/viaf/tasks.py` around lines 56 - 93, The downstream MEF/agent rebuild is being run twice because AgentViafRecord.create_or_update (in rero_mef/agents/viaf/api.py) already calls record.create_mef_and_agents and this tasks.py also calls record.create_mef_and_agents when update_agents is True; to fix, add an update_agents boolean param to AgentViafRecord.create_or_update and thread it through so create_or_update(data=..., dbcommit=..., reindex=..., test_md5=True, update_agents=update_agents) performs the MEF/agent work when requested, then remove the post-create_or_update block in rero_mef/agents/viaf/tasks.py that calls record.create_mef_and_agents (the redirect path can keep using existing_record.handle_redirect which already handles it). Ensure all call sites of create_or_update are updated to pass the new flag and adjust unit tests accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@rero_mef/agents/viaf/tasks.py`:
- Around line 125-144: The loop iterates over a PID sequence produced by
AgentViafRecord.get_all_pids() which uses limit/offset and can be invalidated
when _refresh_viaf_record() deletes/recreates PIDs; to fix, materialize a stable
snapshot of PIDs before the loop (e.g., replace the streaming/DB-paged iterator
with a concrete list from AgentViafRecord.get_all_pids()) or refactor this path
to use the PIT-style iteration used by process_viaf_rolling_refresh so offsets
can't shift under the loop; update the progressbar input accordingly to iterate
over the snapshot or the PIT iterator to ensure no records are skipped.
---
Duplicate comments:
In `@rero_mef/agents/viaf/api.py`:
- Around line 471-477: The local variable "action" assigned from
create_or_update(...) inside handle_redirect() is unused because this branch
always returns Action.REDIRECT; rename the binding to a throwaway name (e.g.,
_action or _) to silence the linter. Update the tuple assignment in the
handle_redirect call site where create_or_update(...) is invoked so it captures
(new_record, _action) instead of (new_record, action) and leave the rest of the
logic unchanged.
In `@rero_mef/agents/viaf/tasks.py`:
- Around line 56-93: The downstream MEF/agent rebuild is being run twice because
AgentViafRecord.create_or_update (in rero_mef/agents/viaf/api.py) already calls
record.create_mef_and_agents and this tasks.py also calls
record.create_mef_and_agents when update_agents is True; to fix, add an
update_agents boolean param to AgentViafRecord.create_or_update and thread it
through so create_or_update(data=..., dbcommit=..., reindex=..., test_md5=True,
update_agents=update_agents) performs the MEF/agent work when requested, then
remove the post-create_or_update block in rero_mef/agents/viaf/tasks.py that
calls record.create_mef_and_agents (the redirect path can keep using
existing_record.handle_redirect which already handles it). Ensure all call sites
of create_or_update are updated to pass the new flag and adjust unit tests
accordingly.
In `@tests/ui/agents/test_agents_viaf_api.py`:
- Around line 461-465: The third return value from
agent_viaf_record.handle_redirect is unused and triggers RUF059; rename
redirect_info to a deliberately unused variable (e.g., _redirect_info or _) in
the assignment new_record, action, _redirect_info =
agent_viaf_record.handle_redirect(...) to silence the linter while keeping the
call to agent_viaf_record.handle_redirect intact.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 137e0b9f-27ad-4cef-ae02-0b1d6c46e402
⛔ Files ignored due to path filters (6)
.github/AI_SETUP.mdis excluded by none and included by none.github/copilot-instructions.mdis excluded by none and included by none.github/workflows/continuous-integration-test.ymlis excluded by none and included by noneCLAUDE.mdis excluded by none and included by nonerero_mef/config.pyis excluded by!rero_mef/config.pyand included byrero_mef/**/*.pyuv.lockis excluded by!**/*.lockand included by none
📒 Files selected for processing (9)
rero_mef/agents/cli.pyrero_mef/agents/viaf/api.pyrero_mef/agents/viaf/tasks.pyrero_mef/api.pytests/ui/agents/test_agents_cli.pytests/ui/agents/test_agents_viaf_api.pytests/ui/agents/test_agents_viaf_tasks.pytests/ui/places/test_places_api.pytests/ui/places/test_places_gnd_tasks.py
✅ Files skipped from review due to trivial changes (3)
- tests/ui/agents/test_agents_cli.py
- tests/ui/places/test_places_api.py
- tests/ui/places/test_places_gnd_tasks.py
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/ui/agents/test_agents_viaf_tasks.py
3d57629 to
a241a58
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
rero_mef/agents/viaf/api.py (1)
321-407:⚠️ Potential issue | 🟠 MajorDistinguish transient VIAF API failures from genuine record misses.
get_online_record()returns{}for three different scenarios: non-200 HTTP responses, JSON parse failures, and records where the PID doesn't match the request. All callers check onlyif not online_data:, so_refresh_viaf_record()incorrectly returnsAction.NOT_FOUNDfor transient network/parse errors. This breaks retry semantics, makes action stats misleading, and prevents proper error handling in bulk refresh jobs.Return a typed outcome (or add a status field to
result) so transient failures can be distinguished from genuine misses. TheAction.ERRORenum value exists but currently cannot be used. Add test cases stubbing 5xx and malformed-JSON responses to lock down error path behavior.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rero_mef/agents/viaf/api.py` around lines 321 - 407, The get_online_record() function currently conflates transient HTTP/parse failures with genuine "not found" results; modify get_online_record() to return a typed outcome by adding a status field (e.g. result["status"] = "ok" | "not_found" | "error") or an explicit enum (use existing Action.ERROR) alongside the data so callers can distinguish transient failures; specifically, inside the HTTP error branch (non-200) and the exception parse block in get_online_record() set result["status"]="error" (and include an error message key), on successful parse set result["status"]="ok", and when a PID mismatch occurs set "not_found"; update callers such as _refresh_viaf_record() to check result["status"] (or the returned enum) and only map "not_found" to Action.NOT_FOUND while mapping "error" to Action.ERROR; add tests that stub 5xx responses and malformed JSON to assert the new error status is returned and that _refresh_viaf_record() yields Action.ERROR for transient failures.
♻️ Duplicate comments (1)
rero_mef/agents/viaf/tasks.py (1)
27-34:⚠️ Potential issue | 🟠 Major
update_agentsis currently ineffective.Both the normal path here and the redirect path ultimately land in
AgentViafRecord.create_or_update(), which still unconditionally runsrecord.create_mef_and_agents(...)inrero_mef/agents/viaf/api.py. So--update-agentsdoes not actually disable downstream MEF/agent work, and evenUPTODATErefreshes still pay for that reconciliation cost. Please gate that side effect increate_or_update()and thread the flag through here.As per coding guidelines, focus on performance problems and edge cases.
Also applies to: 76-81
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rero_mef/agents/viaf/tasks.py` around lines 27 - 34, The update_agents flag is never honored because both code paths call AgentViafRecord.create_or_update(), which still always invokes record.create_mef_and_agents(...); modify _refresh_viaf_record to pass the update_agents parameter into create_or_update (thread the flag through both the normal and redirect branches) and then change AgentViafRecord.create_or_update to accept and respect that flag by gating the call to record.create_mef_and_agents(...) (only call create_mef_and_agents when update_agents is True); ensure the flag is propagated through any intermediate helpers so UPTODATE refreshes skip the expensive MEF/agent reconciliation when update_agents is False.
🧹 Nitpick comments (2)
rero_mef/places/idref/api.py (1)
72-76: Use prefix-only slicing instead of global replacement for improved semantic correctness.On line 72,
replace("(DE-101)", "")removes every occurrence of the token, but only the leading prefix should be stripped. While current data doesn't contain duplicate tokens (verified in production datasets), the.replace()approach is semantically imprecise and fragile to future data changes. Using[len(prefix):]to slice off only the leading prefix is clearer in intent and more robust.Proposed diff
pids = list( + prefix = "(DE-101)" - dict.fromkeys( + dict.fromkeys( - identified_by.get("value", "").replace("(DE-101)", "") + identified_by.get("value", "")[len(prefix) :] for identified_by in self.get("identifiedBy", []) if identified_by.get("source") == "GND" and identified_by.get("type") == "bf:Nbn" - and identified_by.get("value", "").startswith("(DE-101)") + and identified_by.get("value", "").startswith(prefix) ) )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rero_mef/places/idref/api.py` around lines 72 - 76, The current list comprehension removes every occurrence of the token "(DE-101)" via identified_by.get("value", "").replace("(DE-101)", ""), which is imprecise; change it to slice off only the leading prefix when the value starts with that token (e.g., compute prefix="(DE-101)" and use identified_by.get("value", "")[len(prefix):] for the matching branch). Keep the existing filter (identified_by.get("source") == "GND" and identified_by.get("type") == "bf:Nbn" and identified_by.get("value", "").startswith("(DE-101)")) and apply the prefix-only slicing in place of .replace() to produce the trimmed value.tests/ui/agents/test_agents_viaf_api.py (1)
405-414: Assert that failed redirects keep the source record intact.These failure cases only assert
Action.ERROR, but the critical invariant here is thatold_pidmust still resolve whendelete_if_not_found=False. Adding that assertion would catch the data-loss regression this path is most prone to.💡 Example assertion
assert action == Action.ERROR assert redirect_info == {"from": old_pid, "to": new_pid} assert new_record is None + assert AgentViafRecord.get_record_by_pid(old_pid) is not NoneApply the same check to the chained-redirect and exception paths.
As per coding guidelines, focus on logic errors and architectural improvements.
Also applies to: 461-469, 508-514
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/ui/agents/test_agents_viaf_api.py` around lines 405 - 414, Add assertions that the source record remains resolvable after failed redirects: after calling agent_viaf_record.handle_redirect(redirect_to_pid=new_pid, dbcommit=True, reindex=True) and asserting action == Action.ERROR, also call the resolver (or use agent_viaf_record.resolve/get_by_pid helper used elsewhere) to ensure old_pid still returns the original record (i.e., not deleted) — mimicking delete_if_not_found=False semantics; apply the same extra assertion to the chained-redirect and exception-path tests referenced (the blocks around lines 461-469 and 508-514) to prevent regressions that delete the source on failure.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@rero_mef/monitoring/api.py`:
- Around line 152-156: get_es_count() can return a diagnostic string (e.g. "No
... in ES") but the diff currently replaces db-es with an empty string which
hides the failure; change the assignment that sets info[doc_type]["db-es"] so
that when count_es is not an int you set a machine-readable failure value (for
example None or a sentinel like -1) instead of "", keep info[doc_type]["es"] as
the original count_es string, and update Monitoring.check() to treat
non-int/None/-1 db-es values as a failure state so missing ES indices no longer
appear healthy; locate symbols get_es_count, info[doc_type]["es"],
info[doc_type]["db-es"], and Monitoring.check to apply these changes.
In `@rero_mef/monitoring/views.py`:
- Around line 59-60: The exception handlers currently return JSON with the error
string but still send HTTP 200; update the handlers (the except Exception as
error blocks in monitoring.views) to return a 5xx HTTP response (e.g., 500)
along with the error JSON so callers/monitoring see a failure; change both
occurrences (the handler around the DB query at the first except and the second
occurrence at lines referenced) to return the JSON and an explicit 500 status
(or use make_response to set status) instead of letting Flask default to 200.
---
Outside diff comments:
In `@rero_mef/agents/viaf/api.py`:
- Around line 321-407: The get_online_record() function currently conflates
transient HTTP/parse failures with genuine "not found" results; modify
get_online_record() to return a typed outcome by adding a status field (e.g.
result["status"] = "ok" | "not_found" | "error") or an explicit enum (use
existing Action.ERROR) alongside the data so callers can distinguish transient
failures; specifically, inside the HTTP error branch (non-200) and the exception
parse block in get_online_record() set result["status"]="error" (and include an
error message key), on successful parse set result["status"]="ok", and when a
PID mismatch occurs set "not_found"; update callers such as
_refresh_viaf_record() to check result["status"] (or the returned enum) and only
map "not_found" to Action.NOT_FOUND while mapping "error" to Action.ERROR; add
tests that stub 5xx responses and malformed JSON to assert the new error status
is returned and that _refresh_viaf_record() yields Action.ERROR for transient
failures.
---
Duplicate comments:
In `@rero_mef/agents/viaf/tasks.py`:
- Around line 27-34: The update_agents flag is never honored because both code
paths call AgentViafRecord.create_or_update(), which still always invokes
record.create_mef_and_agents(...); modify _refresh_viaf_record to pass the
update_agents parameter into create_or_update (thread the flag through both the
normal and redirect branches) and then change AgentViafRecord.create_or_update
to accept and respect that flag by gating the call to
record.create_mef_and_agents(...) (only call create_mef_and_agents when
update_agents is True); ensure the flag is propagated through any intermediate
helpers so UPTODATE refreshes skip the expensive MEF/agent reconciliation when
update_agents is False.
---
Nitpick comments:
In `@rero_mef/places/idref/api.py`:
- Around line 72-76: The current list comprehension removes every occurrence of
the token "(DE-101)" via identified_by.get("value", "").replace("(DE-101)", ""),
which is imprecise; change it to slice off only the leading prefix when the
value starts with that token (e.g., compute prefix="(DE-101)" and use
identified_by.get("value", "")[len(prefix):] for the matching branch). Keep the
existing filter (identified_by.get("source") == "GND" and
identified_by.get("type") == "bf:Nbn" and identified_by.get("value",
"").startswith("(DE-101)")) and apply the prefix-only slicing in place of
.replace() to produce the trimmed value.
In `@tests/ui/agents/test_agents_viaf_api.py`:
- Around line 405-414: Add assertions that the source record remains resolvable
after failed redirects: after calling
agent_viaf_record.handle_redirect(redirect_to_pid=new_pid, dbcommit=True,
reindex=True) and asserting action == Action.ERROR, also call the resolver (or
use agent_viaf_record.resolve/get_by_pid helper used elsewhere) to ensure
old_pid still returns the original record (i.e., not deleted) — mimicking
delete_if_not_found=False semantics; apply the same extra assertion to the
chained-redirect and exception-path tests referenced (the blocks around lines
461-469 and 508-514) to prevent regressions that delete the source on failure.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 01657951-c99e-41cd-aa87-d69b44c2764c
⛔ Files ignored due to path filters (7)
.github/AI_SETUP.mdis excluded by none and included by none.github/copilot-instructions.mdis excluded by none and included by none.github/workflows/continuous-integration-test.ymlis excluded by none and included by noneCLAUDE.mdis excluded by none and included by nonedocker-services.ymlis excluded by none and included by nonerero_mef/config.pyis excluded by!rero_mef/config.pyand included byrero_mef/**/*.pyuv.lockis excluded by!**/*.lockand included by none
📒 Files selected for processing (13)
rero_mef/agents/cli.pyrero_mef/agents/viaf/api.pyrero_mef/agents/viaf/tasks.pyrero_mef/api.pyrero_mef/monitoring/api.pyrero_mef/monitoring/cli.pyrero_mef/monitoring/views.pyrero_mef/places/idref/api.pytests/ui/agents/test_agents_cli.pytests/ui/agents/test_agents_viaf_api.pytests/ui/agents/test_agents_viaf_tasks.pytests/ui/places/test_places_api.pytests/ui/places/test_places_gnd_tasks.py
✅ Files skipped from review due to trivial changes (2)
- tests/ui/places/test_places_api.py
- tests/ui/places/test_places_gnd_tasks.py
🚧 Files skipped from review as they are similar to previous changes (2)
- tests/ui/agents/test_agents_cli.py
- tests/ui/agents/test_agents_viaf_tasks.py
a241a58 to
09a4cc1
Compare
There was a problem hiding this comment.
Actionable comments posted: 5
♻️ Duplicate comments (1)
rero_mef/agents/viaf/tasks.py (1)
32-33:⚠️ Potential issue | 🟠 Major
update_agentsis currently a no-op.This call goes through
AgentViafRecord.create_or_update(), whose override inrero_mef/agents/viaf/api.pyunconditionally callsrecord.create_mef_and_agents(...). So the refresh still mutates linked MEF/GND/IdRef records even whenupdate_agents=False, and everyUPTODATEVIAF record pays that extra work too. Please gate that side effect in the override, or bypass the override from this task when the flag is false.Also applies to: 76-81
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rero_mef/agents/viaf/tasks.py` around lines 32 - 33, The update_agents flag is effectively ignored because AgentViafRecord.create_or_update (in the override in api.py) unconditionally calls record.create_mef_and_agents, causing unnecessary mutations; change the override in AgentViafRecord.create_or_update to check the update_agents argument before invoking create_mef_and_agents (i.e., only call record.create_mef_and_agents when update_agents is True), or alternatively modify the task code that calls AgentViafRecord.create_or_update to bypass the overridden method and call a no-side-effect path when update_agents is False; reference the symbols AgentViafRecord.create_or_update and create_mef_and_agents when making the conditional gating so UPTODATE records skip the heavy side-effect when update_agents is False.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@rero_mef/agents/cli.py`:
- Around line 203-210: The --batch_size click option (parameter name
"batch_size") currently allows negative values; change its validator so negative
inputs are rejected — e.g., replace type=int with type=click.IntRange(min=0) or
add a callback that raises click.BadParameter for values < 0, and apply the same
fix to the other batch_size option instance elsewhere in the file so the CLI
enforces the documented "0=all" semantics.
In `@rero_mef/agents/viaf/api.py`:
- Around line 321-326: The unprotected call to requests_retry_session().get(url,
headers=headers) in the VIAF fetch flow should be wrapped in a try/except that
catches requests.RequestException (and related connection/read exceptions) so
network-level failures don't raise out and abort the PID refresh; inside the
except set an appropriate error state on the existing result dict (e.g.,
result['status']='error' or result['error']=str(exc')), update msg to include
the exception details and the pid, and return or continue with that result so
downstream parsing/handlers (within the function that uses pid, response,
result, msg) can record the failure instead of crashing. Ensure you reference
and preserve the existing variables pid, url, headers, result and msg when
implementing the try/except around the get call.
In `@rero_mef/agents/viaf/tasks.py`:
- Around line 117-120: Remove the unsupported preserve_order=True usage on
AgentViafSearch().params(...) when used with .point_in_time().iterate(): delete
the preserve_order param and add an explicit .sort({"_updated": {"order":
"asc"}}) to the query that builds the PIT iterator (the code invoking
AgentViafSearch().params(...).source("pid").point_in_time().iterate()). Do the
same for the other occurrence around the .sort({"_updated": {"order": "asc"}})
region (replace any preserve_order=True there with the explicit .sort call) so
both PIT-based queries have deterministic ordering.
In `@rero_mef/monitoring/api.py`:
- Around line 187-189: The code in check() currently writes db_es as None into
checks, conflating ES-unavailable states with numeric deltas; change the logic
in the function check (the block that currently tests db_es) to only set
checks[info]["db_es"] when db_es is an integer/number (exclude None and
non-numeric values) and instead set a new key checks[info]["es_error"] =
"unavailable" (or the actual error message) when db_es is None or indicates an
ES connectivity/missing-index issue; then update the monitoring view (the
mapping in rero_mef/monitoring/views.py that formats db_es) to look for es_error
and translate it to the dedicated error code/message rather than rendering
"There are None items...".
In `@rero_mef/places/idref/api.py`:
- Around line 70-78: The comprehension that builds pids uses replace("(DE-101)",
"") which removes all occurrences and can yield empty IDs; change it to remove
only the leading prefix and skip empty results: when iterating over
self.get("identifiedBy", []) (the code building pids), detect values that
startwith("(DE-101)"), strip the prefix using str.removeprefix("(DE-101)") or
value[len("(DE-101)"):], then only include the resulting pid if it is non-empty;
keep the dict.fromkeys(...) dedupe logic but filter out any empty strings before
constructing pids.
---
Duplicate comments:
In `@rero_mef/agents/viaf/tasks.py`:
- Around line 32-33: The update_agents flag is effectively ignored because
AgentViafRecord.create_or_update (in the override in api.py) unconditionally
calls record.create_mef_and_agents, causing unnecessary mutations; change the
override in AgentViafRecord.create_or_update to check the update_agents argument
before invoking create_mef_and_agents (i.e., only call
record.create_mef_and_agents when update_agents is True), or alternatively
modify the task code that calls AgentViafRecord.create_or_update to bypass the
overridden method and call a no-side-effect path when update_agents is False;
reference the symbols AgentViafRecord.create_or_update and create_mef_and_agents
when making the conditional gating so UPTODATE records skip the heavy
side-effect when update_agents is False.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 78aeda97-de80-4299-8467-2e111219e398
⛔ Files ignored due to path filters (7)
.github/AI_SETUP.mdis excluded by none and included by none.github/copilot-instructions.mdis excluded by none and included by none.github/workflows/continuous-integration-test.ymlis excluded by none and included by noneCLAUDE.mdis excluded by none and included by nonedocker-services.ymlis excluded by none and included by nonerero_mef/config.pyis excluded by!rero_mef/config.pyand included byrero_mef/**/*.pyuv.lockis excluded by!**/*.lockand included by none
📒 Files selected for processing (13)
rero_mef/agents/cli.pyrero_mef/agents/viaf/api.pyrero_mef/agents/viaf/tasks.pyrero_mef/api.pyrero_mef/monitoring/api.pyrero_mef/monitoring/cli.pyrero_mef/monitoring/views.pyrero_mef/places/idref/api.pytests/ui/agents/test_agents_cli.pytests/ui/agents/test_agents_viaf_api.pytests/ui/agents/test_agents_viaf_tasks.pytests/ui/places/test_places_api.pytests/ui/places/test_places_gnd_tasks.py
✅ Files skipped from review due to trivial changes (2)
- tests/ui/places/test_places_api.py
- tests/ui/places/test_places_gnd_tasks.py
🚧 Files skipped from review as they are similar to previous changes (3)
- rero_mef/monitoring/views.py
- rero_mef/monitoring/cli.py
- tests/ui/agents/test_agents_viaf_tasks.py
09a4cc1 to
788790b
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (2)
rero_mef/agents/viaf/tasks.py (2)
117-120:⚠️ Potential issue | 🟠 MajorRemove
preserve_order=Truefrom the PIT queries.The Elasticsearch DSL docs say
Search.params()forwards search-request parameters, whilepreserve_orderis defined on the scan helper anditerate()is the PIT-based alternative. Keepingpreserve_order=Trueon thesepoint_in_time().iterate()paths is therefore an API mismatch on both refresh queries. (elasticsearch-py.readthedocs.io)🔧 Suggested fix
- query = AgentViafSearch().params(preserve_order=True).source("pid") + query = AgentViafSearch().source("pid") @@ - AgentViafSearch() - .params(preserve_order=True) + AgentViafSearch() .sort({"_updated": {"order": "asc"}}) .source("pid")Also applies to: 176-184
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rero_mef/agents/viaf/tasks.py` around lines 117 - 120, The code is passing preserve_order=True via AgentViafSearch().params(...) while using point_in_time().iterate(), which is an API mismatch; remove the preserve_order=True parameter from the Search.params(...) calls that precede point_in_time().iterate() (e.g. the AgentViafSearch().params(preserve_order=True).source("pid") usage around the point_in_time().iterate() blocks and the similar occurrence at the later block around lines 176-184) so the PIT-based iterate() is used without the scan-only preserve_order parameter.
27-34:⚠️ Potential issue | 🟠 Major
update_agentsis currently a no-op in this task layer.
_refresh_viaf_record()accepts the flag and every public task forwards it, but the value is never read or passed into eitherhandle_redirect()orcreate_or_update(). From this module,update_agents=TrueandFalsebehave identically, so the task API/docstrings are misleading.Either thread the flag through the lower-level update paths or remove it from these task signatures until it is actually supported.
Also applies to: 60-65, 76-81
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rero_mef/agents/viaf/tasks.py` around lines 27 - 34, The update_agents flag accepted by _refresh_viaf_record (and the other task functions in this module that forward it) is a no-op: it is never read or passed to lower-level functions like handle_redirect() or create_or_update(), so update_agents True/False behave identically. Fix by either (A) threading the flag through: add an update_agents parameter to handle_redirect() and create_or_update(), pass the update_agents value from _refresh_viaf_record into those calls and update any internal logic to act on it; or (B) remove update_agents from the task function signatures and all callers in this module (and update any docstrings/tests) so the API is accurate. Ensure you update all occurrences of update_agents in this module (including the other task wrappers that currently accept and forward it) to keep the signatures and behavior consistent.
🧹 Nitpick comments (3)
rero_mef/agents/viaf/api.py (1)
478-478: Suppress unusedactionvariable fromcreate_or_update.The static analysis correctly flags that
actionis unpacked but never used. Rename it to_actionto indicate it's intentionally discarded.🔧 Suggested fix
- new_record, action = self.create_or_update( + new_record, _action = self.create_or_update( data=target_data, dbcommit=dbcommit, reindex=reindex, test_md5=True, )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rero_mef/agents/viaf/api.py` at line 478, The tuple unpacking from create_or_update currently assigns an unused variable named action; change the unpack target to _action to signal it's intentionally discarded (e.g., replace "new_record, action = self.create_or_update(...)" with "new_record, _action = self.create_or_update(...)") so static analysis no longer flags the unused variable while leaving create_or_update and new_record usage unchanged.tests/ui/agents/test_agents_viaf_api.py (1)
461-461: Suppress unusedredirect_infovariable.The unpacked
redirect_infois not used in this test. Rename it to_redirect_infoto indicate it's intentionally discarded.🔧 Suggested fix
- new_record, action, redirect_info = agent_viaf_record.handle_redirect( + new_record, action, _redirect_info = agent_viaf_record.handle_redirect( redirect_to_pid=new_pid, dbcommit=True, reindex=True, )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/ui/agents/test_agents_viaf_api.py` at line 461, The test unpacks a third value from agent_viaf_record.handle_redirect into redirect_info but never uses it; update the unpacking to assign that value to a throwaway name (e.g., _redirect_info) so linters understand it's intentionally discarded, by changing the tuple assignment where agent_viaf_record.handle_redirect is called.rero_mef/cli.py (1)
1280-1288: Consider adding a guard forget_entity_classreturningNone.While
pid_typevalues come fromget_multiple_missing_pidswhich should return valid types, adding a defensive check would prevent potentialAttributeErrorifget_entity_classreturnsNone.🛡️ Suggested defensive check
entity_cls = get_entity_class(pid_type) + if entity_cls is None: + if verbose: + click.secho(f"Unknown entity type: {pid_type}", fg="yellow") + continue for entity_pid in entity_pids: record = entity_cls.get_record_by_pid(entity_pid)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rero_mef/cli.py` around lines 1280 - 1288, Add a defensive guard after calling get_entity_class(pid_type) to handle the case it returns None: check if entity_cls is None, optionally log a clear warning (using click.secho when verbose) that the pid_type is unknown, and continue to the next pid_type/entity_pid instead of calling get_record_by_pid on None; this uses the symbols get_entity_class, entity_cls, pid_type, get_record_by_pid, and relates to pids sourced from get_multiple_missing_pids.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@rero_mef/agents/viaf/tasks.py`:
- Around line 115-126: The current logic treats batch_size=0 as "no limit"
because it uses truthiness checks; update the code that computes length and that
checks the loop limit to use explicit None checks instead (e.g., compute length
= min(batch_size, total) if batch_size is not None else total, and replace "if
batch_size and count >= batch_size" with "if batch_size is not None and count >=
batch_size") so that a caller passing 0 will result in zero items processed;
adjust references in the AgentViafRecord.count and AgentViafSearch() iteration
logic accordingly.
In `@rero_mef/cli.py`:
- Around line 1302-1306: The loop incorrectly invokes the instance method delete
on the class (mef_cls.delete(record, ...)); change the call to invoke the
instance method on the retrieved record object (record.delete(...)) and pass the
same keyword args (force=True, dbcommit=True, delindex=True); ensure you locate
the loop that iterates orphan_pids and uses mef_cls.get_record_by_pid(...) so
you update the delete call there and adjust any related counters
(deleted_orphan_count and orphan_details[group_name]["deleted"]) accordingly.
---
Duplicate comments:
In `@rero_mef/agents/viaf/tasks.py`:
- Around line 117-120: The code is passing preserve_order=True via
AgentViafSearch().params(...) while using point_in_time().iterate(), which is an
API mismatch; remove the preserve_order=True parameter from the
Search.params(...) calls that precede point_in_time().iterate() (e.g. the
AgentViafSearch().params(preserve_order=True).source("pid") usage around the
point_in_time().iterate() blocks and the similar occurrence at the later block
around lines 176-184) so the PIT-based iterate() is used without the scan-only
preserve_order parameter.
- Around line 27-34: The update_agents flag accepted by _refresh_viaf_record
(and the other task functions in this module that forward it) is a no-op: it is
never read or passed to lower-level functions like handle_redirect() or
create_or_update(), so update_agents True/False behave identically. Fix by
either (A) threading the flag through: add an update_agents parameter to
handle_redirect() and create_or_update(), pass the update_agents value from
_refresh_viaf_record into those calls and update any internal logic to act on
it; or (B) remove update_agents from the task function signatures and all
callers in this module (and update any docstrings/tests) so the API is accurate.
Ensure you update all occurrences of update_agents in this module (including the
other task wrappers that currently accept and forward it) to keep the signatures
and behavior consistent.
---
Nitpick comments:
In `@rero_mef/agents/viaf/api.py`:
- Line 478: The tuple unpacking from create_or_update currently assigns an
unused variable named action; change the unpack target to _action to signal it's
intentionally discarded (e.g., replace "new_record, action =
self.create_or_update(...)" with "new_record, _action =
self.create_or_update(...)") so static analysis no longer flags the unused
variable while leaving create_or_update and new_record usage unchanged.
In `@rero_mef/cli.py`:
- Around line 1280-1288: Add a defensive guard after calling
get_entity_class(pid_type) to handle the case it returns None: check if
entity_cls is None, optionally log a clear warning (using click.secho when
verbose) that the pid_type is unknown, and continue to the next
pid_type/entity_pid instead of calling get_record_by_pid on None; this uses the
symbols get_entity_class, entity_cls, pid_type, get_record_by_pid, and relates
to pids sourced from get_multiple_missing_pids.
In `@tests/ui/agents/test_agents_viaf_api.py`:
- Line 461: The test unpacks a third value from
agent_viaf_record.handle_redirect into redirect_info but never uses it; update
the unpacking to assign that value to a throwaway name (e.g., _redirect_info) so
linters understand it's intentionally discarded, by changing the tuple
assignment where agent_viaf_record.handle_redirect is called.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 34484d78-7cac-42f4-98a3-47836ea18a86
⛔ Files ignored due to path filters (12)
.github/AI_SETUP.mdis excluded by none and included by none.github/copilot-instructions.mdis excluded by none and included by none.github/workflows/continuous-integration-test.ymlis excluded by none and included by noneCLAUDE.mdis excluded by none and included by nonedata/cidref.jsonis excluded by none and included by nonedata/mef.jsonis excluded by none and included by nonedata/mef_id.csvis excluded by!**/*.csvand included by nonedata/mef_metadata.csvis excluded by!**/*.csvand included by nonedata/mef_pidstore.csvis excluded by!**/*.csvand included by nonedocker-services.ymlis excluded by none and included by nonerero_mef/config.pyis excluded by!rero_mef/config.pyand included byrero_mef/**/*.pyuv.lockis excluded by!**/*.lockand included by none
📒 Files selected for processing (16)
rero_mef/agents/api.pyrero_mef/agents/cli.pyrero_mef/agents/viaf/api.pyrero_mef/agents/viaf/tasks.pyrero_mef/api.pyrero_mef/cli.pyrero_mef/monitoring/api.pyrero_mef/monitoring/cli.pyrero_mef/monitoring/views.pyrero_mef/places/idref/api.pytests/ui/agents/test_agents_cli.pytests/ui/agents/test_agents_viaf_api.pytests/ui/agents/test_agents_viaf_tasks.pytests/ui/places/test_places_api.pytests/ui/places/test_places_gnd_tasks.pytests/ui/test_cli.py
✅ Files skipped from review due to trivial changes (1)
- tests/ui/agents/test_agents_cli.py
🚧 Files skipped from review as they are similar to previous changes (4)
- rero_mef/places/idref/api.py
- tests/ui/places/test_places_api.py
- rero_mef/monitoring/cli.py
- rero_mef/monitoring/api.py
1d48299 to
b5937cd
Compare
There was a problem hiding this comment.
Actionable comments posted: 6
♻️ Duplicate comments (2)
rero_mef/agents/viaf/tasks.py (2)
179-190:⚠️ Potential issue | 🟠 MajorRolling refresh is not guaranteed to be oldest-first with
scan().This query adds a sort on
_updated, butSearch.scan()does not preserve that ordering by default. That means the rolling job walks scan order rather than oldest-first order, so the intended ~6-month refresh cycle is not reliable. Prefer an ordered PIT/search-after iteration here.In elasticsearch-dsl, does Search.scan() honor an explicit .sort() without preserve_order=True, and what is the recommended PIT/search_after pattern for guaranteed oldest-first iteration?🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rero_mef/agents/viaf/tasks.py` around lines 179 - 190, The current loop uses AgentViafSearch().sort({"_updated": {"order": "asc"}}).source("pid") and then iterates query.scan(), but Search.scan() does not guarantee the sort order unless preserve_order is used or you switch to a PIT + search_after pattern; as a result the rolling refresh won't be strictly oldest-first. Fix by replacing the scan-based iteration with a point-in-time (PIT) + paginated search_after loop (open PIT, run sorted searches with size=batch_size and search_after tokens, close PIT) or, if you must keep scan(), call scan(preserve_order=True) to enforce the sort; update the loop that consumes progressbar(items=...) to use the new iterator and ensure AgentViafSearch and progressbar still receive .source("pid") and the sort on "_updated".
27-33:⚠️ Potential issue | 🟠 Major
update_agentsis still a no-op.This task accepts the flag, but neither the redirect path nor the normal create/update path consults it before touching MEF/agent data. Both routes still end up in
AgentViafRecord.create_or_update(), which unconditionally callscreate_mef_and_agents(), so refreshes do the extra reconciliation even whenupdate_agents=False.Also applies to: 57-81
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rero_mef/agents/viaf/tasks.py` around lines 27 - 33, The task _refresh_viaf_record currently ignores the update_agents flag because it always calls AgentViafRecord.create_or_update(), which unconditionally invokes create_mef_and_agents(); update the flow so the flag is respected by either (a) adding an explicit parameter to AgentViafRecord.create_or_update(update_agents=False) and having create_or_update skip create_mef_and_agents() when False, or (b) keeping create_or_update unchanged but branching in _refresh_viaf_record so that when update_agents is False you do not call the path that triggers create_mef_and_agents() (e.g., bypass the redirect/create/update route that reconciles MEF/agents); ensure both the redirect-handling branch and the normal create/update branch consult update_agents before touching MEF/agent data.
🧹 Nitpick comments (1)
tests/ui/agents/test_agents_viaf_api.py (1)
390-399: Redundant reassignment ofold_pid.Line 399 reassigns
old_pid = agent_viaf_record.pid, butold_pidwas already set to"12345678"on line 390 and the record was created with that value. This line is unnecessary.agent_viaf_record = AgentViafRecord.create( data=viaf_data, dbcommit=True, reindex=True ) - old_pid = agent_viaf_record.pid new_pid = "999999999"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/ui/agents/test_agents_viaf_api.py` around lines 390 - 399, The variable old_pid is redundantly reassigned after creating the record; remove the unnecessary reassignment (old_pid = agent_viaf_record.pid) so the test uses the originally set old_pid value and avoid duplicate assignment—update the test around AgentViafRecord.create and agent_viaf_record.pid to simply rely on the initial old_pid assignment.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@rero_mef/agents/api.py`:
- Around line 275-280: The current AgentMefSearch block uses .filter("terms",
sources=entity_names) which excludes stored MEF docs because MEF documents don't
have a sources array; update get_unlinked_agents to stop filtering on the
non-existent sources field and instead detect documents by their source-specific
keys (e.g. check existence of idref, gnd, rero) or build a boolean should that
matches the entity_names to the appropriate source field(s); specifically,
change the AgentMefSearch query to remove the terms on sources and add a boolean
should/exists clause for idref, gnd, rero (or otherwise map entity_names to the
correct document key) so AgentMefSearch returns the intended unlinked MEF
records.
In `@rero_mef/agents/cli.py`:
- Around line 96-99: The CLI currently ignores the --enqueue flag when
--unlinked is set in the create_from_viaf command: locate the validation and
command handler around the unlinked/missing/viaf_file logic (variables unlinked,
enqueue and the create_from_viaf command function) and either 1) explicitly
reject the combination by extending the UsageError to include enqueue
("--unlinked cannot be combined with --missing, --viaf-file, or --enqueue"), or
2) implement enqueue semantics for unlinked mode by branching on enqueue inside
the create_from_viaf handler so that when unlinked and enqueue are True you
enqueue jobs instead of running the synchronous lookup loop; update only that
validation/branching logic so behavior is consistent and documented in the error
message or code path.
In `@rero_mef/agents/viaf/api.py`:
- Around line 324-333: The try/except in the VIAF GET handler currently returns
a truthy payload {"status": "error", ...} which causes callers like
_refresh_viaf_record to treat request failures as valid records and proceed into
add_md5() and create_or_update(); change the except block to return an empty
payload (e.g. {}) together with the same msg so callers that check for falsy
records will correctly skip processing (keep the msg formatting with pid and url
and still return result as {} instead of the {"status":"error"} dict); reference
the requests_retry_session GET call, the except handling, and callers
_refresh_viaf_record, add_md5, create_or_update when making the change.
In `@rero_mef/cli.py`:
- Around line 41-46: The import of rero_mef.cli_logging
(ensure_single_stream_handler) is causing ModuleNotFoundError; either add the
missing module exporting ensure_single_stream_handler or make the import
optional: in rero_mef.cli replace the direct "from .cli_logging import
ensure_single_stream_handler" with a guarded import that catches ImportError and
provides a lightweight fallback/no-op ensure_single_stream_handler, or remove
the import and its usages where present (e.g., in any CLI setup functions) so
importing rero_mef.cli or rero_mef.agents.cli no longer fails at import time.
- Around line 1267-1272: The loop that populates details iterates all pid types
returned by get_multiple_missing_pids() even when their entity_map is empty,
causing details.setdefault(...) to create zero-count entries; update the loop in
rero_mef/cli.py (the for pid_type, entity_map in multiple_pids.items() block) to
skip empty buckets by adding a guard like "if not entity_map: continue" before
computing entity_pids/count and updating duplicates_count and info, so details
only gets populated for pid types that actually have duplicates.
In `@tests/ui/agents/test_agents_cli.py`:
- Around line 165-168: The test currently patches
rero_mef.agents.viaf.tasks._refresh_viaf_record, but harvest_viaf has already
imported that function into rero_mef.agents.cli, so the test must patch the
CLI's bound reference instead; update the mock.patch target to
"rero_mef.agents.cli._refresh_viaf_record" (or patch the harvest_viaf symbol
directly) so the call made by harvest_viaf uses the mock rather than the real
_refresh_viaf_record.
---
Duplicate comments:
In `@rero_mef/agents/viaf/tasks.py`:
- Around line 179-190: The current loop uses AgentViafSearch().sort({"_updated":
{"order": "asc"}}).source("pid") and then iterates query.scan(), but
Search.scan() does not guarantee the sort order unless preserve_order is used or
you switch to a PIT + search_after pattern; as a result the rolling refresh
won't be strictly oldest-first. Fix by replacing the scan-based iteration with a
point-in-time (PIT) + paginated search_after loop (open PIT, run sorted searches
with size=batch_size and search_after tokens, close PIT) or, if you must keep
scan(), call scan(preserve_order=True) to enforce the sort; update the loop that
consumes progressbar(items=...) to use the new iterator and ensure
AgentViafSearch and progressbar still receive .source("pid") and the sort on
"_updated".
- Around line 27-33: The task _refresh_viaf_record currently ignores the
update_agents flag because it always calls AgentViafRecord.create_or_update(),
which unconditionally invokes create_mef_and_agents(); update the flow so the
flag is respected by either (a) adding an explicit parameter to
AgentViafRecord.create_or_update(update_agents=False) and having
create_or_update skip create_mef_and_agents() when False, or (b) keeping
create_or_update unchanged but branching in _refresh_viaf_record so that when
update_agents is False you do not call the path that triggers
create_mef_and_agents() (e.g., bypass the redirect/create/update route that
reconciles MEF/agents); ensure both the redirect-handling branch and the normal
create/update branch consult update_agents before touching MEF/agent data.
---
Nitpick comments:
In `@tests/ui/agents/test_agents_viaf_api.py`:
- Around line 390-399: The variable old_pid is redundantly reassigned after
creating the record; remove the unnecessary reassignment (old_pid =
agent_viaf_record.pid) so the test uses the originally set old_pid value and
avoid duplicate assignment—update the test around AgentViafRecord.create and
agent_viaf_record.pid to simply rely on the initial old_pid assignment.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: e21e1aa3-4dc7-4fda-9a4d-cc433008d250
⛔ Files ignored due to path filters (12)
.github/AI_SETUP.mdis excluded by none and included by none.github/copilot-instructions.mdis excluded by none and included by none.github/workflows/continuous-integration-test.ymlis excluded by none and included by noneCLAUDE.mdis excluded by none and included by nonedata/cidref.jsonis excluded by none and included by nonedata/mef.jsonis excluded by none and included by nonedata/mef_id.csvis excluded by!**/*.csvand included by nonedata/mef_metadata.csvis excluded by!**/*.csvand included by nonedata/mef_pidstore.csvis excluded by!**/*.csvand included by nonedocker-services.ymlis excluded by none and included by nonerero_mef/config.pyis excluded by!rero_mef/config.pyand included byrero_mef/**/*.pyuv.lockis excluded by!**/*.lockand included by none
📒 Files selected for processing (18)
rero_mef/agents/api.pyrero_mef/agents/cli.pyrero_mef/agents/mef/api.pyrero_mef/agents/viaf/api.pyrero_mef/agents/viaf/tasks.pyrero_mef/api.pyrero_mef/api_mef.pyrero_mef/cli.pyrero_mef/monitoring/api.pyrero_mef/monitoring/cli.pyrero_mef/monitoring/views.pyrero_mef/places/idref/api.pytests/ui/agents/test_agents_cli.pytests/ui/agents/test_agents_viaf_api.pytests/ui/agents/test_agents_viaf_tasks.pytests/ui/places/test_places_api.pytests/ui/places/test_places_gnd_tasks.pytests/ui/test_cli.py
💤 Files with no reviewable changes (1)
- rero_mef/api_mef.py
✅ Files skipped from review due to trivial changes (3)
- tests/ui/places/test_places_api.py
- rero_mef/api.py
- tests/ui/test_cli.py
🚧 Files skipped from review as they are similar to previous changes (4)
- rero_mef/monitoring/api.py
- tests/ui/places/test_places_gnd_tasks.py
- rero_mef/monitoring/cli.py
- rero_mef/places/idref/api.py
2f5aa41 to
f2d45c2
Compare
2cf0b58 to
37a79e3
Compare
📝 WalkthroughWalkthroughAdds VIAF harvest/refresh workflows (Celery tasks + CLI), robust VIAF HTTP handling (timeouts, retries, 429/backoff, redirects and redirect reconciliation), CLI logging helpers, MARC 686 → DDC extraction, identifier format tweak, numerous tests, and repository AI-assistant guideline docs. Changes
Sequence Diagram(s)sequenceDiagram
actor CLI as CLI
participant Tasks as VIAF Tasks
participant VIAF as VIAF Online
participant AgentsAPI as Agents API
participant DB as Database
participant ES as Elasticsearch
CLI->>Tasks: process_viaf_refresh(batch_size, enqueue?)
loop per PID
Tasks->>VIAF: GET /viaf (pid or source|id) with headers/timeouts
alt 200 OK
VIAF-->>Tasks: viaf_data
Tasks->>AgentsAPI: create_or_update_viaf(viaf_data)
AgentsAPI->>DB: persist/update
DB->>ES: reindex
else 429 Rate Limit
VIAF-->>Tasks: 429 + Retry-After?
Tasks->>Tasks: _sleep_with_countdown()
Tasks->>VIAF: retry GET
else Redirect (cluster merge)
VIAF-->>Tasks: REDIRECT -> new_pid
Tasks->>AgentsAPI: handle_redirect(new_pid)
AgentsAPI->>VIAF: GET new_pid
AgentsAPI->>DB: create/update new record
AgentsAPI->>DB: delete old record
DB->>ES: reindex
end
end
Tasks-->>CLI: (count, action_counts)
sequenceDiagram
actor User as User CLI
participant AgentsCLI as Agents CLI
participant AgentsAPI as Agents API
participant VIAF as VIAF Online
participant MEF as MEF DB
User->>AgentsCLI: create_from_viaf --unlinked
AgentsCLI->>AgentsAPI: get_unlinked_agents(relink=False)
AgentsAPI->>MEF: scan MEF for records without viaf_pid
MEF-->>AgentsAPI: (mef_pid, source, entity_pid)
loop per unlinked
AgentsAPI->>VIAF: get_online_record(source, entity_pid)
alt Found
VIAF-->>AgentsAPI: viaf_data with viaf_pid
AgentsAPI->>MEF: create_or_update_viaf_link(mef_pid, viaf_pid)
else Not Found / Error
VIAF-->>AgentsAPI: not found / error
end
end
AgentsCLI-->>User: summary (processed, cleaned)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~65 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
rero_mef/places/utils.py (1)
30-34:⚠️ Potential issue | 🟠 MajorThis formatter change is likely not used by the production place-indexing path.
rero_mef/places/listener.pycallsmake_identifierfromrero_mef/utils.py, so updatingrero_mef/places/utils.pyhere does not change the runtime_identifierenrichment flow. This leaves duplicated identifier logic and makes future divergence likely. Please consolidate to a single source of truth (preferrero_mef/utils.py) and update call sites/imports accordingly.Proposed consolidation (example)
-from flask import current_app +from flask import current_app +from rero_mef.utils import make_identifier as shared_make_identifier @@ def make_identifier(identified_by): - """Make identifier `type|(source)value`.""" - if source := identified_by.get("source"): - return f"{identified_by['type']}|({source}){identified_by['value']}" - return f"{identified_by['type']}|{identified_by['value']}" + """Proxy to shared identifier formatter to avoid drift.""" + return shared_make_identifier(identified_by)As per coding guidelines, for
**/*.pywe should focus on “correctness issues” and “logic errors and architectural improvements.”🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rero_mef/places/utils.py` around lines 30 - 34, The make_identifier implementation in rero_mef/places/utils.py is duplicating logic already used by the runtime path (listener) which imports make_identifier from rero_mef/utils.py; remove the duplicate function from rero_mef/places/utils.py and consolidate to a single source of truth by using the implementation in rero_mef/utils.py (prefer rero_mef/utils.py as canonical), update any call sites and imports (e.g., rero_mef/places/listener.py and any other modules) to import make_identifier from rero_mef.utils, and run/update related tests to ensure there are no import or behavior regressions.
🧹 Nitpick comments (5)
rero_mef/agents/viaf/tasks.py (1)
208-237:progressparameter is accepted but not forwarded.The
viaf_get_recordtask accepts aprogressparameter (line 214) but doesn't pass it to_refresh_viaf_record. Since_refresh_viaf_recorddoesn't accept aprogressparameter, this is unused. Consider removing it from the signature for clarity.♻️ Remove unused `progress` parameter
`@shared_task` def viaf_get_record( viaf_pid, dbcommit=True, reindex=True, verbose=False, - progress=False, delete_if_not_found=False, update_agents=False, ): """Fetch and update a single VIAF record from the online API. :param viaf_pid: VIAF PID to fetch. :param dbcommit: Commit changes to DB. :param reindex: Reindex record. :param verbose: Print verbose messages. - :param progress: Display progress information. :param delete_if_not_found: Delete old record if redirect target not found. :param update_agents: Update MEF and agent records (GND/IdRef) after VIAF update. :returns: Action performed. """🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rero_mef/agents/viaf/tasks.py` around lines 208 - 237, The viaf_get_record task declares a unused progress parameter that is not forwarded to _refresh_viaf_record; remove the progress parameter from the viaf_get_record signature, delete its mention from the docstring and default args, and update any callers that pass progress to stop providing it (or adjust callers accordingly); locate the viaf_get_record function and its docstring to make these edits and ensure behavior remains unchanged.rero_mef/agents/cli.py (1)
159-160: Consider catching more specific exceptions.The broad
except Exceptioncan mask unexpected errors. Consider catchingrequests.RequestExceptionor the specific exceptions thatget_online_recordmight raise.♻️ Suggested narrower exception handling
- except Exception as err: - return mef_pid, entity_pid, {}, f"Error: {err}" + except (requests.RequestException, TimeoutError) as err: + return mef_pid, entity_pid, {}, f"Error: {err}"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rero_mef/agents/cli.py` around lines 159 - 160, The broad except in the block returning (mef_pid, entity_pid, {}, f"Error: {err}") should be narrowed — catch requests.RequestException (for network/HTTP errors from get_online_record) and any specific parsing errors that get_online_record may raise (e.g., ValueError, KeyError) instead of Exception; update the except clause(s) around get_online_record in the function where mef_pid and entity_pid are handled to handle these specific exceptions and preserve the existing return shape and error message.rero_mef/cli.py (1)
1102-1136: Acceptable defensive exception handling.The broad
except Exceptioncatches inrabbitmq_queue_countare appropriate here since:
- The function is used for monitoring/waiting, not critical operations
- Returning
0on failure is a safe fallback that allows processing to continue- Various broker-related exceptions could occur (connection issues, auth failures, etc.)
However, consider logging these exceptions at debug level to aid troubleshooting.
♻️ Optional: add debug logging for exceptions
+import logging + +logger = logging.getLogger(__name__) + def rabbitmq_queue_count(): ... except Exception: + logger.debug("Failed to declare queue %s", queue_name, exc_info=True) continue finally: channel.close() return count except Exception: + logger.debug("Failed to get RabbitMQ queue count", exc_info=True) return 0🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rero_mef/cli.py` around lines 1102 - 1136, Add debug-level logging for exceptions in rabbitmq_queue_count: create or use a module logger (logging.getLogger(__name__)) and log the inner exceptions raised by channel.queue_declare (the inner except Exception) with context (queue_name) and exc_info=True, and also log the outer broad except Exception that returns 0 with exc_info=True so broker/connection failures (current_celery.connection_or_acquire / connection.channel) are recorded for troubleshooting; keep behavior unchanged otherwise.rero_mef/agents/viaf/api.py (1)
579-585: Suppress unusedactionvariable.Ruff flags the unpacked
actionvariable as unused. Rename to_actionto indicate it's intentionally discarded.♻️ Prefix with underscore
try: - new_record, action = self.create_or_update( + new_record, _action = self.create_or_update( data=target_data, dbcommit=dbcommit, reindex=reindex, test_md5=True, )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rero_mef/agents/viaf/api.py` around lines 579 - 585, The unpacked variable action from the call to self.create_or_update in the viaf API is unused and flagged by Ruff; update the unpacking to use a throwaway name (e.g., change "new_record, action = self.create_or_update(...)" to "new_record, _action = self.create_or_update(...)" or similar) in the same scope so the intent to discard the second return value is clear and linter warnings are suppressed while leaving create_or_update unchanged.rero_mef/agents/mef/api.py (1)
25-25: Import cleanup correctly removes unused dependencies. Theclickandprogressbarimports were only used by the removedget_all_missing_viaf_pidsmethod, and their removal is appropriate.The verification found that
get_all_missing_viaf_pidsnow exists only as a module-level function inrero_mef/agents/api.pywith no call sites in the codebase. Consider removing this dead code if it is no longer needed.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rero_mef/agents/mef/api.py` at line 25, The module-level function get_all_missing_viaf_pids in rero_mef/agents/api.py appears to be dead code (no call sites) after removing click/progressbar usage; locate the get_all_missing_viaf_pids definition in that file and remove it (or move it to a utilities/archive location if you want to keep it for reference), and run a project-wide search for get_all_missing_viaf_pids to confirm there are no remaining callers before committing the removal.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@rero_mef/marctojson/do_gnd_concepts.py`:
- Around line 216-218: The logger.info call uses multiple positional arguments
without a format placeholder which causes a TypeError when handlers perform
string interpolation; for each occurrence (e.g., the call guarded by self.logger
and self.verbose in do_gnd_concepts.py around the trans_gnd_classification
invocation), change the call to either include a format placeholder (e.g., use
"%s" and pass the function name as the second argument) or build a single
formatted message string before calling self.logger.info, and apply the same fix
to the other instances across marctojson (do_idref_concepts.py,
do_rero_concepts.py, do_gnd_places.py, do_idref_places.py, do_gnd_agent.py,
do_idref_agent.py, do_rero_agent.py, do_skeleton.py) where self.logger.info is
called with multiple args but no formatting placeholder.
In `@rero_mef/utils.py`:
- Around line 963-964: The current check in utils.py that raises KeyError("pid")
immediately when record.get("pid") is falsy should be changed to first attempt
normalization from the legacy viaf_pid field: if record lacks "pid" but has
"viaf_pid", set/normalize record["pid"] from record["viaf_pid"] (using the same
normalization logic already present elsewhere in the function) and only raise
KeyError("pid") if both record.get("pid") and record.get("viaf_pid") are absent;
update the branch around the existing KeyError to perform this fallback
normalization before raising.
---
Outside diff comments:
In `@rero_mef/places/utils.py`:
- Around line 30-34: The make_identifier implementation in
rero_mef/places/utils.py is duplicating logic already used by the runtime path
(listener) which imports make_identifier from rero_mef/utils.py; remove the
duplicate function from rero_mef/places/utils.py and consolidate to a single
source of truth by using the implementation in rero_mef/utils.py (prefer
rero_mef/utils.py as canonical), update any call sites and imports (e.g.,
rero_mef/places/listener.py and any other modules) to import make_identifier
from rero_mef.utils, and run/update related tests to ensure there are no import
or behavior regressions.
---
Nitpick comments:
In `@rero_mef/agents/cli.py`:
- Around line 159-160: The broad except in the block returning (mef_pid,
entity_pid, {}, f"Error: {err}") should be narrowed — catch
requests.RequestException (for network/HTTP errors from get_online_record) and
any specific parsing errors that get_online_record may raise (e.g., ValueError,
KeyError) instead of Exception; update the except clause(s) around
get_online_record in the function where mef_pid and entity_pid are handled to
handle these specific exceptions and preserve the existing return shape and
error message.
In `@rero_mef/agents/mef/api.py`:
- Line 25: The module-level function get_all_missing_viaf_pids in
rero_mef/agents/api.py appears to be dead code (no call sites) after removing
click/progressbar usage; locate the get_all_missing_viaf_pids definition in that
file and remove it (or move it to a utilities/archive location if you want to
keep it for reference), and run a project-wide search for
get_all_missing_viaf_pids to confirm there are no remaining callers before
committing the removal.
In `@rero_mef/agents/viaf/api.py`:
- Around line 579-585: The unpacked variable action from the call to
self.create_or_update in the viaf API is unused and flagged by Ruff; update the
unpacking to use a throwaway name (e.g., change "new_record, action =
self.create_or_update(...)" to "new_record, _action =
self.create_or_update(...)" or similar) in the same scope so the intent to
discard the second return value is clear and linter warnings are suppressed
while leaving create_or_update unchanged.
In `@rero_mef/agents/viaf/tasks.py`:
- Around line 208-237: The viaf_get_record task declares a unused progress
parameter that is not forwarded to _refresh_viaf_record; remove the progress
parameter from the viaf_get_record signature, delete its mention from the
docstring and default args, and update any callers that pass progress to stop
providing it (or adjust callers accordingly); locate the viaf_get_record
function and its docstring to make these edits and ensure behavior remains
unchanged.
In `@rero_mef/cli.py`:
- Around line 1102-1136: Add debug-level logging for exceptions in
rabbitmq_queue_count: create or use a module logger
(logging.getLogger(__name__)) and log the inner exceptions raised by
channel.queue_declare (the inner except Exception) with context (queue_name) and
exc_info=True, and also log the outer broad except Exception that returns 0 with
exc_info=True so broker/connection failures
(current_celery.connection_or_acquire / connection.channel) are recorded for
troubleshooting; keep behavior unchanged otherwise.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 436e8084-d13d-406a-92b9-a9699193aae5
⛔ Files ignored due to path filters (29)
.github/AI_SETUP.mdis excluded by none and included by none.github/copilot-instructions.mdis excluded by none and included by none.github/workflows/continuous-integration-test.ymlis excluded by none and included by noneCLAUDE.mdis excluded by none and included by nonedata/aggnd.jsonis excluded by none and included by nonedata/aggnd_metadata.csvis excluded by!**/*.csvand included by nonedata/aggnd_pidstore.csvis excluded by!**/*.csvand included by nonedata/agrero.jsonis excluded by none and included by nonedata/agrero_metadata.csvis excluded by!**/*.csvand included by nonedata/agrero_pidstore.csvis excluded by!**/*.csvand included by nonedata/aidref.jsonis excluded by none and included by nonedata/aidref_metadata.csvis excluded by!**/*.csvand included by nonedata/aidref_pidstore.csvis excluded by!**/*.csvand included by nonedata/cidref.jsonis excluded by none and included by nonedata/mef.jsonis excluded by none and included by nonedata/mef_id.csvis excluded by!**/*.csvand included by nonedata/mef_metadata.csvis excluded by!**/*.csvand included by nonedata/mef_pidstore.csvis excluded by!**/*.csvand included by nonedata/viaf.jsonis excluded by none and included by nonedata/viaf_metadata.csvis excluded by!**/*.csvand included by nonedata/viaf_pidstore.csvis excluded by!**/*.csvand included by nonedocker-services.ymlis excluded by none and included by nonedocker/uwsgi/uwsgi_rest.iniis excluded by none and included by nonedocker/uwsgi/uwsgi_ui.iniis excluded by none and included by nonerero_mef/config.pyis excluded by!rero_mef/config.pyand included byrero_mef/**/*.pyscripts/testis excluded by none and included by nonetests/unit/concepts/examples/xml_minimal_record.xmlis excluded by none and included by nonetests/unit/places/examples/xml_minimal_record.xmlis excluded by none and included by noneuv.lockis excluded by!**/*.lockand included by none
📒 Files selected for processing (27)
rero_mef/agents/api.pyrero_mef/agents/cli.pyrero_mef/agents/mef/api.pyrero_mef/agents/viaf/api.pyrero_mef/agents/viaf/tasks.pyrero_mef/api.pyrero_mef/api_mef.pyrero_mef/cli.pyrero_mef/cli_logging.pyrero_mef/marctojson/do_gnd_concepts.pyrero_mef/marctojson/do_gnd_places.pyrero_mef/monitoring/api.pyrero_mef/monitoring/cli.pyrero_mef/monitoring/views.pyrero_mef/places/idref/api.pyrero_mef/places/utils.pyrero_mef/utils.pytests/ui/agents/test_agents_cli.pytests/ui/agents/test_agents_viaf_api.pytests/ui/agents/test_agents_viaf_tasks.pytests/ui/places/test_places_api.pytests/ui/places/test_places_gnd_tasks.pytests/ui/test_api.pytests/ui/test_cli.pytests/ui/test_views.pytests/unit/concepts/test_concepts_gnd_transformation.pytests/unit/places/test_places_gnd_transformation.py
💤 Files with no reviewable changes (1)
- rero_mef/api_mef.py
✅ Files skipped from review due to trivial changes (3)
- tests/ui/test_views.py
- tests/ui/places/test_places_api.py
- tests/unit/places/test_places_gnd_transformation.py
🚧 Files skipped from review as they are similar to previous changes (6)
- rero_mef/api.py
- rero_mef/places/idref/api.py
- rero_mef/monitoring/cli.py
- tests/ui/test_cli.py
- tests/ui/agents/test_agents_cli.py
- rero_mef/agents/api.py
7ec3a80 to
49b4ccb
Compare
There was a problem hiding this comment.
Actionable comments posted: 5
♻️ Duplicate comments (2)
rero_mef/agents/cli.py (1)
177-179:⚠️ Potential issue | 🟠 MajorCompute the VIAF
md5beforecreate_or_update()in--unlinkedmode.
AgentViafRecord.create_or_update()does not addmd5itself. Records created here will missmd5, so the next refresh withtest_md5=Truecannot recognize them as up to date.Suggested fix
+ data = add_md5(data) viaf_record, _ = AgentViafRecord.create_or_update( data=data, dbcommit=True, reindex=True )Also add
add_md5to the..utilsimport list.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rero_mef/agents/cli.py` around lines 177 - 179, When running in --unlinked mode ensure the VIAF md5 is computed and added to the record before calling AgentViafRecord.create_or_update: call the add_md5 utility (import it from ..utils if not already imported) on the VIAF data payload to set the md5 field, then pass that augmented data into AgentViafRecord.create_or_update(data=..., dbcommit=True, reindex=True) so newly created records include md5 and can be recognized by later test_md5 checks.rero_mef/agents/viaf/tasks.py (1)
27-34:⚠️ Potential issue | 🟠 Major
update_agentsis still a no-op in the normal refresh path.This helper exposes
update_agentsas optional work, but the standard branch always callsAgentViafRecord.create_or_update(), and that method unconditionally runscreate_mef_and_agents(...). Soupdate_agents=Falsestill performs the expensive MEF/agent side effect, which makes the task/CLI flag misleading.Also applies to: 75-81
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rero_mef/agents/viaf/tasks.py` around lines 27 - 34, The refresh helper _refresh_viaf_record currently accepts update_agents but still triggers agent/MEF creation because it always calls AgentViafRecord.create_or_update(), which itself unconditionally calls create_mef_and_agents; change the flow so the update_agents flag is respected—either pass update_agents through to AgentViafRecord.create_or_update (and update that method signature to conditionally call create_mef_and_agents) or, if altering create_or_update is undesirable, add a conditional in _refresh_viaf_record to skip the create_mef_and_agents path when update_agents is False; apply the same fix to the other refresh call site noted (the similar create_or_update invocation at the 75-81 region) so the CLI/task flag behavior is not misleading.
🧹 Nitpick comments (2)
tests/unit/concepts/test_concepts_gnd_transformation.py (1)
139-165: Add a negative-path assertion for non-DDC/empty classification behavior.Please add a companion test to confirm non-
DDC686values are ignored and thatclassificationis not set when no valid entry exists. This guards the edge-case branch described by the transformation behavior.Proposed test addition
+def test_gnd_classification_ignores_non_ddc(): + """Test trans_gnd_classification ignores non-DDC entries.""" + xml_part_to_add = """ + <datafield tag="686" ind1=" " ind2=" "> + <subfield code="a">999</subfield> + <subfield code="2">OTHER</subfield> + </datafield> + """ + trans = trans_prep(Transformation, "concepts", xml_part_to_add) + trans.trans_gnd_classification() + assert "classification" not in trans.json🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/concepts/test_concepts_gnd_transformation.py` around lines 139 - 165, Add a negative-path test that uses trans_prep(Transformation, "concepts", xml_part) with 686 datafields that are non-DDC or empty (e.g., <subfield code="2">LCC</subfield> or missing/empty <subfield code="2">) and then call trans.trans_gnd_classification(); assert that trans.json does not contain a "classification" key (or equals {}), and also add a separate case where mixed 686 entries include only non-DDC entries to confirm they are ignored. Use the existing test structure and the same helper symbols (trans_prep, Transformation, trans.trans_gnd_classification) so the new test mirrors the positive case but asserts absence of classification for invalid/non-DDC inputs.rero_mef/marctojson/do_idref_agent.py (1)
416-416: Align logged function name with actual method name.Line 416 logs
trans_authorized_access_point, but the method istrans_idref_authorized_access_point. This can mislead debugging/trace filtering.Suggested patch
- self.logger.info("Call Function: %s", "trans_authorized_access_point") + self.logger.info("Call Function: %s", "trans_idref_authorized_access_point")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rero_mef/marctojson/do_idref_agent.py` at line 416, The log message currently prints "trans_authorized_access_point" but the actual method is trans_idref_authorized_access_point; update the logger.info call in trans_idref_authorized_access_point to log the correct function name (e.g., "trans_idref_authorized_access_point") so trace messages match the real method, or derive the name dynamically (e.g., using the method's __name__ or f-string with the function identifier) to avoid future drift.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@rero_mef/agents/viaf/api.py`:
- Around line 504-509: The current VIAF fetch returns a truthy payload for
merged clusters (result contains a different pid and redirect_from), which
causes update_online() to treat it as a normal replacement; change the behavior
so merged-cluster redirects are treated as falsy results: when result.get("pid")
is present (i.e., a redirect), set result["redirect_from"]=pid but return None
(or another falsy payload) instead of the truthy result so update_online() will
invoke handle_redirect() and perform deletion/reconciliation; update any callers
(e.g., update_online(), handle_redirect()) to read the redirect info from the
log/message or from a standardized redirect flag if you choose that approach.
In `@rero_mef/agents/viaf/tasks.py`:
- Around line 122-123: The code is materializing the entire scan by calling
list(query.scan()) which loads all VIAF results into memory; instead pass the
scan iterator directly to the progressbar (e.g., use
progressbar(items=query.scan()) or items=iter(query.scan())) or use a
streaming/batched iterator so items are consumed lazily; update the call that
constructs progress_bar (the progressbar(...) invocation that currently wraps
list(query.scan())) to use the iterator returned by query.scan() and ensure any
downstream code supports incremental consumption (respecting batch_size and not
assuming a pre-sized list).
In `@rero_mef/cli.py`:
- Around line 1124-1130: The code that inspects the broker currently swallows
all exceptions and returns 0 (the block around channel.close()/return count
inside the try/except), which makes wait_empty_tasks() treat errors as an empty
queue; change the error handling so exceptions are not collapsed to 0: in the
function that computes the queue count (the try/except that currently returns 0
on any Exception), catch only expected exceptions if needed, log the full
exception (use the existing logger) and either re-raise the exception or return
a sentinel (e.g. None) instead of 0; then update wait_empty_tasks() to detect
that sentinel (or handle exceptions) and retry with backoff or fail with a clear
warning instead of assuming the queue is empty. Ensure channel.close() remains
in the finally block and reference the functions/methods wait_empty_tasks() and
the broker-inspection function in your edits.
In `@rero_mef/marctojson/do_skeleton.py`:
- Line 69: The log call in Transformation.trans_example_2 incorrectly logs
"trans_example_1"; update the logger.info invocation inside the trans_example_2
method to reference the correct function name (e.g., "trans_example_2") so the
message accurately reflects the executing method (locate the logger.info("Call
Function: %s", "trans_example_1") line in trans_example_2 and change the literal
to "trans_example_2").
In `@tests/ui/agents/test_agents_viaf_tasks.py`:
- Around line 128-145: The tests patch AgentViafSearch.point_in_time but
production code now iterates using query.scan(), so update the mocks to
intercept the actual iterator: patch the scan() method on the
AgentViafSearch.query (e.g.,
mock.patch("rero_mef.agents.viaf.tasks.AgentViafSearch.query.scan",
return_value=iter([hit]), create=True) or create a fake Query object whose scan
returns the iterator) and adjust the ctx/point-in-time mocks accordingly; apply
the same change to the other test block referenced (the one around lines
241-252) so process_viaf_rolling_refresh() is exercised via the scan() iterator
rather than the old point_in_time path.
---
Duplicate comments:
In `@rero_mef/agents/cli.py`:
- Around line 177-179: When running in --unlinked mode ensure the VIAF md5 is
computed and added to the record before calling
AgentViafRecord.create_or_update: call the add_md5 utility (import it from
..utils if not already imported) on the VIAF data payload to set the md5 field,
then pass that augmented data into AgentViafRecord.create_or_update(data=...,
dbcommit=True, reindex=True) so newly created records include md5 and can be
recognized by later test_md5 checks.
In `@rero_mef/agents/viaf/tasks.py`:
- Around line 27-34: The refresh helper _refresh_viaf_record currently accepts
update_agents but still triggers agent/MEF creation because it always calls
AgentViafRecord.create_or_update(), which itself unconditionally calls
create_mef_and_agents; change the flow so the update_agents flag is
respected—either pass update_agents through to AgentViafRecord.create_or_update
(and update that method signature to conditionally call create_mef_and_agents)
or, if altering create_or_update is undesirable, add a conditional in
_refresh_viaf_record to skip the create_mef_and_agents path when update_agents
is False; apply the same fix to the other refresh call site noted (the similar
create_or_update invocation at the 75-81 region) so the CLI/task flag behavior
is not misleading.
---
Nitpick comments:
In `@rero_mef/marctojson/do_idref_agent.py`:
- Line 416: The log message currently prints "trans_authorized_access_point" but
the actual method is trans_idref_authorized_access_point; update the logger.info
call in trans_idref_authorized_access_point to log the correct function name
(e.g., "trans_idref_authorized_access_point") so trace messages match the real
method, or derive the name dynamically (e.g., using the method's __name__ or
f-string with the function identifier) to avoid future drift.
In `@tests/unit/concepts/test_concepts_gnd_transformation.py`:
- Around line 139-165: Add a negative-path test that uses
trans_prep(Transformation, "concepts", xml_part) with 686 datafields that are
non-DDC or empty (e.g., <subfield code="2">LCC</subfield> or missing/empty
<subfield code="2">) and then call trans.trans_gnd_classification(); assert that
trans.json does not contain a "classification" key (or equals {}), and also add
a separate case where mixed 686 entries include only non-DDC entries to confirm
they are ignored. Use the existing test structure and the same helper symbols
(trans_prep, Transformation, trans.trans_gnd_classification) so the new test
mirrors the positive case but asserts absence of classification for
invalid/non-DDC inputs.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 4ee2d590-cb2b-4d34-85aa-08c6e6486fab
⛔ Files ignored due to path filters (29)
.github/AI_SETUP.mdis excluded by none and included by none.github/copilot-instructions.mdis excluded by none and included by none.github/workflows/continuous-integration-test.ymlis excluded by none and included by noneCLAUDE.mdis excluded by none and included by nonedata/aggnd.jsonis excluded by none and included by nonedata/aggnd_metadata.csvis excluded by!**/*.csvand included by nonedata/aggnd_pidstore.csvis excluded by!**/*.csvand included by nonedata/agrero.jsonis excluded by none and included by nonedata/agrero_metadata.csvis excluded by!**/*.csvand included by nonedata/agrero_pidstore.csvis excluded by!**/*.csvand included by nonedata/aidref.jsonis excluded by none and included by nonedata/aidref_metadata.csvis excluded by!**/*.csvand included by nonedata/aidref_pidstore.csvis excluded by!**/*.csvand included by nonedata/cidref.jsonis excluded by none and included by nonedata/mef.jsonis excluded by none and included by nonedata/mef_id.csvis excluded by!**/*.csvand included by nonedata/mef_metadata.csvis excluded by!**/*.csvand included by nonedata/mef_pidstore.csvis excluded by!**/*.csvand included by nonedata/viaf.jsonis excluded by none and included by nonedata/viaf_metadata.csvis excluded by!**/*.csvand included by nonedata/viaf_pidstore.csvis excluded by!**/*.csvand included by nonedocker-services.ymlis excluded by none and included by nonedocker/uwsgi/uwsgi_rest.iniis excluded by none and included by nonedocker/uwsgi/uwsgi_ui.iniis excluded by none and included by nonerero_mef/config.pyis excluded by!rero_mef/config.pyand included byrero_mef/**/*.pyscripts/testis excluded by none and included by nonetests/unit/concepts/examples/xml_minimal_record.xmlis excluded by none and included by nonetests/unit/places/examples/xml_minimal_record.xmlis excluded by none and included by noneuv.lockis excluded by!**/*.lockand included by none
📒 Files selected for processing (34)
rero_mef/agents/api.pyrero_mef/agents/cli.pyrero_mef/agents/mef/api.pyrero_mef/agents/viaf/api.pyrero_mef/agents/viaf/tasks.pyrero_mef/api.pyrero_mef/api_mef.pyrero_mef/cli.pyrero_mef/cli_logging.pyrero_mef/marctojson/do_gnd_agent.pyrero_mef/marctojson/do_gnd_concepts.pyrero_mef/marctojson/do_gnd_places.pyrero_mef/marctojson/do_idref_agent.pyrero_mef/marctojson/do_idref_concepts.pyrero_mef/marctojson/do_idref_places.pyrero_mef/marctojson/do_rero_agent.pyrero_mef/marctojson/do_rero_concepts.pyrero_mef/marctojson/do_skeleton.pyrero_mef/monitoring/api.pyrero_mef/monitoring/cli.pyrero_mef/monitoring/views.pyrero_mef/places/idref/api.pyrero_mef/places/utils.pyrero_mef/utils.pytests/ui/agents/test_agents_cli.pytests/ui/agents/test_agents_viaf_api.pytests/ui/agents/test_agents_viaf_tasks.pytests/ui/places/test_places_api.pytests/ui/places/test_places_gnd_tasks.pytests/ui/test_api.pytests/ui/test_cli.pytests/ui/test_views.pytests/unit/concepts/test_concepts_gnd_transformation.pytests/unit/places/test_places_gnd_transformation.py
💤 Files with no reviewable changes (1)
- rero_mef/api_mef.py
✅ Files skipped from review due to trivial changes (9)
- rero_mef/api.py
- tests/ui/test_views.py
- rero_mef/places/utils.py
- rero_mef/marctojson/do_rero_concepts.py
- rero_mef/marctojson/do_idref_places.py
- tests/unit/places/test_places_gnd_transformation.py
- rero_mef/marctojson/do_rero_agent.py
- tests/ui/places/test_places_gnd_tasks.py
- rero_mef/marctojson/do_gnd_agent.py
🚧 Files skipped from review as they are similar to previous changes (9)
- tests/ui/test_api.py
- tests/ui/places/test_places_api.py
- rero_mef/marctojson/do_gnd_places.py
- rero_mef/cli_logging.py
- rero_mef/marctojson/do_gnd_concepts.py
- rero_mef/places/idref/api.py
- rero_mef/monitoring/cli.py
- tests/ui/test_cli.py
- rero_mef/agents/api.py
49b4ccb to
1266764
Compare
There was a problem hiding this comment.
Actionable comments posted: 6
♻️ Duplicate comments (6)
rero_mef/cli.py (1)
1109-1130:⚠️ Potential issue | 🟠 MajorQueue-inspection failures currently look like "all clear".
Line 1124 and Line 1129 collapse broker errors to
0, sowait_empty_tasks()can exit while RabbitMQ is unavailable or a passive declare failed. Please return a distinct failure state, or raise/log and retry, instead of reusing the empty-queue sentinel.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rero_mef/cli.py` around lines 1109 - 1130, The code that counts messages (using current_celery, current_celery.connection_or_acquire, channel.queue_declare) is swallowing broker errors and returning 0, which conflates "empty" with "failure"; change the error handling so exceptions from channel.queue_declare or the outer try are not mapped to the empty-queue sentinel: log the exception via the existing logger and re-raise it (or return a distinct failure value like -1) instead of returning 0; ensure channel.close() still runs in the finally block and preserve the existing success path that returns the positive count.rero_mef/agents/viaf/api.py (1)
504-509:⚠️ Potential issue | 🟠 MajorMerged-cluster redirects can still bypass
handle_redirect().This branch returns a normal truthy payload plus
redirect_from.update_online()below only checks truthiness, so it will take the ordinary replace path instead of deleting the old VIAF PID and reconciling MEF links through the redirect flow.Minimal fix
online_data, _ = self.get_online_record(viaf_source_code="VIAF", pid=self.pid) if online_data: + if online_data.get("redirect_from"): + new_record, action, _ = self.handle_redirect( + online_data["pid"], dbcommit=dbcommit, reindex=reindex + ) + return new_record, action online_data["$schema"] = self["$schema"] online_data = add_md5(online_data)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rero_mef/agents/viaf/api.py` around lines 504 - 509, The current branch inside the VIAF fetch returns a normal truthy payload when a merged-cluster redirect is detected (result.get("pid")), which causes update_online() to treat it as a regular replacement instead of running the redirect flow; change this branch to invoke the redirect handling path instead of returning the normal result: call the existing handle_redirect(...) (or the equivalent redirect helper used elsewhere) with the original pid and the target pid (result["pid"]) and return its result/tuple so update_online() executes the redirect-delete/reconcile logic, or alternatively make the returned value falsy and surface the redirect via a dedicated redirect result flag that update_online() checks; locate this logic around the result, pid, handle_redirect, and update_online symbols and update the branch to route merged-cluster redirects through handle_redirect.rero_mef/agents/viaf/tasks.py (3)
27-33:⚠️ Potential issue | 🟠 Major
update_agentsnever affects this refresh path.The flag is exposed by the CLI and both task entry points, but
_refresh_viaf_record()never uses it. It is not forwarded tohandle_redirect(...)orcreate_or_update(...), so callers get the same behavior regardless ofTrue/False. Either pass it through to the underlying API or remove the option until that behavior exists.Also applies to: 60-65, 76-81
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rero_mef/agents/viaf/tasks.py` around lines 27 - 33, The update_agents parameter on _refresh_viaf_record currently has no effect; update_agents should be forwarded to the downstream handlers (or removed). Modify _refresh_viaf_record to pass update_agents into handle_redirect(...) and create_or_update(...) calls (and any other internal call that applies) so the flag controls whether linked local agents are updated; alternatively, if the feature isn't implemented yet, remove the update_agents parameter from the function signature and its callers (CLI/task entry points) to avoid a no-op flag. Ensure you update all occurrences referenced in this file (including the other two similar regions) so the behavior is consistent.
122-123:⚠️ Potential issue | 🟠 MajorThis still preloads the full VIAF scan before processing the first record.
list(query.scan())makes small batches pay the full O(N) scan and memory cost up front. Evenbatch_size=1will materialize every PID before Line 128 runs. Use a lazy, bounded iterator instead of buffering the whole scan.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rero_mef/agents/viaf/tasks.py` around lines 122 - 123, The code currently materializes the entire VIAF scan by calling list(query.scan()) when creating progress_bar; replace that with a lazy iterator to avoid preloading all PIDs — e.g. pass query.scan() (or a generator that yields from query.scan() or an itertools.islice wrapper for bounded consumption) directly into progressbar instead of list(query.scan()), ensuring any required max/total param is set separately so progressbar and the later loop in tasks.py use a streaming iterator rather than buffering the whole result.
180-190:⚠️ Potential issue | 🟠 Major
scan()is not a safe oldest-first iterator for the rolling batch.
process_viaf_rolling_refresh()promises oldest-first processing, butSearch.scan()does not reliably preserve the_updatedsort, and breaking afterbatch_sizecan leave the scroll open. Use PIT/iterate()(or another bounded ordered search) for this path instead ofscan().In elasticsearch-dsl for Python, does Search.scan() preserve an explicit .sort(...) order by default, and what is the recommended way to iterate a sorted batch without leaving an open scroll when stopping early?🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rero_mef/agents/viaf/tasks.py` around lines 180 - 190, The current loop uses AgentViafSearch().sort(...).scan() which does not guarantee preserving the explicit _updated sort and will leave an open scroll when you break early; change process_viaf_rolling_refresh to iterate a bounded, ordered search instead (e.g. use Search.iterate(...) or open a PIT and use Search.with_pit()/client.open_point_in_time + iterate) so the sort on "_updated" is respected and the search context is closed when you stop at batch_size; ensure the new flow uses AgentViafSearch().sort({"_updated": {"order":"asc"}}) with iterate()/PIT and explicitly close the PIT/iterator when breaking out of the progressbar loop.rero_mef/agents/cli.py (1)
351-352:⚠️ Potential issue | 🟠 Major
--rolling --batch_size 0still does not mean "all".This normalization works for full refresh, but
process_viaf_rolling_refresh()interpretsNoneas "useRERO_MEF_VIAF_REFRESH_BATCH_SIZE". In rolling mode the CLI help says0=all, but the code silently falls back to the default batch instead. Normalize per mode or teach the task to treat0as all.Also applies to: 357-358, 366-367
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rero_mef/agents/cli.py` around lines 351 - 352, The CLI normalization of batch_size wrongly converts 0 to None globally, which makes --rolling --batch_size 0 fall back to the env default; change the normalization to be mode-aware: when rolling mode is active (the flag used to call process_viaf_rolling_refresh), preserve 0 as the explicit "all" indicator (set batch_arg = 0 if batch_size == 0 else batch_size), while for full refresh keep the existing behavior (batch_arg = None if batch_size == 0 else batch_size) so process_viaf_rolling_refresh sees 0 as "all" and process_viaf_full_refresh still uses None to mean env-default; update the code around batch_arg, batch_size, and the call sites process_viaf_rolling_refresh/process_viaf_full_refresh accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@rero_mef/agents/cli.py`:
- Line 123: The local dict non_existing_pids is never populated so
_clean_non_existing_viaf_links() is never invoked with real data; update the
synchronous and enqueue branches of the function using
task_create_mef_and_agents_from_viaf(...) to return or yield the missing-PID
info and merge it into non_existing_pids before the final cleanup block—i.e.,
change the synchronous path to capture the function's return (e.g., result =
task_create_mef_and_agents_from_viaf(...)) and merge result into
non_existing_pids, and in the enqueue/asynchronous path have the enqueued task
or its callback post back the missing-PID entries into non_existing_pids (or
collect them from the queue response) so that
_clean_non_existing_viaf_links(non_existing_pids) runs with actual data.
- Around line 150-160: The _fetch_viaf function currently catches all Exception
and returns an empty data dict so caller cannot tell lookup misses from real
errors; change _fetch_viaf to only catch expected lookup/miss exceptions (e.g.,
a custom LookupError/RecordNotFound or network-specific exceptions thrown by
AgentViafRecord.get_online_record) and let other exceptions propagate (or return
a distinct error indicator/message instead of {}), so the caller can
differentiate "no VIAF found" from real failures; apply the same change to the
analogous fetch block at lines 169-175 so both use specific exception types or a
distinct error tuple to surface API/logic failures (and preserve existing
verbose logging path when --online_verbose is used).
In `@rero_mef/agents/viaf/api.py`:
- Around line 404-428: The code currently returns the empty-record sentinel ({})
on transient failures (429, TimeoutError, RequestException) which conflates
retryable outages with genuine "no record" results; change the VIAF fetch logic
around _perform_request / the 429 handling and exception handlers so that after
a 429 you wait then retry the request (call _perform_request again once or in a
short retry loop) and on repeated 429/Timeout/RequestException propagate a
distinct retryable failure instead of returning {} (e.g., raise a new
RetryableVIAFError or re-raise the original exception) so callers (including
those using delete_if_not_found) can distinguish transient errors from true
not-found results; keep use of _sleep_with_countdown and ensure the unique
symbols _perform_request, _sleep_with_countdown and the 429 branch are the
places you modify.
In `@rero_mef/agents/viaf/tasks.py`:
- Around line 45-54: The current code treats any falsy online_data from
AgentViafRecord.get_online_record(...) as Action.NOT_FOUND; instead
differentiate transient fetch failures from true missing records by inspecting
the returned msg (and/or any additional status flag from get_online_record).
Update the branch that checks if not online_data to: if msg indicates a rate
limit/upstream error (e.g., contains "rate limit", "upstream", "error", or use
an explicit status returned by AgentViafRecord.get_online_record if available),
return Action.NOT_ONLINE; otherwise treat it as a confirmed missing VIAF and
return Action.NOT_FOUND. Ensure you still emit verbose click.echo(msg) when
verbose is true and reference the variables online_data, msg,
AgentViafRecord.get_online_record, Action.NOT_FOUND, and Action.NOT_ONLINE.
In `@rero_mef/cli.py`:
- Around line 1312-1323: After reconciling entity PIDs with
create_or_update_mef(reindex=True) in the loop using get_entity_class(), call
flush_indexes() (or the project’s ES flush helper) before invoking
mef_cls.get_all_pids_without_entities_and_viaf() so Elasticsearch reflects the
recent reindexing; place the flush immediately after the loop (after
reconciled_count updates) and before orphan_pids is computed to prevent deleting
just-relinked documents.
In `@tests/ui/agents/test_agents_viaf_tasks.py`:
- Around line 61-72: The test mutates Document["@about"] which is ignored by
AgentViafRecord.get_online_record(), so change the fixture to mutate a
normalized field that is persisted (e.g., modify modified_data["sources"] or
modified_data["xLinks"] or another value that maps to AgentViafRecord's stored
fields) so the MD5 differs and the refresh path triggers UPDATE/CREATE; update
the block around modified_data in tests/ui/agents/test_agents_viaf_tasks.py (the
modified_data variable used with _refresh_viaf_record and
agent_viaf_online_response) to change one of the persisted fields (sources or
xLinks) instead of Document["@about"] so the assertion becomes reliable.
---
Duplicate comments:
In `@rero_mef/agents/cli.py`:
- Around line 351-352: The CLI normalization of batch_size wrongly converts 0 to
None globally, which makes --rolling --batch_size 0 fall back to the env
default; change the normalization to be mode-aware: when rolling mode is active
(the flag used to call process_viaf_rolling_refresh), preserve 0 as the explicit
"all" indicator (set batch_arg = 0 if batch_size == 0 else batch_size), while
for full refresh keep the existing behavior (batch_arg = None if batch_size == 0
else batch_size) so process_viaf_rolling_refresh sees 0 as "all" and
process_viaf_full_refresh still uses None to mean env-default; update the code
around batch_arg, batch_size, and the call sites
process_viaf_rolling_refresh/process_viaf_full_refresh accordingly.
In `@rero_mef/agents/viaf/api.py`:
- Around line 504-509: The current branch inside the VIAF fetch returns a normal
truthy payload when a merged-cluster redirect is detected (result.get("pid")),
which causes update_online() to treat it as a regular replacement instead of
running the redirect flow; change this branch to invoke the redirect handling
path instead of returning the normal result: call the existing
handle_redirect(...) (or the equivalent redirect helper used elsewhere) with the
original pid and the target pid (result["pid"]) and return its result/tuple so
update_online() executes the redirect-delete/reconcile logic, or alternatively
make the returned value falsy and surface the redirect via a dedicated redirect
result flag that update_online() checks; locate this logic around the result,
pid, handle_redirect, and update_online symbols and update the branch to route
merged-cluster redirects through handle_redirect.
In `@rero_mef/agents/viaf/tasks.py`:
- Around line 27-33: The update_agents parameter on _refresh_viaf_record
currently has no effect; update_agents should be forwarded to the downstream
handlers (or removed). Modify _refresh_viaf_record to pass update_agents into
handle_redirect(...) and create_or_update(...) calls (and any other internal
call that applies) so the flag controls whether linked local agents are updated;
alternatively, if the feature isn't implemented yet, remove the update_agents
parameter from the function signature and its callers (CLI/task entry points) to
avoid a no-op flag. Ensure you update all occurrences referenced in this file
(including the other two similar regions) so the behavior is consistent.
- Around line 122-123: The code currently materializes the entire VIAF scan by
calling list(query.scan()) when creating progress_bar; replace that with a lazy
iterator to avoid preloading all PIDs — e.g. pass query.scan() (or a generator
that yields from query.scan() or an itertools.islice wrapper for bounded
consumption) directly into progressbar instead of list(query.scan()), ensuring
any required max/total param is set separately so progressbar and the later loop
in tasks.py use a streaming iterator rather than buffering the whole result.
- Around line 180-190: The current loop uses AgentViafSearch().sort(...).scan()
which does not guarantee preserving the explicit _updated sort and will leave an
open scroll when you break early; change process_viaf_rolling_refresh to iterate
a bounded, ordered search instead (e.g. use Search.iterate(...) or open a PIT
and use Search.with_pit()/client.open_point_in_time + iterate) so the sort on
"_updated" is respected and the search context is closed when you stop at
batch_size; ensure the new flow uses AgentViafSearch().sort({"_updated":
{"order":"asc"}}) with iterate()/PIT and explicitly close the PIT/iterator when
breaking out of the progressbar loop.
In `@rero_mef/cli.py`:
- Around line 1109-1130: The code that counts messages (using current_celery,
current_celery.connection_or_acquire, channel.queue_declare) is swallowing
broker errors and returning 0, which conflates "empty" with "failure"; change
the error handling so exceptions from channel.queue_declare or the outer try are
not mapped to the empty-queue sentinel: log the exception via the existing
logger and re-raise it (or return a distinct failure value like -1) instead of
returning 0; ensure channel.close() still runs in the finally block and preserve
the existing success path that returns the positive count.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: e95e5e0c-1a25-4e29-aa20-896ad89ff765
⛔ Files ignored due to path filters (27)
.github/AI_SETUP.mdis excluded by none and included by none.github/copilot-instructions.mdis excluded by none and included by none.github/workflows/continuous-integration-test.ymlis excluded by none and included by noneCLAUDE.mdis excluded by none and included by nonedata/aggnd.jsonis excluded by none and included by nonedata/aggnd_metadata.csvis excluded by!**/*.csvand included by nonedata/aggnd_pidstore.csvis excluded by!**/*.csvand included by nonedata/agrero.jsonis excluded by none and included by nonedata/agrero_metadata.csvis excluded by!**/*.csvand included by nonedata/agrero_pidstore.csvis excluded by!**/*.csvand included by nonedata/aidref.jsonis excluded by none and included by nonedata/aidref_metadata.csvis excluded by!**/*.csvand included by nonedata/aidref_pidstore.csvis excluded by!**/*.csvand included by nonedata/cidref.jsonis excluded by none and included by nonedata/mef.jsonis excluded by none and included by nonedata/mef_id.csvis excluded by!**/*.csvand included by nonedata/mef_metadata.csvis excluded by!**/*.csvand included by nonedata/mef_pidstore.csvis excluded by!**/*.csvand included by nonedata/viaf.jsonis excluded by none and included by nonedata/viaf_metadata.csvis excluded by!**/*.csvand included by nonedata/viaf_pidstore.csvis excluded by!**/*.csvand included by nonedocker-services.ymlis excluded by none and included by nonerero_mef/config.pyis excluded by!rero_mef/config.pyand included byrero_mef/**/*.pyscripts/testis excluded by none and included by nonetests/unit/concepts/examples/xml_minimal_record.xmlis excluded by none and included by nonetests/unit/places/examples/xml_minimal_record.xmlis excluded by none and included by noneuv.lockis excluded by!**/*.lockand included by none
📒 Files selected for processing (34)
rero_mef/agents/api.pyrero_mef/agents/cli.pyrero_mef/agents/mef/api.pyrero_mef/agents/viaf/api.pyrero_mef/agents/viaf/tasks.pyrero_mef/api.pyrero_mef/api_mef.pyrero_mef/cli.pyrero_mef/cli_logging.pyrero_mef/marctojson/do_gnd_agent.pyrero_mef/marctojson/do_gnd_concepts.pyrero_mef/marctojson/do_gnd_places.pyrero_mef/marctojson/do_idref_agent.pyrero_mef/marctojson/do_idref_concepts.pyrero_mef/marctojson/do_idref_places.pyrero_mef/marctojson/do_rero_agent.pyrero_mef/marctojson/do_rero_concepts.pyrero_mef/marctojson/do_skeleton.pyrero_mef/monitoring/api.pyrero_mef/monitoring/cli.pyrero_mef/monitoring/views.pyrero_mef/places/idref/api.pyrero_mef/places/utils.pyrero_mef/utils.pytests/ui/agents/test_agents_cli.pytests/ui/agents/test_agents_viaf_api.pytests/ui/agents/test_agents_viaf_tasks.pytests/ui/places/test_places_api.pytests/ui/places/test_places_gnd_tasks.pytests/ui/test_api.pytests/ui/test_cli.pytests/ui/test_views.pytests/unit/concepts/test_concepts_gnd_transformation.pytests/unit/places/test_places_gnd_transformation.py
💤 Files with no reviewable changes (1)
- rero_mef/api_mef.py
✅ Files skipped from review due to trivial changes (13)
- tests/ui/test_views.py
- rero_mef/marctojson/do_idref_places.py
- rero_mef/marctojson/do_rero_concepts.py
- rero_mef/places/utils.py
- rero_mef/marctojson/do_idref_agent.py
- rero_mef/api.py
- tests/unit/places/test_places_gnd_transformation.py
- rero_mef/marctojson/do_rero_agent.py
- rero_mef/marctojson/do_idref_concepts.py
- rero_mef/places/idref/api.py
- tests/ui/places/test_places_api.py
- rero_mef/marctojson/do_gnd_agent.py
- rero_mef/agents/api.py
🚧 Files skipped from review as they are similar to previous changes (6)
- rero_mef/marctojson/do_skeleton.py
- tests/ui/test_api.py
- rero_mef/marctojson/do_gnd_places.py
- rero_mef/marctojson/do_gnd_concepts.py
- tests/ui/agents/test_agents_cli.py
- rero_mef/monitoring/cli.py
1266764 to
386c6eb
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (2)
rero_mef/agents/viaf/api.py (2)
637-658:⚠️ Potential issue | 🔴 CriticalValidate the redirect target before deleting the source record.
Line 656 is only protected against exceptions. If
create_or_update()returns no record on a handled failure, the old VIAF record is still deleted and Line 658 reportsAction.REDIRECT.🛠️ Suggested fix
try: new_record, action = self.create_or_update( data=target_data, dbcommit=dbcommit, reindex=reindex, test_md5=True, ) except Exception as e: current_app.logger.exception( f"Failed to create/update target VIAF {redirect_to_pid}: {e}" ) if delete_if_not_found: current_app.logger.info( f"Deleting old VIAF record {old_pid} due to target creation error" ) self.delete(force=True, dbcommit=dbcommit, delindex=reindex) return None, Action.ERROR, redirect_info + + if not new_record: + current_app.logger.error( + f"Failed to create/update target VIAF {redirect_to_pid}: {action}" + ) + return None, Action.ERROR, redirect_info # Delete old VIAF record (handles MEF cleanup) only after successful create/update self.delete(force=True, dbcommit=dbcommit, delindex=reindex)As per coding guidelines, focus on “Security vulnerabilities and correctness issues” and “Logic errors and architectural improvements.”
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rero_mef/agents/viaf/api.py` around lines 637 - 658, After calling create_or_update (in the block that sets new_record, action), validate that new_record is not None and that action indicates a successful creation/update (not Action.ERROR) before calling self.delete or returning Action.REDIRECT; if create_or_update returned no record or an error, respect delete_if_not_found (log and delete only when configured) and return Action.ERROR or appropriate redirect_info instead of deleting the source unconditionally. Update the logic around new_record/action checks in the same function to only perform self.delete(force=True, dbcommit=dbcommit, delindex=reindex) and return Action.REDIRECT when the target creation actually succeeded.
460-545:⚠️ Potential issue | 🟠 MajorKeep non-404 responses out of the “not found” path.
Line 545 currently collapses any non-200 response that reaches this block into the same
{}/NO RECORDsentinel used for a genuine miss.update_online()andhandle_redirect()both branch on falsy payloads, so a backend failure can be treated as a real disappearance, and withdelete_if_not_found=Truethe redirect path can even remove a valid VIAF record. Reserve the empty sentinel for 404 / empty-success cases only, and raiseRetryableVIAFErrorfor the rest.As per coding guidelines, focus on “Security vulnerabilities and correctness issues” and “Logic errors and architectural improvements.”
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rero_mef/agents/viaf/api.py` around lines 460 - 545, The code currently treats any non-200 response as a NO RECORD (empty dict) which can cause backend failures to be handled like genuine misses; modify the response handling in the VIAF fetch routine (the block using response, pid, url and setting msg) so that: if response.status_code == requests.codes.ok continue parsing as now; if response.status_code == requests.codes.not_found (404) return {} and the NO RECORD msg as before; otherwise raise RetryableVIAFError with a clear message (include pid/url/status) so callers like update_online() and handle_redirect() will retry instead of deleting/redirecting a valid record. Ensure the raised exception uses the existing RetryableVIAFError symbol and do not change downstream logic that expects an empty dict only for true 404/empty-success cases.
🧹 Nitpick comments (3)
tests/unit/places/test_places_gnd_transformation.py (1)
133-149: Add a non-DDC case to prevent false DDC classification coverage.Line 136 includes
$2=DDC, but the test only validates the happy path. Givenrero_mef/marctojson/do_gnd_places.pycurrently buildsbf:ClassificationDdcfrom686$awithout checking686$2, this test won’t catch non-DDC misclassification regressions.Suggested test extension
+def test_gnd_classification_ignores_non_ddc(): + """Test trans_gnd_classification ignores non-DDC 686.""" + xml_part_to_add = """ + <datafield tag="686" ind1=" " ind2=" "> + <subfield code="a">360</subfield> + <subfield code="c">Geografie. Anthrogeografie</subfield> + <subfield code="2">LCC</subfield> + </datafield> + """ + trans = trans_prep(Transformation, "places", xml_part_to_add) + trans.trans_gnd_classification() + assert "classification" not in trans.json🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/places/test_places_gnd_transformation.py` around lines 133 - 149, Update the unit test to include a non-DDC case so trans.trans_gnd_classification() does not falsely create a bf:ClassificationDdc when $2 is not "DDC": in test_places_gnd_transformation.py, add another xml_part (or modify the existing one) where the datafield tag="686" contains subfield code="a" but subfield code="2" is a value other than "DDC" and assert that Transformation (created via trans_prep) either does not add a classification entry or adds the appropriate non-DDC classification; also run the test to ensure the logic in rero_mef/marctojson/do_gnd_places.py (the code that builds bf:ClassificationDdc from 686$a) will be caught if it ignores 686$2.tests/ui/test_cli.py (2)
212-234: Tighten queue inspection assertions to verify passive declarations are used.The return-value assertion is good, but adding assertions on
queue_declarecall semantics would better lock in intended broker interaction.Proposed assertion strengthening
with mock.patch("rero_mef.cli.current_celery", fake_celery): assert rabbitmq_queue_count() == 5 + assert channel.queue_declare.call_count == len(queues) + assert all( + c.kwargs.get("passive") is True for c in channel.queue_declare.call_args_list + )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/ui/test_cli.py` around lines 212 - 234, The test_rabbitmq_queue_count currently only asserts the returned total; enhance it to assert that channel.queue_declare was called for each queue using passive declarations so the broker is only inspected and not modified: after setting up fake_celery and calling rabbitmq_queue_count(), assert channel.queue_declare.call_count == len(queues) and assert that channel.queue_declare was called with the expected queue names (from fake_celery.amqp.queues keys) and with passive=True (or the passive argument used by rabbitmq_queue_count), and verify the call order if needed; locate the assertions around test_rabbitmq_queue_count, channel.queue_declare, fake_celery.connection_or_acquire and rabbitmq_queue_count to implement these checks.
236-246: Prefer exact sleep-sequence assertion over count-only check.
call_count == 3can pass despite wrong delays/order; asserting exact calls makes this retry/wait contract much more robust.Proposed assertion strengthening
- assert mocked_sleep.call_count == 3 + assert mocked_sleep.call_args_list == [mock.call(5), mock.call(0), mock.call(5)]🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/ui/test_cli.py` around lines 236 - 246, Replace the loose sleep-count assertion in test_wait_empty_tasks_checks_rabbitmq with an exact sequence assertion so the retry/wait order and arguments are validated: instead of asserting mocked_sleep.call_count == 3, assert that mocked_sleep was called three times with the expected delay argument in the correct order (matching the side effects on queue_count and rabbitmq_queue_count); use mocked_sleep.call_args_list and compare to the expected list of call entries (e.g., three calls with delay=0) to lock down behavior of wait_empty_tasks and its interactions with rabbitmq_queue_count and queue_count.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@rero_mef/monitoring/cli.py`:
- Around line 168-170: In both db_connection_counts and db_connections replace
the broad except blocks that call click.secho(...) and return with raising a
ClickException so the process exits non‑zero; specifically, catch Exception as
error and replace the click.secho + return sequence with raise
click.ClickException(str(error)) so database errors print to stderr and set a
non‑zero exit code.
---
Duplicate comments:
In `@rero_mef/agents/viaf/api.py`:
- Around line 637-658: After calling create_or_update (in the block that sets
new_record, action), validate that new_record is not None and that action
indicates a successful creation/update (not Action.ERROR) before calling
self.delete or returning Action.REDIRECT; if create_or_update returned no record
or an error, respect delete_if_not_found (log and delete only when configured)
and return Action.ERROR or appropriate redirect_info instead of deleting the
source unconditionally. Update the logic around new_record/action checks in the
same function to only perform self.delete(force=True, dbcommit=dbcommit,
delindex=reindex) and return Action.REDIRECT when the target creation actually
succeeded.
- Around line 460-545: The code currently treats any non-200 response as a NO
RECORD (empty dict) which can cause backend failures to be handled like genuine
misses; modify the response handling in the VIAF fetch routine (the block using
response, pid, url and setting msg) so that: if response.status_code ==
requests.codes.ok continue parsing as now; if response.status_code ==
requests.codes.not_found (404) return {} and the NO RECORD msg as before;
otherwise raise RetryableVIAFError with a clear message (include pid/url/status)
so callers like update_online() and handle_redirect() will retry instead of
deleting/redirecting a valid record. Ensure the raised exception uses the
existing RetryableVIAFError symbol and do not change downstream logic that
expects an empty dict only for true 404/empty-success cases.
---
Nitpick comments:
In `@tests/ui/test_cli.py`:
- Around line 212-234: The test_rabbitmq_queue_count currently only asserts the
returned total; enhance it to assert that channel.queue_declare was called for
each queue using passive declarations so the broker is only inspected and not
modified: after setting up fake_celery and calling rabbitmq_queue_count(),
assert channel.queue_declare.call_count == len(queues) and assert that
channel.queue_declare was called with the expected queue names (from
fake_celery.amqp.queues keys) and with passive=True (or the passive argument
used by rabbitmq_queue_count), and verify the call order if needed; locate the
assertions around test_rabbitmq_queue_count, channel.queue_declare,
fake_celery.connection_or_acquire and rabbitmq_queue_count to implement these
checks.
- Around line 236-246: Replace the loose sleep-count assertion in
test_wait_empty_tasks_checks_rabbitmq with an exact sequence assertion so the
retry/wait order and arguments are validated: instead of asserting
mocked_sleep.call_count == 3, assert that mocked_sleep was called three times
with the expected delay argument in the correct order (matching the side effects
on queue_count and rabbitmq_queue_count); use mocked_sleep.call_args_list and
compare to the expected list of call entries (e.g., three calls with delay=0) to
lock down behavior of wait_empty_tasks and its interactions with
rabbitmq_queue_count and queue_count.
In `@tests/unit/places/test_places_gnd_transformation.py`:
- Around line 133-149: Update the unit test to include a non-DDC case so
trans.trans_gnd_classification() does not falsely create a bf:ClassificationDdc
when $2 is not "DDC": in test_places_gnd_transformation.py, add another xml_part
(or modify the existing one) where the datafield tag="686" contains subfield
code="a" but subfield code="2" is a value other than "DDC" and assert that
Transformation (created via trans_prep) either does not add a classification
entry or adds the appropriate non-DDC classification; also run the test to
ensure the logic in rero_mef/marctojson/do_gnd_places.py (the code that builds
bf:ClassificationDdc from 686$a) will be caught if it ignores 686$2.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 06e302c3-ec57-45c7-88ba-2fdd1862e4c1
⛔ Files ignored due to path filters (25)
.github/AI_SETUP.mdis excluded by none and included by none.github/copilot-instructions.mdis excluded by none and included by none.github/workflows/continuous-integration-test.ymlis excluded by none and included by noneCLAUDE.mdis excluded by none and included by nonedata/aggnd.jsonis excluded by none and included by nonedata/aggnd_metadata.csvis excluded by!**/*.csvand included by nonedata/aggnd_pidstore.csvis excluded by!**/*.csvand included by nonedata/agrero.jsonis excluded by none and included by nonedata/agrero_metadata.csvis excluded by!**/*.csvand included by nonedata/agrero_pidstore.csvis excluded by!**/*.csvand included by nonedata/aidref.jsonis excluded by none and included by nonedata/aidref_metadata.csvis excluded by!**/*.csvand included by nonedata/aidref_pidstore.csvis excluded by!**/*.csvand included by nonedata/cidref.jsonis excluded by none and included by nonedata/mef.jsonis excluded by none and included by nonedata/mef_id.csvis excluded by!**/*.csvand included by nonedata/mef_metadata.csvis excluded by!**/*.csvand included by nonedata/mef_pidstore.csvis excluded by!**/*.csvand included by nonedata/viaf.jsonis excluded by none and included by nonedata/viaf_metadata.csvis excluded by!**/*.csvand included by nonedata/viaf_pidstore.csvis excluded by!**/*.csvand included by nonedocker-services.ymlis excluded by none and included by nonerero_mef/config.pyis excluded by!rero_mef/config.pyand included byrero_mef/**/*.pyscripts/testis excluded by none and included by noneuv.lockis excluded by!**/*.lockand included by none
📒 Files selected for processing (35)
rero_mef/agents/api.pyrero_mef/agents/cli.pyrero_mef/agents/mef/api.pyrero_mef/agents/viaf/api.pyrero_mef/agents/viaf/tasks.pyrero_mef/api.pyrero_mef/api_mef.pyrero_mef/cli.pyrero_mef/cli_logging.pyrero_mef/marctojson/do_gnd_agent.pyrero_mef/marctojson/do_gnd_concepts.pyrero_mef/marctojson/do_gnd_places.pyrero_mef/marctojson/do_idref_agent.pyrero_mef/marctojson/do_idref_concepts.pyrero_mef/marctojson/do_idref_places.pyrero_mef/marctojson/do_rero_agent.pyrero_mef/marctojson/do_rero_concepts.pyrero_mef/marctojson/do_skeleton.pyrero_mef/monitoring/api.pyrero_mef/monitoring/cli.pyrero_mef/monitoring/views.pyrero_mef/places/idref/api.pyrero_mef/places/utils.pyrero_mef/utils.pytests/ui/agents/test_agents_api.pytests/ui/agents/test_agents_cli.pytests/ui/agents/test_agents_viaf_api.pytests/ui/agents/test_agents_viaf_tasks.pytests/ui/places/test_places_api.pytests/ui/places/test_places_gnd_tasks.pytests/ui/test_api.pytests/ui/test_cli.pytests/ui/test_views.pytests/unit/concepts/test_concepts_gnd_transformation.pytests/unit/places/test_places_gnd_transformation.py
💤 Files with no reviewable changes (1)
- rero_mef/api_mef.py
✅ Files skipped from review due to trivial changes (13)
- tests/ui/test_views.py
- tests/ui/places/test_places_api.py
- rero_mef/marctojson/do_idref_places.py
- rero_mef/places/utils.py
- rero_mef/marctojson/do_idref_concepts.py
- rero_mef/marctojson/do_idref_agent.py
- rero_mef/marctojson/do_skeleton.py
- rero_mef/marctojson/do_rero_concepts.py
- rero_mef/marctojson/do_rero_agent.py
- rero_mef/marctojson/do_gnd_agent.py
- rero_mef/api.py
- tests/ui/places/test_places_gnd_tasks.py
- tests/ui/agents/test_agents_cli.py
🚧 Files skipped from review as they are similar to previous changes (12)
- tests/ui/test_api.py
- rero_mef/cli_logging.py
- tests/unit/concepts/test_concepts_gnd_transformation.py
- rero_mef/agents/mef/api.py
- rero_mef/marctojson/do_gnd_places.py
- rero_mef/places/idref/api.py
- rero_mef/marctojson/do_gnd_concepts.py
- rero_mef/monitoring/views.py
- tests/ui/agents/test_agents_viaf_tasks.py
- rero_mef/cli.py
- rero_mef/agents/viaf/tasks.py
- rero_mef/agents/cli.py
Pull-based VIAF update using the JSON API since the OAI-PMH endpoint is non-functional. Adds CLI command harvest-viaf with rolling, missing and full refresh modes. Detects and handles VIAF cluster merges (redirects) with automatic MEF cleanup. Co-Authored-by: Peter Weber <peter.weber@rero.ch>
386c6eb to
afd7155
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (6)
rero_mef/marctojson/do_gnd_concepts.py (1)
164-164: Correct the logged function name intrans_gnd_relation_pid.Line 164 logs
"trans_gnd_relation", which is misleading when tracing execution oftrans_gnd_relation_pid.Suggested patch
- self.logger.info("Call Function: %s", "trans_gnd_relation") + self.logger.info("Call Function: %s", "trans_gnd_relation_pid")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rero_mef/marctojson/do_gnd_concepts.py` at line 164, The logger call inside trans_gnd_relation_pid currently logs "trans_gnd_relation" which is misleading; update the logging statement in trans_gnd_relation_pid to call self.logger.info("Call Function: %s", "trans_gnd_relation_pid") so the function name matches the actual function (look for the existing self.logger.info line within trans_gnd_relation_pid and replace the string literal).rero_mef/agents/viaf/api.py (1)
637-643: Unusedactionvariable (Ruff RUF059).The
actionreturned bycreate_or_updateis intentionally discarded (sincehandle_redirectreturnsAction.REDIRECTon success). Prefix with underscore to indicate intentional discard.🔧 Suggested fix
- new_record, action = self.create_or_update( + new_record, _action = self.create_or_update( data=target_data, dbcommit=dbcommit, reindex=reindex, test_md5=True, )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rero_mef/agents/viaf/api.py` around lines 637 - 643, The local variable "action" from the create_or_update call in handle_redirect is unused and triggers Ruff RUF059; rename it to "_action" (or prefix with underscore) to mark it intentionally discarded where create_or_update(...) is called so the returned tuple is still unpacked but the linter knows the value is ignored (reference create_or_update and handle_redirect/Action.REDIRECT).rero_mef/agents/viaf/tasks.py (1)
38-48: Theupdate_agentsparameter is declared but never used.The parameter is documented and passed through to
_refresh_viaf_record, but it's never actually used within the function. SinceAgentViafRecord.create_or_updatealready callscreate_mef_and_agentsinternally, this parameter appears to be leftover from previous iterations.Consider removing the
update_agentsparameter from_refresh_viaf_recordand the Celery tasks if it's not needed, or document why it's reserved for future use.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rero_mef/agents/viaf/tasks.py` around lines 38 - 48, The update_agents parameter on _refresh_viaf_record (and the Celery task wrappers) is declared and passed but unused—remove it to avoid dead API surface: drop update_agents from the _refresh_viaf_record function signature and from any Celery task signatures/calls that forward it, update the docstring to remove the parameter, and remove any corresponding arguments in callers; if you prefer to keep it for backward compatibility, instead mark it as deprecated/unused in the docstring and add a noop comment that references AgentViafRecord.create_or_update and its create_mef_and_agents behavior so future readers know why it’s unused. Ensure tests and any call sites are updated accordingly.rero_mef/cli.py (1)
1354-1369: Verify flush placement relative to orphan scan.The
flush_indexes()call at line 1355 occurs inside thefor pid_type, entity_maploop, so it flushes after each entity type's reconciliation. Line 1356 (orphan_pids = ...) is outside this inner loop but still inside thefor group_name, mef_cls, group_types in plans:loop. This means orphans are scanned per-group after that group's reconciliation, which appears correct.However, ensure that
mef_cls.flush_indexes()at line 1355 is only called when there were actual reconciliations (whenentity_mapwas non-empty). Currently it's guarded byif not dry_run:at line 1354, which would still run even if no reconciliations happened for thatpid_type.🔧 Suggested fix to only flush when needed
- if not dry_run: - mef_cls.flush_indexes() + if not dry_run and reconciled_count > 0: + mef_cls.flush_indexes()Move the flush outside the
for pid_typeloop and gate it on whether any reconciliations occurred.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rero_mef/cli.py` around lines 1354 - 1369, The flush of search/indexes (mef_cls.flush_indexes()) is being called unconditionally for each pid_type when not dry_run, even if no reconciliations happened; change the logic to track whether any reconciliations occurred for the current group (e.g. set a boolean like reconciled=True when any entity_map is non-empty or any record was processed inside the for pid_type, entity_map loop) and move the call to mef_cls.flush_indexes() so it runs once per group (after the pid_type loop) only if reconciled is True and not dry_run; this ensures flush_indexes() is only invoked when there were actual changes before you call get_all_pids_without_entities_and_viaf() and scan for orphans.tests/ui/agents/test_agents_viaf_api.py (1)
519-524: Unusedredirect_infovariable (Ruff RUF059).The unpacked variable
redirect_infois never used in this test. Prefix it with an underscore to indicate intentional discard.🔧 Suggested fix
# Call handle_redirect - should return ERROR for chained redirect - new_record, action, redirect_info = agent_viaf_record.handle_redirect( + new_record, action, _redirect_info = agent_viaf_record.handle_redirect( redirect_to_pid=new_pid, dbcommit=True, reindex=True, )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/ui/agents/test_agents_viaf_api.py` around lines 519 - 524, The test unpacks three values from agent_viaf_record.handle_redirect but never uses the third (redirect_info), triggering Ruff RUF059; change the unpacked name from redirect_info to _redirect_info (or simply _) in the assignment where handle_redirect is called to mark it as intentionally unused (the call to agent_viaf_record.handle_redirect with redirect_to_pid=new_pid, dbcommit=True, reindex=True).rero_mef/monitoring/cli.py (1)
30-33: Consider adding@with_appcontextto the group for complete logging normalization.
monitoring()is a plain group callback without@with_appcontext, so it executes without app context. Whileensure_single_stream_handler()is designed to handle this gracefully (it always normalizes the named logger and catchesRuntimeErrorwhen accessingcurrent_app), the app logger normalization andpropagate=Falsesetting in lines 34-37 ofrero_mef/cli_logging.pywill be skipped. Adding@with_appcontextto the group would ensure complete normalization at the group level.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rero_mef/monitoring/cli.py` around lines 30 - 33, The monitoring click group is defined without Flask's application context, so calling ensure_single_stream_handler() inside monitoring() may skip the app logger normalization (including propagate=False) implemented in cli_logging.py; add the `@with_appcontext` decorator to the monitoring() group so it runs with an app context and ensure_single_stream_handler() can perform full normalization of the app logger (refer to the monitoring() function and ensure_single_stream_handler() call).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@rero_mef/agents/cli.py`:
- Around line 317-336: The CLI defines the short flag "-p" twice (for the
progress option and the viaf_pid option), so change one of the short flags to a
unique letter to avoid collision; for example, update the click.option for
"viaf_pid" (symbol viaf_pid) to use a different short flag such as "-i" or
remove the short flag entirely, or alternatively change the progress option's
short flag, ensuring the click.option declarations for "progress" and "viaf_pid"
no longer share "-p".
---
Nitpick comments:
In `@rero_mef/agents/viaf/api.py`:
- Around line 637-643: The local variable "action" from the create_or_update
call in handle_redirect is unused and triggers Ruff RUF059; rename it to
"_action" (or prefix with underscore) to mark it intentionally discarded where
create_or_update(...) is called so the returned tuple is still unpacked but the
linter knows the value is ignored (reference create_or_update and
handle_redirect/Action.REDIRECT).
In `@rero_mef/agents/viaf/tasks.py`:
- Around line 38-48: The update_agents parameter on _refresh_viaf_record (and
the Celery task wrappers) is declared and passed but unused—remove it to avoid
dead API surface: drop update_agents from the _refresh_viaf_record function
signature and from any Celery task signatures/calls that forward it, update the
docstring to remove the parameter, and remove any corresponding arguments in
callers; if you prefer to keep it for backward compatibility, instead mark it as
deprecated/unused in the docstring and add a noop comment that references
AgentViafRecord.create_or_update and its create_mef_and_agents behavior so
future readers know why it’s unused. Ensure tests and any call sites are updated
accordingly.
In `@rero_mef/cli.py`:
- Around line 1354-1369: The flush of search/indexes (mef_cls.flush_indexes())
is being called unconditionally for each pid_type when not dry_run, even if no
reconciliations happened; change the logic to track whether any reconciliations
occurred for the current group (e.g. set a boolean like reconciled=True when any
entity_map is non-empty or any record was processed inside the for pid_type,
entity_map loop) and move the call to mef_cls.flush_indexes() so it runs once
per group (after the pid_type loop) only if reconciled is True and not dry_run;
this ensures flush_indexes() is only invoked when there were actual changes
before you call get_all_pids_without_entities_and_viaf() and scan for orphans.
In `@rero_mef/marctojson/do_gnd_concepts.py`:
- Line 164: The logger call inside trans_gnd_relation_pid currently logs
"trans_gnd_relation" which is misleading; update the logging statement in
trans_gnd_relation_pid to call self.logger.info("Call Function: %s",
"trans_gnd_relation_pid") so the function name matches the actual function (look
for the existing self.logger.info line within trans_gnd_relation_pid and replace
the string literal).
In `@rero_mef/monitoring/cli.py`:
- Around line 30-33: The monitoring click group is defined without Flask's
application context, so calling ensure_single_stream_handler() inside
monitoring() may skip the app logger normalization (including propagate=False)
implemented in cli_logging.py; add the `@with_appcontext` decorator to the
monitoring() group so it runs with an app context and
ensure_single_stream_handler() can perform full normalization of the app logger
(refer to the monitoring() function and ensure_single_stream_handler() call).
In `@tests/ui/agents/test_agents_viaf_api.py`:
- Around line 519-524: The test unpacks three values from
agent_viaf_record.handle_redirect but never uses the third (redirect_info),
triggering Ruff RUF059; change the unpacked name from redirect_info to
_redirect_info (or simply _) in the assignment where handle_redirect is called
to mark it as intentionally unused (the call to
agent_viaf_record.handle_redirect with redirect_to_pid=new_pid, dbcommit=True,
reindex=True).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 72a17809-c3c8-41c6-97bb-755ebc5d4d77
⛔ Files ignored due to path filters (25)
.github/AI_SETUP.mdis excluded by none and included by none.github/copilot-instructions.mdis excluded by none and included by none.github/workflows/continuous-integration-test.ymlis excluded by none and included by noneCLAUDE.mdis excluded by none and included by nonedata/aggnd.jsonis excluded by none and included by nonedata/aggnd_metadata.csvis excluded by!**/*.csvand included by nonedata/aggnd_pidstore.csvis excluded by!**/*.csvand included by nonedata/agrero.jsonis excluded by none and included by nonedata/agrero_metadata.csvis excluded by!**/*.csvand included by nonedata/agrero_pidstore.csvis excluded by!**/*.csvand included by nonedata/aidref.jsonis excluded by none and included by nonedata/aidref_metadata.csvis excluded by!**/*.csvand included by nonedata/aidref_pidstore.csvis excluded by!**/*.csvand included by nonedata/cidref.jsonis excluded by none and included by nonedata/mef.jsonis excluded by none and included by nonedata/mef_id.csvis excluded by!**/*.csvand included by nonedata/mef_metadata.csvis excluded by!**/*.csvand included by nonedata/mef_pidstore.csvis excluded by!**/*.csvand included by nonedata/viaf.jsonis excluded by none and included by nonedata/viaf_metadata.csvis excluded by!**/*.csvand included by nonedata/viaf_pidstore.csvis excluded by!**/*.csvand included by nonedocker-services.ymlis excluded by none and included by nonerero_mef/config.pyis excluded by!rero_mef/config.pyand included byrero_mef/**/*.pyscripts/testis excluded by none and included by noneuv.lockis excluded by!**/*.lockand included by none
📒 Files selected for processing (36)
rero_mef/agents/api.pyrero_mef/agents/cli.pyrero_mef/agents/mef/api.pyrero_mef/agents/viaf/api.pyrero_mef/agents/viaf/tasks.pyrero_mef/api.pyrero_mef/api_mef.pyrero_mef/cli.pyrero_mef/cli_logging.pyrero_mef/marctojson/do_gnd_agent.pyrero_mef/marctojson/do_gnd_concepts.pyrero_mef/marctojson/do_gnd_places.pyrero_mef/marctojson/do_idref_agent.pyrero_mef/marctojson/do_idref_concepts.pyrero_mef/marctojson/do_idref_places.pyrero_mef/marctojson/do_rero_agent.pyrero_mef/marctojson/do_rero_concepts.pyrero_mef/marctojson/do_skeleton.pyrero_mef/monitoring/api.pyrero_mef/monitoring/cli.pyrero_mef/monitoring/views.pyrero_mef/places/idref/api.pyrero_mef/places/utils.pyrero_mef/utils.pytests/ui/agents/test_agents_api.pytests/ui/agents/test_agents_cli.pytests/ui/agents/test_agents_viaf_api.pytests/ui/agents/test_agents_viaf_tasks.pytests/ui/places/test_places_api.pytests/ui/places/test_places_gnd_tasks.pytests/ui/test_api.pytests/ui/test_cli.pytests/ui/test_monitoring.pytests/ui/test_views.pytests/unit/concepts/test_concepts_gnd_transformation.pytests/unit/places/test_places_gnd_transformation.py
💤 Files with no reviewable changes (1)
- rero_mef/api_mef.py
✅ Files skipped from review due to trivial changes (10)
- rero_mef/api.py
- tests/ui/test_views.py
- rero_mef/marctojson/do_skeleton.py
- rero_mef/places/utils.py
- rero_mef/marctojson/do_idref_places.py
- rero_mef/marctojson/do_idref_concepts.py
- rero_mef/marctojson/do_rero_concepts.py
- rero_mef/marctojson/do_gnd_agent.py
- tests/ui/places/test_places_gnd_tasks.py
- rero_mef/marctojson/do_rero_agent.py
🚧 Files skipped from review as they are similar to previous changes (14)
- tests/ui/test_api.py
- rero_mef/marctojson/do_idref_agent.py
- tests/unit/concepts/test_concepts_gnd_transformation.py
- tests/unit/places/test_places_gnd_transformation.py
- rero_mef/cli_logging.py
- rero_mef/agents/mef/api.py
- rero_mef/marctojson/do_gnd_places.py
- rero_mef/monitoring/views.py
- tests/ui/agents/test_agents_api.py
- rero_mef/places/idref/api.py
- rero_mef/agents/api.py
- tests/ui/agents/test_agents_cli.py
- tests/ui/agents/test_agents_viaf_tasks.py
- tests/ui/test_cli.py
| @click.option("-p", "--progress", "progress", is_flag=True, default=False) | ||
| @click.option( | ||
| "-r", | ||
| "--rolling", | ||
| "rolling", | ||
| is_flag=True, | ||
| default=False, | ||
| help="Rolling refresh (oldest-first batch for ~6-month cycle).", | ||
| ) | ||
| @click.option( | ||
| "-d", | ||
| "--delete-if-not-found", | ||
| "delete_if_not_found", | ||
| is_flag=True, | ||
| default=False, | ||
| help="Delete old VIAF record if redirect target not found or fails.", | ||
| ) | ||
| @click.option( | ||
| "-p", "--pid", "viaf_pid", default=None, help="Harvest a single VIAF PID." | ||
| ) |
There was a problem hiding this comment.
Short flag -p is defined twice: --progress and --pid.
Both options use -p as their short flag (line 317 for --progress and line 335 for --pid). Click will only recognize the last definition, making --progress inaccessible via its short form.
🔧 Suggested fix
`@click.option`("-v", "--verbose", "verbose", is_flag=True, default=False)
-@click.option("-p", "--progress", "progress", is_flag=True, default=False)
+@click.option("-P", "--progress", "progress", is_flag=True, default=False)
`@click.option`(
"-r",
"--rolling",🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@rero_mef/agents/cli.py` around lines 317 - 336, The CLI defines the short
flag "-p" twice (for the progress option and the viaf_pid option), so change one
of the short flags to a unique letter to avoid collision; for example, update
the click.option for "viaf_pid" (symbol viaf_pid) to use a different short flag
such as "-i" or remove the short flag entirely, or alternatively change the
progress option's short flag, ensuring the click.option declarations for
"progress" and "viaf_pid" no longer share "-p".
Pull-based VIAF update using the JSON API since
the OAI-PMH endpoint is non-functional. Adds CLI
command harvest-viaf with rolling, missing and full refresh modes. Detects and handles VIAF cluster
merges (redirects) with automatic MEF cleanup.
Summary by CodeRabbit