feat(synthetic-sf): add script to generate synthetic structure factors #234

Open
DorisMai wants to merge 6 commits into main from dm/add-sfc-synthetic
Conversation

Contributor

@DorisMai DorisMai commented May 14, 2026

Summary

  • Add generate_synthetic_sf.py to produce MTZ files of structure factors using SFcalculator (SFC), with optional bulk-solvent modeling and default scaling, plus optional R-free flag generation. Anomalous scattering is not modeled for now.
  • Mirror generate_synthetic_density.py: support single_structure and batch_csv generation, altloc occupancy overwrite, and selection/stripping of hydrogens, waters, and ligands.
  • Refactor assign_occupancies and the selection/stripping logic out of generate_synthetic_density.py into eval/synthetic_utils for reuse in the SF script.
  • SFC is instantiated from an AtomArray through atomarray_to_gemmi. Because this function will likely be reused when prototyping the new reward function, a test is added. The test is specific to the SFC environment.
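
The R-free flag generation mentioned in the summary can be pictured with a minimal, standard-library sketch. The function name `make_rfree_flags` and the convention that 1 marks the held-out test set are assumptions of this sketch, not taken from the PR; flag conventions differ between crystallography packages, and the script itself may assign flags differently.

```python
from __future__ import annotations

import random


def make_rfree_flags(
    n_reflections: int, test_fraction: float, seed: int | None = None
) -> list[int]:
    """Return one flag per reflection: 1 = held-out test set, 0 = working set.

    The 1/0 convention is an assumption of this sketch.
    """
    if not 0.0 <= test_fraction <= 1.0:
        raise ValueError("test_fraction must lie in [0, 1]")
    rng = random.Random(seed)  # local RNG, so the global random state is untouched
    n_test = round(n_reflections * test_fraction)
    test_indices = set(rng.sample(range(n_reflections), n_test))
    return [1 if i in test_indices else 0 for i in range(n_reflections)]


flags = make_rfree_flags(n_reflections=1000, test_fraction=0.05, seed=0)
```

Passing the same seed reproduces the same flag assignment, which is useful when comparing synthetic datasets generated from the same structure.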

Test plan

  • SFC is kept as a separate dependency group, which results in three new environments (analysis-dev-sfc, boltz-dev-sfc, and rf3-dev-sfc) that will eventually be merged (or added to workflows).
    ❗ There is a dependency conflict with Protenix (issue #235, "Dependency conflict betwen sfcalculator-torch and protenix"), hence no corresponding environment.
  • All 3 new environments passed tests/eval/test_generate_synthetic_sf.py
  • Validate occupancy change effects visually in coot for 6B8X and 2YL0
  • Validate batch_csv mode works
  • Validate Ftotal and Fprotein values on 5SOP_chainA_altlocB with manual selection/stripping/overwrite in pymol --> pymol saved to cif --> SFC from cif

Next steps

  • Prototype new reward function, focus on single chain for now.
  • Test multi-chain synthetic generation, otherwise Ftotal does not make sense.
  • Estimate memory usage/constraints in SFC.

Summary by CodeRabbit

  • New Features

    • Added a new CLI tool for generating synthetic protein structure factor amplitudes with batch processing support
    • Synthetic structure factors can now be output in MTZ format with optional bulk solvent and R-free flag assignment
  • Refactor

    • Consolidated structure loading and preprocessing logic into shared utility functions
  • Tests

    • Added comprehensive test coverage for structure conversion and scattering factor computation
  • Chores

    • Updated dependencies to include SFC_Torch support

Contributor

coderabbitai Bot commented May 14, 2026

📝 Walkthrough

This PR introduces synthetic structure factor generation via a new CLI that leverages SFC_Torch. The changes refactor common structure-loading logic into reusable utilities, update an existing density-generation module to use those utilities, and add comprehensive argument parsing and batch-processing support for computing structure factors with optional solvent scaling and R-free flags.

Changes

Synthetic Structure Factor Generation with Refactored Utilities

| Layer | File(s) | Summary |
|---|---|---|
| Configuration and dependency setup | pyproject.toml | Adds sfc dependency group for sfcalculator-torch>=0.3, extends Pixi feature environments with -sfc variants (analysis-dev-sfc, boltz-dev-sfc, rf3-dev-sfc), and reorders type-checker rules. |
| Core structure loading and occupancy utilities | src/sampleworks/eval/synthetic_utils.py | New module with load_structure_for_synthetic_reward() (loads, filters, and assigns occupancies to structures) and assign_occupancies() (supports default, uniform, and custom occupancy modes with validation). |
| Refactor density generation to use shared utilities | src/sampleworks/eval/generate_synthetic_density.py | Delegates structure loading and preprocessing to load_structure_for_synthetic_reward(), removing the local assign_occupancies() implementation and inline preprocessing pipeline. |
| Structure factor generation CLI and computation | src/sampleworks/eval/generate_synthetic_sf.py | New CLI script that loads structures (single or batch from CSV), converts biotite AtomArray to gemmi.Structure, computes structure factors via SFC_Torch, and writes MTZ output with optional R-free flags, solvent scaling, and parallel batch processing. Includes BatchRow dataclass for CSV validation and atomarray_to_gemmi conversion utility. |
| Tests for structure factor conversion and computation | tests/eval/test_generate_synthetic_sf.py | Validates atomarray_to_gemmi conversion (unit cell, space group, atom count match), occupancy assignment behavior, and scattering-factor magnitude consistency between converted structures and direct Gemmi structures. |
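
To illustrate the kind of CSV validation a BatchRow-style dataclass can perform, here is a hedged sketch. The class name `BatchRowSketch`, the column names `structure_path` and `occ_values`, and the semicolon-separated occupancy format are all hypothetical; the real BatchRow fields are not shown in this PR page and may differ.

```python
from __future__ import annotations

import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class BatchRowSketch:
    """Hypothetical row schema; the real BatchRow fields may differ."""

    structure_path: str
    occ_values: tuple[float, ...]

    @classmethod
    def from_csv_row(cls, row: dict[str, str]) -> "BatchRowSketch":
        path = (row.get("structure_path") or "").strip()
        if not path:
            raise ValueError("structure_path is required")
        raw = (row.get("occ_values") or "").strip()
        # Occupancies are assumed to be semicolon-separated, e.g. "0.2;0.8".
        values = tuple(float(v) for v in raw.split(";") if v.strip())
        if any(not 0.0 <= v <= 1.0 for v in values):
            raise ValueError(f"occupancies must lie in [0, 1], got {values}")
        return cls(structure_path=path, occ_values=values)


def load_rows(text: str) -> list[BatchRowSketch]:
    """Parse CSV text and validate every row up front."""
    return [BatchRowSketch.from_csv_row(r) for r in csv.DictReader(io.StringIO(text))]


rows = load_rows("structure_path,occ_values\n6b8x.cif,0.2;0.8\n2yl0.cif,\n")
```

Validating all rows before any computation starts means a malformed row fails fast instead of surfacing midway through a long batch run.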

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

  • diff-use/sampleworks#75: Shares the refactored structure loading pipeline with ligand/water/hydrogen stripping and CIF-saving support in synthetic density generation.
  • diff-use/sampleworks#189: Overlaps in modifying pyproject.toml type-checker rules configuration.

Suggested reviewers

  • marcuscollins
  • k-chrispens

🐰 A structure loads with care,
Factors dance through SFC air,
Occupancies assigned with grace,
Batch and single find their place!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 45.83% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly summarizes the main change: adding a script to generate synthetic structure factors, which is the primary purpose of this PR. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

@DorisMai DorisMai marked this pull request as ready for review May 15, 2026 23:58
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (4)
src/sampleworks/eval/synthetic_utils.py (3)

158-158: 💤 Low value

Add explanatory comment for type ignore.

Per coding guidelines, type ignores should include explanatory comments. The # ty: ignore[invalid-argument-type] suppresses a type error but doesn't explain why it's safe.

♻️ Proposed fix
-    altloc_info = detect_altlocs(atom_array)  # ty: ignore[invalid-argument-type]
+    # atom_array is an AtomArray after the stripping ops; the detect_altlocs
+    # signature may be overly narrow, so suppressing the error is safe.
+    altloc_info = detect_altlocs(atom_array)  # ty: ignore[invalid-argument-type]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/sampleworks/eval/synthetic_utils.py` at line 158, The call to
detect_altlocs(atom_array) uses a type ignore comment (# ty:
ignore[invalid-argument-type]) without justification; update the invocation in
synthetic_utils.py to keep the ignore but add a short explanatory comment after
it explaining why the type error is safe (e.g., atom_array is known at runtime
to match the expected type/has required attributes or originates from a
validated source) and reference the specific symbols detect_altlocs and
atom_array so future readers/linters understand the rationale.

81-91: ⚡ Quick win

Extra occ_values are silently ignored.

When len(occ_values) > len(altloc_info.altloc_ids), extra values are silently discarded without warning. This asymmetry with the "too few values" case (which warns) may surprise callers.

Consider adding a warning for the excess values case:

♻️ Proposed fix
         if len(occ_values) != len(altloc_info.altloc_ids):
+            if len(occ_values) > len(altloc_info.altloc_ids):
+                logger.warning(
+                    f"Expected {len(altloc_info.altloc_ids)} occupancy values, got {len(occ_values)}. "
+                    "Extra values will be ignored."
+                )
+                occ_values = occ_values[:len(altloc_info.altloc_ids)]
+            else:
-            logger.warning(
-                f"Expected {len(altloc_info.altloc_ids)} occupancy values, got {len(occ_values)}. "
-                "The missing values are automatically set to 0."
-            )
-            occ_values = occ_values + [0.0] * (len(altloc_info.altloc_ids) - len(occ_values))
+                logger.warning(
+                    f"Expected {len(altloc_info.altloc_ids)} occupancy values, got {len(occ_values)}. "
+                    "The missing values are automatically set to 0."
+                )
+                occ_values = occ_values + [0.0] * (len(altloc_info.altloc_ids) - len(occ_values))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/sampleworks/eval/synthetic_utils.py` around lines 81 - 91, The code
currently only warns when occ_values is shorter than altloc_info.altloc_ids but
silently ignores extra occ_values; update the logic around occ_values handling
(before the for loop that zips sorted(altloc_info.altloc_ids) with occ_values)
to detect when len(occ_values) > len(altloc_info.altloc_ids) and emit a
logger.warning indicating how many extra occupancy values were provided and that
they will be ignored, then trim occ_values to the expected length; keep the
existing branch that pads with zeros when occ_values is shorter and continue
assigning occupancy using occupancy[altloc_info.atom_masks[altloc]] in the
existing for altloc, occ loop so behavior remains consistent.
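
The pad-or-trim behavior described in this comment can be sketched as a small standalone helper. The name `fit_occupancy_values` is hypothetical and the sketch works on plain lists; it is not the PR's assign_occupancies implementation.

```python
from __future__ import annotations

import logging

logger = logging.getLogger(__name__)


def fit_occupancy_values(occ_values: list[float], n_altlocs: int) -> list[float]:
    """Return a copy of occ_values with exactly n_altlocs entries."""
    n_given = len(occ_values)
    if n_given > n_altlocs:
        # Too many values: warn and drop the surplus instead of ignoring it silently.
        logger.warning(
            "Expected %d occupancy values, got %d; ignoring the last %d.",
            n_altlocs, n_given, n_given - n_altlocs,
        )
        return list(occ_values[:n_altlocs])
    if n_given < n_altlocs:
        # Too few values: warn and pad the missing altlocs with zero occupancy.
        logger.warning(
            "Expected %d occupancy values, got %d; missing values are set to 0.",
            n_altlocs, n_given,
        )
        return list(occ_values) + [0.0] * (n_altlocs - n_given)
    return list(occ_values)


trimmed = fit_occupancy_values([0.2, 0.8, 0.0, 0.5], n_altlocs=3)
padded = fit_occupancy_values([0.6], n_altlocs=3)
```

Both mismatch directions now log a warning, so callers see the same level of feedback whether they passed too many or too few values.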

91-91: 💤 Low value

Return type cast may be incorrect for AtomArrayStack inputs.

The function signature accepts AtomArray | AtomArrayStack, but cast(AtomArray, result) always asserts AtomArray. If an AtomArrayStack is passed, this cast is misleading to type checkers and callers.

If the function genuinely only handles AtomArray, narrow the signature. Otherwise, remove the cast or make it conditional.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/sampleworks/eval/synthetic_utils.py` at line 91, The return cast to
AtomArray is incorrect when the function accepts AtomArray | AtomArrayStack:
update the function (in synthetic_utils.py) so it either narrows the signature
to AtomArray if it truly never returns a stack, or remove the unconditional cast
and return result with its original type; alternatively, if both are valid,
return a union type or perform a runtime check (e.g., isinstance(result,
AtomArrayStack)) and cast/annotate conditionally so callers and type checkers
see the correct AtomArray vs AtomArrayStack type instead of always using
cast(AtomArray, result).
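
If both input types are meant to be supported, one option is a constrained TypeVar, so that type checkers infer the return type from the argument and no cast is needed. The sketch below uses empty stand-in classes in place of biotite.structure.AtomArray and AtomArrayStack, and the function `with_occupancies` is a hypothetical placeholder rather than the PR's code.

```python
from __future__ import annotations

from typing import TypeVar


class AtomArray:
    """Stand-in for biotite.structure.AtomArray (assumption for this sketch)."""


class AtomArrayStack:
    """Stand-in for biotite.structure.AtomArrayStack (assumption for this sketch)."""


# Constrained TypeVar: the return type is whichever of the two types was passed in.
StructureT = TypeVar("StructureT", AtomArray, AtomArrayStack)


def with_occupancies(structure: StructureT) -> StructureT:
    """Placeholder body: a real implementation would set occupancies first."""
    return structure


single = with_occupancies(AtomArray())
stack = with_occupancies(AtomArrayStack())
```

If the function only ever handles a single AtomArray, narrowing the parameter annotation is simpler than this.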
tests/eval/test_generate_synthetic_sf.py (1)

39-44: 💤 Low value

Add return type annotation for consistency.

The stripped_atom_array fixture is missing a return type annotation, unlike the other fixtures.

Suggested fix
+from biotite.structure import AtomArray
+
 @pytest.fixture(scope="module")
-def stripped_atom_array(resources_dir: Path):
+def stripped_atom_array(resources_dir: Path) -> AtomArray:
     arr = load_structure_with_altlocs(resources_dir / "6b8x" / "6b8x_final.pdb")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/eval/test_generate_synthetic_sf.py` around lines 39 - 44,
stripped_atom_array fixture is missing a return type; update its signature to
include the same return type used by the other fixtures (e.g. def
stripped_atom_array(resources_dir: Path) -> AtomArray:) and add the appropriate
type import if not present. Keep the body unchanged (calls to
load_structure_with_altlocs, remove_hydrogens, remove_waters, keep_amino_acids,
keep_polymer) but ensure the declared return type matches the actual returned
value.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 724f9f85-5506-461e-aa16-c524bebf8acf

📥 Commits

Reviewing files that changed from the base of the PR and between fbf6d38 and f07cca8.

⛔ Files ignored due to path filters (1)
  • pixi.lock is excluded by !**/*.lock
📒 Files selected for processing (5)
  • pyproject.toml
  • src/sampleworks/eval/generate_synthetic_density.py
  • src/sampleworks/eval/generate_synthetic_sf.py
  • src/sampleworks/eval/synthetic_utils.py
  • tests/eval/test_generate_synthetic_sf.py

Comment on lines +442 to +460
Parallel(n_jobs=n_jobs, backend="loky")(
    delayed(_process_single_row)(
        row=row,
        base_dir=base_dir,
        output_dir=output_dir,
        dmin=dmin,
        mode=mode,
        occ_mode=occ_mode,
        test_fraction=test_fraction,
        seed=seed,
        device=device,
        strip_hydrogens=strip_hydrogens,
        strip_waters=strip_waters,
        strip_ligands=strip_ligands,
        simulate_solvent_and_scale=simulate_solvent_and_scale,
        save_structure=save_structure,
    )
    for row in rows
)
Contributor

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

GPU contention risk when running parallel jobs on CUDA device.

When device is a CUDA GPU and n_jobs != 1 (including the default n_jobs=-1, which requests all cores), multiple loky worker processes will create separate CUDA contexts and compete for GPU memory. This can cause OOM errors or severe performance degradation. Consider:

  1. Defaulting to n_jobs=1 when device is CUDA, or
  2. Adding a warning when n_jobs != 1 and device is GPU.
Proposed fix
 def process_batch(
     csv_path: Path,
     base_dir: Path,
     output_dir: Path,
     dmin: float,
     mode: str,
     occ_mode: str,
     test_fraction: float,
     seed: int | None,
     device: torch.device,
     n_jobs: int = -1,
     strip_hydrogens: bool = False,
     strip_waters: bool = False,
     strip_ligands: bool = False,
     simulate_solvent_and_scale: bool = False,
     save_structure: bool = False,
 ) -> None:
     ...
     from joblib import delayed, Parallel

     rows = load_batch_csv(csv_path)
+    if device.type == "cuda" and n_jobs != 1:
+        logger.warning(
+            f"Running {n_jobs} parallel jobs on CUDA device may cause GPU memory contention. "
+            "Consider using n_jobs=1 for GPU-based computation."
+        )
     logger.info(f"Processing {len(rows)} structures from {csv_path} using {n_jobs} jobs")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/sampleworks/eval/generate_synthetic_sf.py` around lines 442 - 460, The
Parallel invocation may spawn multiple loky workers that each initialize CUDA if
device is a GPU, causing GPU contention; update the code around the
Parallel(...) call to detect GPU devices (device is a torch.device, so check
device.type == "cuda") and, if a GPU is detected and n_jobs != 1 (the default
n_jobs=-1 requests all cores), either emit a clear warning (e.g., via logging or
warnings.warn) or fall back to n_jobs = 1 as a safe default; ensure this logic
is applied before calling Parallel and reference the Parallel(...) call and the
_process_single_row invocation so the change prevents multiple worker processes
from initializing separate CUDA contexts.
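
The stricter of the two options (falling back to a single worker) could look like the following sketch. The helper name `resolve_n_jobs` is hypothetical, and it takes the device type as a plain string mirroring torch.device.type so that the example runs without PyTorch installed.

```python
from __future__ import annotations

import logging

logger = logging.getLogger(__name__)


def resolve_n_jobs(device_type: str, n_jobs: int) -> int:
    """Return the worker count to use; `device_type` mirrors torch.device.type."""
    # joblib treats n_jobs=-1 as "use all CPUs", so compare against 1 rather than
    # testing n_jobs > 1, which would miss the default.
    if device_type == "cuda" and n_jobs != 1:
        logger.warning(
            "n_jobs=%d with a CUDA device may cause GPU memory contention; "
            "falling back to n_jobs=1.",
            n_jobs,
        )
        return 1
    return n_jobs


gpu_jobs = resolve_n_jobs("cuda", -1)
cpu_jobs = resolve_n_jobs("cpu", -1)
```

The proposed fix above only warns; whether to warn or to override the caller's n_jobs is a design choice for the PR author.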

Comment on lines +100 to +122
    def test_fprotein_matches_direct_gemmi(
        self, gemmi_structure_from_atomarray, stripped_gemmi, device
    ):
        f_atomarray = _compute_fprotein(gemmi_structure_from_atomarray, device)
        f_direct = _compute_fprotein(stripped_gemmi, device)
        np.testing.assert_allclose(np.abs(f_atomarray), np.abs(f_direct), atol=1e-3)

    def test_fprotein_changes_with_occupancy(self, stripped_atom_array, stripped_gemmi, device):
        altloc_info = detect_altlocs(stripped_atom_array)

        arr_default = assign_occupancies(stripped_atom_array, altloc_info, "default")
        f_default = _compute_fprotein(
            atomarray_to_gemmi(arr_default, stripped_gemmi.cell, stripped_gemmi.spacegroup_hm),
            device,
        )

        arr_custom = assign_occupancies(stripped_atom_array, altloc_info, "custom", [0.2, 0.8, 0.0])
        f_custom = _compute_fprotein(
            atomarray_to_gemmi(arr_custom, stripped_gemmi.cell, stripped_gemmi.spacegroup_hm),
            device,
        )

        assert not np.allclose(np.abs(f_default), np.abs(f_custom), atol=1e-3)
Contributor

@coderabbitai coderabbitai Bot May 16, 2026

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Mark GPU-dependent tests with @pytest.mark.slow.

The tests test_fprotein_matches_direct_gemmi and test_fprotein_changes_with_occupancy use the device fixture and SFcalculator, which may require GPU resources. As per coding guidelines, tests requiring GPU should be marked with @pytest.mark.slow to exclude them from fast CI runs.

Suggested fix
+    @pytest.mark.slow
     def test_fprotein_matches_direct_gemmi(
         self, gemmi_structure_from_atomarray, stripped_gemmi, device
     ):
         f_atomarray = _compute_fprotein(gemmi_structure_from_atomarray, device)
         f_direct = _compute_fprotein(stripped_gemmi, device)
         np.testing.assert_allclose(np.abs(f_atomarray), np.abs(f_direct), atol=1e-3)

+    @pytest.mark.slow
     def test_fprotein_changes_with_occupancy(self, stripped_atom_array, stripped_gemmi, device):
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/eval/test_generate_synthetic_sf.py` around lines 100 - 122, These two
GPU-dependent tests (test_fprotein_matches_direct_gemmi and
test_fprotein_changes_with_occupancy) should be annotated with the pytest slow
marker: add import pytest if missing and place `@pytest.mark.slow` immediately
above each test function definition that uses the device fixture/SFcalculator so
they are excluded from fast CI runs. Ensure you only add the decorator (no other
behavioral changes) above the def lines for test_fprotein_matches_direct_gemmi
and test_fprotein_changes_with_occupancy.

Contributor Author

this test is not slow and should be able to run with cpu too

Contributor

@DorisMai, thanks for the clarification! Since SFcalculator supports CPU execution and the device fixture falls back to CPU when no GPU is available, the @pytest.mark.slow annotation is not appropriate here. I'll withdraw that suggestion.

(ミ` ◡ ´ミ)


✏️ Learnings added
Learnt from: DorisMai
Repo: diff-use/sampleworks PR: 234
File: tests/eval/test_generate_synthetic_sf.py:100-122
Timestamp: 2026-05-16T00:14:29.452Z
Learning: In the `diff-use/sampleworks` repo, SFcalculator (SFC_Torch) tests such as those in `tests/eval/test_generate_synthetic_sf.py` (e.g., `test_fprotein_matches_direct_gemmi`, `test_fprotein_changes_with_occupancy`) can run on CPU and are not considered slow. Do NOT suggest adding `pytest.mark.slow` to these tests solely because they use `SFcalculator` or the `device` fixture — the `device` fixture falls back to CPU and execution is fast enough for regular CI runs.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
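
The CPU fallback discussed in this thread can be illustrated with a small device-selection helper. This is a sketch only: the repository's actual `device` fixture is not shown on this page, and the name `pick_device_name` is hypothetical.

```python
def pick_device_name() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Keeps the sketch runnable in environments without PyTorch installed.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device_name = pick_device_name()
```

A pytest fixture could wrap such a helper and return torch.device(device_name), so the same tests run on GPU machines and CPU-only CI runners.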


🧠 Learnings used
Learnt from: k-chrispens
Repo: diff-use/sampleworks PR: 73
File: src/sampleworks/core/samplers/protocol.py:1-18
Timestamp: 2026-02-02T19:13:51.594Z
Learning: In Python files across the repo, do not suggest or add inline or file-level suppressions for Ruff F722 (jaxtyping dimension strings). Instead rely on a global ignore configured in pyproject.toml (as done by the sampleworks project). Verify that F722 is included in the global ignore, and avoid recommending per-file # ruff: noqa: F722 directives unless explicitly documented as an exception.

Learnt from: marcuscollins
Repo: diff-use/sampleworks PR: 132
File: src/sampleworks/utils/guidance_script_utils.py:585-586
Timestamp: 2026-03-05T16:30:40.514Z
Learning: In Python code, if enums subclassing StrEnum are used (e.g., GuidanceType, StructurePredictor), they serialize to plain strings with json.dump and pickle without special handling. Do not flag these as non-serializable in code reviews. Treat StrEnum values as strings for JSON serialization and ensure tests cover that behavior; no extra pickle handling needed.
