
Conversation


@enlorenz enlorenz commented Dec 8, 2025

Integrate my early features to prevent divergence.


coderabbitai bot commented Dec 8, 2025

📝 Walkthrough

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 11.67%, which is insufficient; the required threshold is 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Title check (✅ Passed): The title clearly and specifically summarizes the main changes: integration of dropped-jobs penalty, RNG seeding for determinism, workload generator, and deterministic prices module.
  • Description check (✅ Passed): The description is brief but directly related to the changeset, indicating the intent to integrate early features and prevent divergence.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
environment.py (1)

81-101: Centralized job-slot clearing helpers are good, but next_empty debug reference is stale

The new _clear_job_slot and _validate_next_empty helpers centralize queue maintenance and are a good idea. One small issue:

  • In the debug block inside assign_jobs_to_available_nodes, you reference self.next_empty_slot_ref[0], but there is no self.next_empty_slot_ref attribute; only the local next_empty_slot_ref parameter exists.

Since the whole block is currently guarded by if False, this won’t execute, but it’s confusing and will break if you ever flip the flag. Consider updating it to:

-            if False:
-                if getattr(self, "strict_checks", False) and (self.current_step % 100 == 0):
-                    self._validate_next_empty(job_queue_2d, self.next_empty_slot_ref[0])
+            if False:
+                if getattr(self, "strict_checks", False) and (self.current_step % 100 == 0) and next_empty_slot_ref is not None:
+                    self._validate_next_empty(job_queue_2d, next_empty_slot_ref[0])

to keep the debug helper self-consistent.

Also applies to: 575-596

🧹 Nitpick comments (3)
deter_wlgen.py (1)

5-71: Sanity harness is solid; consider minor message tweak for uniform mode

The determinism, bounds, and distribution smoke checks are a nice complement to the simpler tests. In check_distribution_smoke, the assertion message "flat mean looks off" under if cfg.arrivals == "uniform": is slightly confusing now that “uniform” is a distinct mode; consider updating the message to mention “uniform” instead.

environment.py (1)

304-331: Drop accounting and penalties integrate well with episode metrics

The new tracking for dropped jobs and queue-full rejections (agent vs baseline) plus the drop-based penalty in calculate_reward and the extra episode metrics in record_episode_completion are coherently wired:

  • assign_jobs_to_available_nodes increments jobs_dropped / baseline_jobs_dropped and per-episode counters when jobs exceed MAX_JOB_AGE.
  • reset_state and record_episode_completion correctly reset and report jobs_dropped, drop_rate, and queue-full rejections for both agent and baseline.
  • The excess-dropped penalty compares agent vs baseline (self.new_excess) and feeds a (non-positive) drop_penalty into the reward, avoiding penalizing equally-bad baselines.

The only nit is that if drop_penalty > 0: drop_penalty = 0 is redundant, since PENALTY_DROPPED_JOB is negative and self.new_excess is non-negative, so drop_penalty will never be positive. You can safely drop that branch to simplify the code.

Also applies to: 480-506, 762-806, 909-954
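
For illustration, a tiny standalone check (with a made-up constant value, not the real one from environment.py) of why the clamp branch is unnecessary:

PENALTY_DROPPED_JOB = -0.5    # assumed negative, as in the environment
for new_excess in (0, 1, 7):  # excess dropped jobs is always >= 0
    drop_penalty = PENALTY_DROPPED_JOB * new_excess
    assert drop_penalty <= 0  # never positive, so the `if drop_penalty > 0:` branch is dead code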

workloadgen.py (1)

27-54: Workload generator core logic is sound; document reserved hour_idx parameter

WorkloadGenConfig and WorkloadGenerator._sample_job_count/sample implement the flat/poisson/uniform modes with appropriate caps and bounds, and JobSpec cleanly encapsulates job attributes. The only minor point is that hour_idx is currently unused in sample; adding a short comment (as you did) or a # noqa / similar hint will make it clear this is reserved for future daily patterns and silence static analysis noise.

Also applies to: 56-110
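
A minimal sketch of how the reserved argument could be documented; the class, body, and return value here are illustrative stand-ins, not the actual workloadgen.py code:

import numpy as np

class WorkloadGeneratorSketch:
    """Illustrative stand-in, not the real WorkloadGenerator."""

    def sample(self, hour_idx: int, rng: np.random.Generator):  # noqa: ARG002
        """hour_idx is intentionally unused for now; keeping it in the signature
        lets hour-of-day arrival patterns be added later without breaking callers."""
        return int(rng.integers(0, 10))  # placeholder body for the sketch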

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between da25ecb and e966d2a.

📒 Files selected for processing (7)
  • WIP_prices.py (1 hunks)
  • deter_wlgen.py (1 hunks)
  • environment.py (20 hunks)
  • sampler_hourly.py (2 hunks)
  • sanity_workloadgen.py (1 hunks)
  • train.py (4 hunks)
  • workloadgen.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
sampler_hourly.py

📄 CodeRabbit inference engine (CLAUDE.md)

Sample job characteristics independently (job count, duration, nodes, cores-per-node) from hour-of-day distributions in the Hourly Sampler

Files:

  • sampler_hourly.py
train.py

📄 CodeRabbit inference engine (CLAUDE.md)

train.py: Implement stable-baselines3 PPO integration in the training script with tensorboard logging and model checkpointing every 100K steps
Structure evaluation output to include per-episode metrics (cost, savings, completion rate, wait time, queue size) and cumulative analysis (total savings, monthly reduction, annual projections)

Files:

  • train.py
{train.py,weights.py}

📄 CodeRabbit inference engine (CLAUDE.md)

Use weighted reward components for training: efficiency-weight (default 0.7), price-weight (default 0.2), idle-weight (default 0.1), and job-age-weight (default 0.0)

Files:

  • train.py
environment.py

📄 CodeRabbit inference engine (CLAUDE.md)

environment.py: Use Gymnasium-compatible environment design for RL simulations in the Environment module
Configure cluster parameters as: 335 nodes maximum, 96 cores per node, up to 16 nodes per job, 150W idle power, 450W active power per node
Set environment constraints as: 2-week episodes (336 hours), max 1000 jobs in queue, max 1500 new jobs per hour, max 170 hours job runtime

Files:

  • environment.py
🧠 Learnings (7)

All seven learnings below were learnt from CR (Repo: FairRootGroup/powersched PR: 0, File: CLAUDE.md:0-0, Timestamp: 2025-12-04T08:51:13.769Z):

  • Applies to sampler_hourly.py: Sample job characteristics independently (job count, duration, nodes, cores-per-node) from hour-of-day distributions in the Hourly Sampler (applied to workloadgen.py, sampler_hourly.py, environment.py)
  • Applies to sampler_duration.py: Sample job durations from simple duration logs in the Duration Sampler (applied to sampler_hourly.py)
  • Applies to {train.py,weights.py}: Use weighted reward components for training: efficiency-weight (default 0.7), price-weight (default 0.2), idle-weight (default 0.1), and job-age-weight (default 0.0) (applied to train.py, environment.py)
  • Applies to environment.py: Set environment constraints as: 2-week episodes (336 hours), max 1000 jobs in queue, max 1500 new jobs per hour, max 170 hours job runtime (applied to environment.py)
  • Applies to environment.py: Configure cluster parameters as: 335 nodes maximum, 96 cores per node, up to 16 nodes per job, 150W idle power, 450W active power per node (applied to environment.py)
  • Applies to sampler_jobs.py: Use pattern-based replay of historical job batches with full characteristics in the Jobs Sampler (applied to environment.py)
  • Applies to train.py: Structure evaluation output to include per-episode metrics (cost, savings, completion rate, wait time, queue size) and cumulative analysis (total savings, monthly reduction, annual projections) (applied to environment.py)
🧬 Code graph analysis (5)
deter_wlgen.py (1)
workloadgen.py (3)
  • WorkloadGenerator (56-110)
  • WorkloadGenConfig (35-53)
  • sample (99-110)
train.py (2)
workloadgen.py (2)
  • WorkloadGenerator (56-110)
  • WorkloadGenConfig (35-53)
environment.py (2)
  • ComputeClusterEnv (67-439)
  • render (72-73)
WIP_prices.py (1)
environment.py (1)
  • reset (254-284)
sanity_workloadgen.py (1)
workloadgen.py (3)
  • WorkloadGenerator (56-110)
  • WorkloadGenConfig (35-53)
  • sample (99-110)
environment.py (7)
WIP_prices.py (2)
  • Prices (4-141)
  • advance_and_get_predicted_prices (86-90)
prices.py (1)
  • Prices (4-121)
weights.py (1)
  • Weights (5-15)
plot.py (1)
  • plot (6-77)
sampler_jobs.py (3)
  • DurationSampler (7-444)
  • parse_jobs (18-115)
  • sample (123-168)
sampler_hourly.py (2)
  • parse_jobs (13-101)
  • sample (103-143)
workloadgen.py (1)
  • sample (99-110)
🪛 Ruff (0.14.7)
workloadgen.py

60-60: Avoid specifying long messages outside the exception class

(TRY003)


89-89: Avoid specifying long messages outside the exception class

(TRY003)


99-99: Unused method argument: hour_idx

(ARG002)

environment.py

Ruff reports a cascade of invalid-syntax diagnostics in environment.py, clustered around lines 440-444, 453, 631-640, 665, 674-677, 691-693, 712-718, 729-734, and 743. The repeated messages are: Expected a statement, Unexpected indentation, Invalid decimal integer literal, Expected ',' (found name or string), Unparenthesized generator expression cannot be used here, Invalid assignment target, Expected 'in' (found name), Expected an identifier, and (at line 691) unindent does not match any outer indentation level.

🔇 Additional comments (4)
sampler_hourly.py (1)

103-143: Hourly sampler RNG integration and max_jobs cap look correct

Using an external RNG plus max_jobs to cap arrivals preserves the independent sampling of job_count/duration/nodes/cores-per-node per hour and fits the intended usage with deterministic generators. Just ensure callers always pass a numpy Generator-like rng.
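
For example, a caller-side rng created as below exposes the .choice() and .integers() methods the sampler relies on (a minimal standalone check, not project code):

import numpy as np

rng = np.random.default_rng(1234)   # seeded Generator, as Gymnasium-style seeding also produces
print(rng.choice([1, 2, 3]))        # categorical draws for per-job characteristics
print(int(rng.integers(0, 10)))     # integer draws such as job counts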

sanity_workloadgen.py (1)

5-43: Determinism, bounds, and Poisson sanity tests are well-structured

These sanity checks exercise the main WorkloadGenerator behaviors (determinism, per-field bounds, and Poisson mean) in a simple, reproducible way and align with the generator’s API.

train.py (1)

10-18: Workload generator CLI wiring and env construction look consistent with constraints

The new --workload-gen options, WorkloadGenConfig construction, and passing workload_gen into ComputeClusterEnv are all consistent with the environment’s limits (duration ≤ 170h, 1–16 nodes, 1–96 cores, ≤1500 jobs/hour). The norm_path helper also correctly normalizes empty-string paths to None while preserving the in-memory prices list.

Also applies to: 49-52, 91-107, 110-141
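
As a rough sketch of that wiring (the field values below are placeholders; only the WorkloadGenConfig/WorkloadGenerator names and the arrivals and max_new_jobs_per_hour fields are taken from this changeset):

from workloadgen import WorkloadGenConfig, WorkloadGenerator

cfg = WorkloadGenConfig(arrivals="poisson", max_new_jobs_per_hour=1500)
workload_gen = WorkloadGenerator(cfg)
# env = ComputeClusterEnv(..., workload_gen=workload_gen)  # passed through as described above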

environment.py (1)

159-163: np_random is never seeded or initialized before use

self.np_random is set to None in __init__ but is used as a numpy RNG in step() through hourly_sampler.sample(), self.workload_gen.sample(), and self.np_random.integers() calls. Since reset() does not currently seed or initialize self.np_random, the first call to any of these methods will raise an AttributeError.

To fix this, initialize the RNG in reset() using Gymnasium-compatible seeding:

 def reset(self, seed = None, options = None):
+        if seed is not None or self.np_random is None:
+            self.np_random, self._seed = seeding.np_random(seed)
+
         self.reset_state()
         self.prices.reset()

This ensures deterministic behavior for a given seed and avoids reliance on global RNGs, aligning with the Gymnasium-compatible environment design requirement.

Also applies to: 254-283, 362-417
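
The seeding helper used in the suggestion above is the standard Gymnasium utility; a standalone illustration:

from gymnasium.utils import seeding

np_random, used_seed = seeding.np_random(42)   # returns (numpy Generator, seed actually used)
print(used_seed, int(np_random.integers(0, 100)))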

@enlorenz enlorenz marked this pull request as draft December 8, 2025 16:44

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
environment.py (2)

440-441: Critical: assign_jobs_to_available_nodes returns wrong values.

The call expects (num_launched_jobs, next_empty_slot), but the function at lines 618-691 has critical bugs:

  1. Initializes num_processed_jobs instead of num_launched_jobs
  2. Tries to increment undefined variables num_launched and num_dropped (lines 661, 674)
  3. Returns num_processed_jobs which is never incremented

This will cause a NameError at runtime. See the detailed fix in the review comment for lines 618-691.


618-691: Critical: Undefined variables cause NameError.

The function has multiple critical bugs:

  1. Line 619: Initializes num_processed_jobs = 0 but this variable is never incremented and doesn't match the return expectations.

  2. Lines 661, 674: Attempt to increment undefined variables num_launched and num_dropped, causing a NameError at runtime.

  3. Line 691: Returns (num_processed_jobs, next_empty_slot) but callers expect the first value to be the number of launched jobs.

The function should track launched jobs and return the count along with the updated next_empty_slot. Dropped jobs are already tracked via self.jobs_dropped and don't need to be returned.

Apply this fix:

     def assign_jobs_to_available_nodes(self, job_queue_2d, nodes, cores_available, running_jobs, next_empty_slot, is_baseline=False):
-        num_processed_jobs = 0
+        num_launched = 0

         for job_idx, job in enumerate(job_queue_2d):
             job_duration, job_age, job_nodes, job_cores_per_node = job

             if job_duration <= 0:
                 continue

             # Candidates: node is on and has enough free cores
             mask = (nodes >= 0) & (cores_available >= job_cores_per_node)
             candidate_nodes = np.where(mask)[0]

             if len(candidate_nodes) >= job_nodes:
                 # Assign job to first job_nodes candidates
                 job_allocation = []
                 for i in range(job_nodes):
                     node_idx = int(candidate_nodes[i])
                     cores_available[node_idx] -= int(job_cores_per_node)
                     nodes[node_idx] = max(int(nodes[node_idx]), int(job_duration))
                     job_allocation.append((node_idx, int(job_cores_per_node)))

                 running_jobs[self.next_job_id] = {
                     "duration": int(job_duration),
                     "allocation": job_allocation,
                 }
                 self.next_job_id += 1

                 # Clear job from queue
                 job_queue_2d[job_idx] = [0, 0, 0, 0]

                 # Update next_empty_slot if we cleared a slot before it
                 if job_idx < next_empty_slot:
                     next_empty_slot = job_idx

                 # Track job completion and wait time
                 if is_baseline:
                     self.baseline_jobs_completed += 1
                     self.baseline_total_job_wait_time += int(job_age)
                 else:
                     self.jobs_completed += 1
                     self.total_job_wait_time += int(job_age)

                 num_launched += 1
                 continue

             # Not enough resources -> job waits and ages (or gets dropped)
             new_age = int(job_age) + 1

             if new_age > MAX_JOB_AGE:
                 # Clear job from queue
                 job_queue_2d[job_idx] = [0, 0, 0, 0]

                 # Update next_empty_slot if we cleared a slot before it
                 if job_idx < next_empty_slot:
                     next_empty_slot = job_idx
-                num_dropped += 1

                 if is_baseline:
                     self.baseline_jobs_dropped += 1
                     self.baseline_dropped_this_episode += 1
                 else:
                     self.jobs_dropped += 1
                     self.dropped_this_episode += 1
             else:
                 job_queue_2d[job_idx][1] = new_age
                 
             
             # DEBUG CHECK for next_empty -> Add a Flag to be called
             if False:
                 if getattr(self, "strict_checks", False) and (self.current_step % 100 == 0):
                     self._validate_next_empty(job_queue_2d, self.next_empty_slot_ref[0])

-        return num_processed_jobs, next_empty_slot
+        return num_launched, next_empty_slot

Note: The num_dropped variable is not needed as a return value since dropped jobs are tracked via instance variables (self.jobs_dropped, etc.).

🧹 Nitpick comments (1)
environment.py (1)

627-638: Consider removing redundant int() conversions.

The extensive use of int() conversions (lines 635-638, 641-642, 656, 659) is defensive but likely unnecessary since job_queue_2d is defined with dtype=np.int32. These conversions add visual noise without providing value.

If you want to keep them for extra safety, consider adding a comment explaining why they're needed.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e966d2a and 44e711e.

📒 Files selected for processing (1)
  • environment.py (19 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
environment.py

📄 CodeRabbit inference engine (CLAUDE.md)

environment.py: Use Gymnasium-compatible environment design for RL simulations in the Environment module
Configure cluster parameters as: 335 nodes maximum, 96 cores per node, up to 16 nodes per job, 150W idle power, 450W active power per node
Set environment constraints as: 2-week episodes (336 hours), max 1000 jobs in queue, max 1500 new jobs per hour, max 170 hours job runtime

Files:

  • environment.py
🧠 Learnings (7)

All learnings below were learnt from CR (Repo: FairRootGroup/powersched PR: 0, File: CLAUDE.md:0-0, Timestamp: 2025-12-04T08:51:13.769Z).

📓 Common learnings
  • Applies to sampler_hourly.py: Sample job characteristics independently (job count, duration, nodes, cores-per-node) from hour-of-day distributions in the Hourly Sampler

📚 Learnings applied to environment.py
  • Applies to environment.py: Set environment constraints as: 2-week episodes (336 hours), max 1000 jobs in queue, max 1500 new jobs per hour, max 170 hours job runtime
  • Applies to {train.py,weights.py}: Use weighted reward components for training: efficiency-weight (default 0.7), price-weight (default 0.2), idle-weight (default 0.1), and job-age-weight (default 0.0)
  • Applies to environment.py: Configure cluster parameters as: 335 nodes maximum, 96 cores per node, up to 16 nodes per job, 150W idle power, 450W active power per node
  • Applies to sampler_hourly.py: Sample job characteristics independently (job count, duration, nodes, cores-per-node) from hour-of-day distributions in the Hourly Sampler
  • Applies to sampler_jobs.py: Use pattern-based replay of historical job batches with full characteristics in the Jobs Sampler
  • Applies to train.py: Structure evaluation output to include per-episode metrics (cost, savings, completion rate, wait time, queue size) and cumulative analysis (total savings, monthly reduction, annual projections)
🧬 Code graph analysis (1)
environment.py (5)
WIP_prices.py (1)
  • advance_and_get_predicted_prices (86-90)
weights.py (1)
  • Weights (5-15)
sampler_jobs.py (4)
  • parse_jobs (18-115)
  • precalculate_hourly_jobs (364-383)
  • sample_one_hourly (416-444)
  • sample (123-168)
sampler_hourly.py (2)
  • parse_jobs (13-101)
  • sample (103-143)
workloadgen.py (1)
  • sample (99-110)
🪛 Ruff (0.14.7)
environment.py

661-661: Undefined name num_launched

(F821)


661-661: Local variable num_launched is assigned to but never used

(F841)


674-674: Undefined name num_dropped

(F821)


674-674: Local variable num_dropped is assigned to but never used

(F841)

🔇 Additional comments (8)
environment.py (8)

87-100: LGTM! Good refactoring for job slot management.

The helper methods _clear_job_slot and _validate_next_empty centralize job queue management logic and add defensive checks. This improves maintainability and makes the invariant (next_empty_slot points to the first empty slot) explicit.


125-126: LGTM! Workload generator integration supports deterministic job generation.

The workload_gen parameter enables external workload generation via the WorkloadGenerator class, supporting the PR objective for deterministic and configurable job generation.


254-265: LGTM! Price state integration is correct.

The reset() method correctly initializes price state via self.prices.reset() and uses predicted_prices from the prices object. This aligns with the deterministic price evolution from WIP_prices.


344-344: LGTM! Centralized price advancement.

Using advance_and_get_predicted_prices() from the Prices class centralizes the price evolution logic, improving maintainability.


513-514: LGTM! Correct Gymnasium episode termination semantics.

Setting truncated=True and terminated=False when the time limit is reached correctly follows Gymnasium's API convention. This distinguishes between time-based episode endings (truncated) and task-based endings (terminated).


17-18: The review comment cannot be verified due to repository access issues. Manual verification is required to confirm the RNG initialization concerns and the appropriateness of the suggested fix.


372-417: Verify hourly_sampler.sample() API signature and assess legacy code cleanup.

Line 384 passes max_jobs=max_to_generate to hourly_sampler.sample(). Confirm this parameter is supported by the sampler's implementation.

Lines 404-416 contain commented-out legacy code mixed with the new implementation using self.np_random. If this represents completed migration to non-global RNG, remove the commented lines to improve code clarity.


8-9: Verify the WIP_prices import is intentional for this integration.

The import has been switched to WIP_prices with a "Test re-worked prices script" comment. Ensure this temporary change is intentional for the integration PR and won't cause issues in production or if the WIP module is incomplete.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
environment.py (2)

87-101: Leverage _clear_job_slot helper to centralize queue slot management (optional)

You’ve added _clear_job_slot and _validate_next_empty (Lines 87–101), but still clear slots and adjust next_empty_slot manually in assign_jobs_to_available_nodes and the drop branch (Lines 646–652, 668–673). For maintainability and to avoid subtle inconsistencies, you could route these through _clear_job_slot instead.

Example for the two clear‑and-update sites:

-                # Clear job from queue
-                job_queue_2d[job_idx] = [0, 0, 0, 0]
-
-                # Update next_empty_slot if we cleared a slot before it
-                if job_idx < next_empty_slot:
-                    next_empty_slot = job_idx
+                # Clear job from queue and keep next_empty_slot consistent
+                if next_empty_slot is not None:
+                    ref = [next_empty_slot]
+                    self._clear_job_slot(job_queue_2d, job_idx, ref)
+                    next_empty_slot = ref[0]
+                else:
+                    self._clear_job_slot(job_queue_2d, job_idx)

Same pattern can be used in the MAX_JOB_AGE drop path. This reduces duplication and ensures your debug validator stays in sync with all slot mutations.

Also applies to: 646-652, 668-673


467-470: Drop penalty integration looks good; consider updating debug print to include it

The new excess-drop tracking (new_excess / prev_excess_dropped, Lines 467–470) and drop_penalty = min(0, PENALTY_DROPPED_JOB * self.new_excess) (Lines 754–755) correctly ensure only additional excess drops are penalized and that the penalty is never positive.

The debug summary at Line 766, however, omits drop_penalty from the printed breakdown even though it’s part of reward. For more transparent debugging:

-        self.env_print(f"    > $$$TOTAL: {reward:.4f} = {efficiency_reward_weighted:.4f} + {price_reward_weighted:.4f} + {idle_penalty_weighted:.4f} + {job_age_penalty_weighted:.4f}")
+        self.env_print(
+            f"    > $$$TOTAL: {reward:.4f} = "
+            f"{efficiency_reward_weighted:.4f} + {price_reward_weighted:.4f} "
+            f"+ {idle_penalty_weighted:.4f} + {job_age_penalty_weighted:.4f} "
+            f"+ {drop_penalty:.4f}"
+        )

Purely a logging clarity tweak; behavior is already correct.

Also applies to: 754-756, 766-766

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 44e711e and 8cfa7b9.

📒 Files selected for processing (1)
  • environment.py (19 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
environment.py

📄 CodeRabbit inference engine (CLAUDE.md)

environment.py: Use Gymnasium-compatible environment design for RL simulations in the Environment module
Configure cluster parameters as: 335 nodes maximum, 96 cores per node, up to 16 nodes per job, 150W idle power, 450W active power per node
Set environment constraints as: 2-week episodes (336 hours), max 1000 jobs in queue, max 1500 new jobs per hour, max 170 hours job runtime

Files:

  • environment.py
🧠 Learnings (7)

All learnings below were learnt from CR (Repo: FairRootGroup/powersched PR: 0, File: CLAUDE.md:0-0, Timestamp: 2025-12-04T08:51:13.769Z).

📓 Common learnings
  • Applies to sampler_hourly.py: Sample job characteristics independently (job count, duration, nodes, cores-per-node) from hour-of-day distributions in the Hourly Sampler
  • Applies to {train.py,weights.py}: Use weighted reward components for training: efficiency-weight (default 0.7), price-weight (default 0.2), idle-weight (default 0.1), and job-age-weight (default 0.0)

📚 Learnings applied to environment.py
  • Applies to environment.py: Set environment constraints as: 2-week episodes (336 hours), max 1000 jobs in queue, max 1500 new jobs per hour, max 170 hours job runtime
  • Applies to {train.py,weights.py}: Use weighted reward components for training: efficiency-weight (default 0.7), price-weight (default 0.2), idle-weight (default 0.1), and job-age-weight (default 0.0)
  • Applies to environment.py: Configure cluster parameters as: 335 nodes maximum, 96 cores per node, up to 16 nodes per job, 150W idle power, 450W active power per node
  • Applies to sampler_hourly.py: Sample job characteristics independently (job count, duration, nodes, cores-per-node) from hour-of-day distributions in the Hourly Sampler
  • Applies to sampler_jobs.py: Use pattern-based replay of historical job batches with full characteristics in the Jobs Sampler
  • Applies to train.py: Structure evaluation output to include per-episode metrics (cost, savings, completion rate, wait time, queue size) and cumulative analysis (total savings, monthly reduction, annual projections)
🧬 Code graph analysis (1)
environment.py (4)
WIP_prices.py (1)
  • advance_and_get_predicted_prices (86-90)
sampler_jobs.py (2)
  • DurationSampler (7-444)
  • sample (123-168)
workloadgen.py (1)
  • sample (99-110)
sampler_hourly.py (1)
  • sample (103-143)
🪛 Ruff (0.14.7)
environment.py

661-661: Undefined name num_launched

(F821)


661-661: Local variable num_launched is assigned to but never used

(F841)


674-674: Undefined name num_dropped

(F821)


674-674: Local variable num_dropped is assigned to but never used

(F841)

🔇 Additional comments (4)
environment.py (4)

22-43: Environment constraints, penalties, and episode metrics align well with design goals

  • Cluster and episode constants (Lines 22–43, 45) match the documented configuration: 335 nodes, 96 cores/node, up to 16 nodes/job, 2‑week (336‑hour) episodes, queue and job bounds, and appropriate power costs.
  • reset() now pulls predicted prices from Prices (Lines 254–265), which keeps initial observations consistent with the price model.
  • Reward composition (Lines 721–769) and record_episode_completion metrics (Lines 867–912) now factor in drop behavior, queue‑full rejections, and baseline comparisons in a clean, inspectable way.

No changes requested here; this structure looks solid for RL training and analysis.

Also applies to: 45-45, 254-265, 721-769, 867-912


159-163: Initialize self.np_random in reset() before first use

self.np_random is set to None in __init__ and never initialized, but step() uses it in the hourly sampler path, workload generator path, and legacy randomizer (Lines 372–416). This will raise an AttributeError on the first step() call.

You're already importing seeding and accepting a seed argument in reset(). A minimal fix is to initialize (or re-seed) the RNG there:

-    def reset(self, seed = None, options = None):
-        self.reset_state()
-        self.prices.reset()
+    def reset(self, seed=None, options=None):
+        # Initialize or re-seed deterministic RNG
+        if seed is not None or self.np_random is None:
+            self.np_random, self._seed = seeding.np_random(seed)
+
+        self.reset_state()
+        self.prices.reset()
@@
-            'predicted_prices': self.prices.predicted_prices.copy(),
+            'predicted_prices': self.prices.predicted_prices.copy(),
         }

This keeps Gymnasium-compatible seeding while ensuring all RNG users have a valid generator.

Also applies to: 254-265, 372-416


618-691: Fix undefined variable references in assign_jobs_to_available_nodes

The method initializes num_processed_jobs = 0 but increments undefined variables num_launched (line 661) and num_dropped (line 674), causing NameError at runtime. The variable num_processed_jobs is never incremented, so the return value is always 0.

Increment num_processed_jobs instead of the undefined variables:

  • Line 661: Change num_launched += 1 to num_processed_jobs += 1
  • Line 674: Remove num_dropped += 1 (already tracking via self.jobs_dropped and self.baseline_jobs_dropped)

168-177: Fix self.jobs_sampler usage when external_jobs is provided

In __init__ you configure the global jobs_sampler (Lines 168–176), but in step() you reference self.jobs_sampler.sample_one_hourly(...) (Line 364). If self.jobs_sampler is never assigned, environments with external_jobs will hit an AttributeError.

Either revert to using the module-level jobs_sampler in step(), or attach it to the instance in __init__:

         if self.external_jobs:
             print(f"Loading jobs from {self.external_jobs}")
             jobs_sampler.parse_jobs(self.external_jobs, 60)
@@
             print(f"Parsed hourly jobs for {len(jobs_sampler.hourly_jobs)} hours")
+            self.jobs_sampler = jobs_sampler

Then step() can safely use self.jobs_sampler.sample_one_hourly(wrap=True)["hourly_jobs"].

Also applies to: 362-365

@enlorenz enlorenz marked this pull request as ready for review December 8, 2025 17:12
enlorenz added 2 commits December 15, 2025 16:15
…step and baseline_step. Assign_jobs now delivers "launched" and "dropped" numbers. Added penalty in reward function (not normalized or weighted yet).

Centralized "remove job from list". Changed from "terminated" to "truncated", to account for "time-limit" end.

Fix: Untracked mergers now up to date

Minor Fixes: CodeRabbit's suggestions
…RNG is used in any function that uses any type of RNG. Conserves determinism for training. Adjusted "sampler_hourly" to allow for this logic, and the way it's read in -> Now it's a "self." function.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
environment.py (1)

618-691: Critical: num_launched and num_dropped are undefined.

The function initializes num_processed_jobs = 0 on line 619, but then increments num_launched (line 661) and num_dropped (line 674), which are never defined. This will raise NameError at runtime.

Either rename num_processed_jobs or add the missing initializations:

     def assign_jobs_to_available_nodes(self, job_queue_2d, nodes, cores_available, running_jobs, next_empty_slot, is_baseline=False):
-        num_processed_jobs = 0
+        num_launched = 0
+        num_dropped = 0
 
         for job_idx, job in enumerate(job_queue_2d):
             ...
-                num_launched += 1
+                num_launched += 1  # now defined
                 continue
             ...
-                num_dropped += 1
+                num_dropped += 1  # now defined
             ...
 
-        return num_processed_jobs, next_empty_slot
+        return num_launched, next_empty_slot

Note: num_dropped is tracked but never returned. If drop count is needed by callers, update the return signature.

♻️ Duplicate comments (3)
WIP_prices.py (1)

20-30: Guard against empty external_prices arrays.

If an empty sequence is passed, np.min and np.percentile will raise errors. Add an explicit check.

         if self.original_prices is not None:
             prices = np.asarray(self.original_prices, dtype=np.float32)
+            if prices.size == 0:
+                raise ValueError("external_prices must be a non-empty sequence")
 
             min_price = float(np.min(prices))
environment.py (2)

363-364: Inconsistent sampler reference.

Line 364 references self.jobs_sampler, but this instance attribute is only conditionally created in reset_state() (lines 332-335). If reset_state() is not called or the condition is false, self.jobs_sampler won't exist, yet __init__ uses the singleton jobs_sampler (line 170-176).

This creates confusion between the singleton jobs_sampler and instance self.jobs_sampler. Standardize on one approach.


332-335: Redundant sampler re-initialization still present.

This re-initialization was flagged in a previous review as redundant (the sampler is already initialized in __init__ lines 168-176) and marked as addressed, but the code is still present. This causes unnecessary file I/O on every episode reset.

-        if self.external_jobs and not self.workload_gen:
-            self.jobs_sampler = DurationSampler()
-            self.jobs_sampler.parse_jobs(self.external_jobs, 60)
-            self.jobs_sampler.precalculate_hourly_jobs(CORES_PER_NODE, MAX_NODES_PER_JOB)
🧹 Nitpick comments (9)
sampler_hourly.py (2)

4-4: Unused random import.

The random module is imported but no longer used after switching to rng.choice. Consider removing it to keep imports clean.

-import random

103-103: Type hint int | None requires Python 3.10+.

The union syntax int | None is only available in Python 3.10+. For broader compatibility, use Optional[int] from the typing module.

+from typing import Optional
+
-    def sample(self, hour_of_day: int, rng, max_jobs: int | None = None):
+    def sample(self, hour_of_day: int, rng, max_jobs: Optional[int] = None):
WIP_prices.py (2)

75-77: Consider using collections.deque for price history.

Using list.pop(0) is O(n). For a fixed-size rolling window, collections.deque(maxlen=HISTORY_WINDOW) provides O(1) operations.

+from collections import deque
+
 class Prices:
     ...
     def reset(self):
-        self.price_history = []
+        self.price_history = deque(maxlen=self.HISTORY_WINDOW)
         ...
 
     def get_next_price(self):
         ...
         self.price_history.append(new_price)
-        if len(self.price_history) > self.HISTORY_WINDOW:
-            self.price_history.pop(0)
+        # deque automatically removes oldest when maxlen exceeded

137-141: Figure not closed on early return paths.

If save_path is provided, plt.savefig is called, but if an exception occurs between plt.figure() and plt.close(), the figure leaks. Consider using a try/finally or context manager pattern.
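
One common pattern that avoids the leak (a generic matplotlib sketch, not the actual WIP_prices.py plotting code; the function name is hypothetical):

import matplotlib.pyplot as plt

def plot_prices(prices, save_path=None):
    fig = plt.figure()
    try:
        plt.plot(prices)
        if save_path:
            plt.savefig(save_path)
        else:
            plt.show()
    finally:
        plt.close(fig)   # released even if plotting or saving raises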

train.py (2)

119-126: Remove commented-out code.

The old parameter lines are commented out rather than removed. Dead code should be deleted to maintain clarity.

-                           # external_prices=prices,
-                           # external_durations=job_durations_file_path,
-                           # external_jobs=jobs_file_path,
-                           # external_hourly_jobs=hourly_jobs_file_path,
                            external_prices=norm_path(prices),

112-113: norm_path could be simplified.

The helper works but is slightly verbose. Consider:

-    def norm_path(x):
-        return None if (x is None or str(x).strip() == "") else x
+    def norm_path(x):
+        return x if x else None

This works because empty strings are falsy in Python. However, if you need to preserve whitespace-only strings as None, keep the current implementation.
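
A quick standalone check of the simplified helper's behavior:

def norm_path(x):
    return x if x else None

print(norm_path(""))      # None
print(norm_path(None))    # None
print(norm_path("a/b"))   # a/b
print(norm_path("   "))   # "   " is kept, unlike with the strip()-based version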

sanity_workloadgen.py (1)

22-29: test_constraints uses default flat_jobs_per_hour which may differ from max_new_jobs_per_hour.

The config sets max_new_jobs_per_hour=200 but doesn't set flat_jobs_per_hour. Per workloadgen.py line 47, flat_jobs_per_hour defaults to 200, which happens to match here. For clarity, explicitly set it:

-    cfg = WorkloadGenConfig(arrivals="flat", max_new_jobs_per_hour=200)
+    cfg = WorkloadGenConfig(arrivals="flat", flat_jobs_per_hour=200, max_new_jobs_per_hour=200)
workloadgen.py (2)

57-61: Consider using dataclasses.replace() for cleaner config reconstruction.

The current approach reconstructs the config manually using a dict comprehension. Since WorkloadGenConfig is a dataclass, you can use dataclasses.replace() for a more idiomatic solution.

+from dataclasses import dataclass, replace
-from dataclasses import dataclass
 from typing import List, Optional
 import numpy as np
 class WorkloadGenerator:
     def __init__(self, cfg: WorkloadGenConfig):
         arrivals = cfg.arrivals.lower().strip()
         if arrivals not in ("flat", "poisson", "uniform"):
             raise ValueError(f"arrivals must be 'flat', 'uniform' or 'poisson', got: {cfg.arrivals}")
-        self.cfg = WorkloadGenConfig(arrivals=arrivals, **{k: v for k, v in cfg.__dict__.items() if k != "arrivals"})
+        self.cfg = replace(cfg, arrivals=arrivals)
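
For reference, replace() works as expected on frozen dataclasses; a standalone illustration with a simplified stand-in config (not the full WorkloadGenConfig):

from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Cfg:
    arrivals: str = "Flat"
    max_new_jobs_per_hour: int = 1500

cfg = Cfg()
print(replace(cfg, arrivals=cfg.arrivals.lower().strip()))
# Cfg(arrivals='flat', max_new_jobs_per_hour=1500)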

73-74: Optional: Simplify attribute access.

The getattr calls with defaults are defensive but unnecessary since these are dataclass fields with defaults already defined in WorkloadGenConfig.

-        target = int(getattr(self.cfg, "flat_jobs_per_hour", self.cfg.max_new_jobs_per_hour))
-        jitter = int(getattr(self.cfg, "flat_jitter", 0))
+        target = int(self.cfg.flat_jobs_per_hour)
+        jitter = int(self.cfg.flat_jitter)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8cfa7b9 and 0ce559a.

📒 Files selected for processing (7)
  • WIP_prices.py (1 hunks)
  • deter_wlgen.py (1 hunks)
  • environment.py (19 hunks)
  • sampler_hourly.py (2 hunks)
  • sanity_workloadgen.py (1 hunks)
  • train.py (4 hunks)
  • workloadgen.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • deter_wlgen.py
🧰 Additional context used
📓 Path-based instructions (4)
sampler_hourly.py

📄 CodeRabbit inference engine (CLAUDE.md)

Sample job characteristics independently (job count, duration, nodes, cores-per-node) from hour-of-day distributions in the Hourly Sampler

Files:

  • sampler_hourly.py
train.py

📄 CodeRabbit inference engine (CLAUDE.md)

train.py: Implement stable-baselines3 PPO integration in the training script with tensorboard logging and model checkpointing every 100K steps
Structure evaluation output to include per-episode metrics (cost, savings, completion rate, wait time, queue size) and cumulative analysis (total savings, monthly reduction, annual projections)

Files:

  • train.py
{train.py,weights.py}

📄 CodeRabbit inference engine (CLAUDE.md)

Use weighted reward components for training: efficiency-weight (default 0.7), price-weight (default 0.2), idle-weight (default 0.1), and job-age-weight (default 0.0)

Files:

  • train.py
environment.py

📄 CodeRabbit inference engine (CLAUDE.md)

environment.py: Use Gymnasium-compatible environment design for RL simulations in the Environment module
Configure cluster parameters as: 335 nodes maximum, 96 cores per node, up to 16 nodes per job, 150W idle power, 450W active power per node
Set environment constraints as: 2-week episodes (336 hours), max 1000 jobs in queue, max 1500 new jobs per hour, max 170 hours job runtime

Files:

  • environment.py
🧠 Learnings (7)

All learnings below were learnt from CR (Repo: FairRootGroup/powersched PR: 0, File: CLAUDE.md:0-0, Timestamp: 2025-12-04T08:51:13.769Z).

📓 Common learnings
  • Applies to sampler_hourly.py: Sample job characteristics independently (job count, duration, nodes, cores-per-node) from hour-of-day distributions in the Hourly Sampler

📚 Learnings applied per file
  • Applies to sampler_hourly.py: Sample job characteristics independently (job count, duration, nodes, cores-per-node) from hour-of-day distributions in the Hourly Sampler (applied to sampler_hourly.py, environment.py, workloadgen.py)
  • Applies to sampler_jobs.py: Use pattern-based replay of historical job batches with full characteristics in the Jobs Sampler (applied to sampler_hourly.py, environment.py)
  • Applies to {train.py,weights.py}: Use weighted reward components for training: efficiency-weight (default 0.7), price-weight (default 0.2), idle-weight (default 0.1), and job-age-weight (default 0.0) (applied to train.py, environment.py)
  • Applies to environment.py: Set environment constraints as: 2-week episodes (336 hours), max 1000 jobs in queue, max 1500 new jobs per hour, max 170 hours job runtime (applied to environment.py)
  • Applies to environment.py: Configure cluster parameters as: 335 nodes maximum, 96 cores per node, up to 16 nodes per job, 150W idle power, 450W active power per node (applied to environment.py)
  • Applies to train.py: Structure evaluation output to include per-episode metrics (cost, savings, completion rate, wait time, queue size) and cumulative analysis (total savings, monthly reduction, annual projections) (applied to environment.py)
🧬 Code graph analysis (2)
sanity_workloadgen.py (1)
workloadgen.py (3)
  • WorkloadGenerator (56-110)
  • WorkloadGenConfig (35-53)
  • sample (99-110)
WIP_prices.py (1)
environment.py (1)
  • reset (254-284)
🪛 Ruff (0.14.8)
environment.py

661-661: Undefined name num_launched

(F821)


661-661: Local variable num_launched is assigned to but never used

(F841)


674-674: Undefined name num_dropped

(F821)


674-674: Local variable num_dropped is assigned to but never used

(F841)

workloadgen.py

60-60: Avoid specifying long messages outside the exception class

(TRY003)


89-89: Avoid specifying long messages outside the exception class

(TRY003)


99-99: Unused method argument: hour_idx

(ARG002)

🔇 Additional comments (9)
sampler_hourly.py (1)

117-143: LGTM! Sampling logic correctly uses external RNG.

The implementation correctly:

  • Wraps hour_of_day with modulo 24
  • Uses external rng for all random choices
  • Handles max_jobs cap with proper bounds checking
  • Samples job characteristics (duration, nodes, cores_per_node) independently

This aligns with the coding guidelines for the Hourly Sampler.

WIP_prices.py (1)

86-90: Forecast window semantics may be surprising.

advance_and_get_predicted_prices rolls the array left and places the current price (just fetched) at the end. This means predicted_prices[0] becomes the price from 1 hour ago, not the current price. Verify this aligns with how environment.py uses predicted_prices[0] as current_price on line 345.
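
As a quick way to see the concern, here is a minimal sketch (illustrative values only) of what a left roll plus overwriting the last slot does to a prediction window; whether that matches the intended semantics is exactly what should be verified against environment.py:

import numpy as np

PREDICTION_WINDOW = 4
predicted_prices = np.array([10.0, 11.0, 12.0, 13.0])  # window for hours t .. t+3
current_price = 10.0  # the value just returned for hour t

predicted_prices = np.roll(predicted_prices, -1)  # shift every entry one slot left
predicted_prices[-1] = current_price              # last slot now repeats hour t
print(predicted_prices)  # [11. 12. 13. 10.]
# Index 0 now holds what was previously index 1; the final slot holds the
# just-consumed price rather than a genuinely new forecast for hour t+4.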

environment.py (2)

88-101: LGTM! Clean helper methods for job slot management.

_clear_job_slot and _validate_next_empty provide good abstractions for slot management and debugging validation. The validation logic correctly checks that next_empty points to an empty slot and there are no holes before it.


753-755: LGTM! Drop penalty logic simplified.

The drop penalty calculation is now cleaner with min(0, PENALTY_DROPPED_JOB * self.new_excess), ensuring the penalty is always non-positive. This addresses the previous review feedback.

train.py (1)

91-106: LGTM! Workload generator integration.

The workload generator is properly configured using environment constants (MAX_JOB_DURATION, MIN_NODES_PER_JOB, etc.) ensuring consistency. The conditional instantiation based on args.workload_gen is clean.

sanity_workloadgen.py (1)

1-43: LGTM! Solid sanity tests for workload generator.

The tests effectively verify:

  1. Determinism: Same seed produces identical job sequences
  2. Constraints: All generated jobs respect configured bounds
  3. Statistical properties: Poisson mean is within expected range

Good coverage for catching regressions in the workload generation logic.

workloadgen.py (3)

1-25: LGTM! Clear requirements and appropriate imports.

The module documentation clearly articulates the design goals (deterministic, controllable, composable, future-proof) and the imports are minimal and appropriate.


27-53: LGTM! Well-designed immutable configuration.

The frozen dataclasses ensure immutability, which is appropriate for configuration and job specification objects. Default values are sensible and the optional hard_cap_jobs provides a useful safety mechanism.


99-110: LGTM! Efficient vectorized sampling with clear future-proofing.

The vectorized sampling approach is efficient and the hour_idx parameter is intentionally unused (as documented) to enable future daily patterns without breaking the API. The +1 adjustments on upper bounds correctly account for rng.integers being exclusive on the high end.
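
For reference, a small standalone check of the exclusive-high behaviour that makes the +1 adjustment necessary (the bounds below are illustrative, not the generator's actual configuration):

import numpy as np

rng = np.random.default_rng(0)
max_nodes = 16

# rng.integers(low, high) samples from [low, high): high itself is never drawn.
without_plus_one = rng.integers(1, max_nodes, size=10_000)
assert without_plus_one.max() <= max_nodes - 1

# Adding +1 makes the full inclusive range [1, max_nodes] reachable.
with_plus_one = rng.integers(1, max_nodes + 1, size=10_000)
assert with_plus_one.min() >= 1 and with_plus_one.max() <= max_nodes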

Comment on lines +88 to +89
else:
raise ValueError(f"Unknown arrivals mode: {mode}")

⚠️ Potential issue | 🟡 Minor

Unreachable error handling.

This ValueError can never be raised because __init__ already validates that mode is one of the three allowed values. While defensive programming is valuable, this creates dead code that can confuse maintainers.

Consider removing the unreachable else clause:

     elif mode == "uniform":
         # This is the old "flat".
         k = int(rng.integers(0, self.cfg.max_new_jobs_per_hour + 1))
-
-    else:
-        raise ValueError(f"Unknown arrivals mode: {mode}")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-    else:
-        raise ValueError(f"Unknown arrivals mode: {mode}")
+    elif mode == "uniform":
+        # This is the old "flat".
+        k = int(rng.integers(0, self.cfg.max_new_jobs_per_hour + 1))
🧰 Tools
🪛 Ruff (0.14.8)

89-89: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents
In workloadgen.py around lines 88-89, the final else branch that raises
ValueError is unreachable because mode is validated in __init__; remove that
else clause (delete the unreachable raise) so only the valid-mode branches
remain, or if you prefer an explicit sanity check keep a single assert False /
AssertionError with a clear message instead of the current redundant ValueError.

@enlorenz enlorenz marked this pull request as draft December 15, 2025 15:38
…ces, it would iterate too much. Agent and env became out of sync. "get_predicted_prices" is now "advance_and_get..."

Reset in env now also resets the prices state.

CodeRabbit Fixes:

Corrected seeding. Removed the double initialization of the jobs sampler. Corrected the num_processed_jobs count.
Renamed to prices_deterministic.
Cleaned up random usage in the sampler.
Added dataclass replace in workloadgen; supposed to provide O(1) instead of O(n).

Extra Fixes:

Renamed unclearly named scripts.
Removed unused functions in environment, plus minor adjustments.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
environment.py (1)

353-362: Potential reference error: self.jobs_sampler may not exist.

Line 355 references self.jobs_sampler.sample_one_hourly(), but the code in __init__ (lines 161-169) uses the module-level jobs_sampler (imported from sampler_jobs), not an instance variable self.jobs_sampler.

Either:

  1. Change line 355 to use the module-level jobs_sampler.sample_one_hourly(), or
  2. Initialize self.jobs_sampler as an instance variable in __init__

Apply this diff if option 1 is intended:

-            jobs = self.jobs_sampler.sample_one_hourly(wrap=True)["hourly_jobs"]
+            jobs = jobs_sampler.sample_one_hourly(wrap=True)["hourly_jobs"]

Or apply this diff in __init__ if option 2 is intended:

         if self.external_jobs:
             print(f"Loading jobs from {self.external_jobs}")
-            jobs_sampler.parse_jobs(self.external_jobs, 60)
+            self.jobs_sampler = jobs_sampler
+            self.jobs_sampler.parse_jobs(self.external_jobs, 60)
♻️ Duplicate comments (1)
workloadgen.py (1)

88-89: Unreachable error handling.

This has already been flagged in a previous review.

🧹 Nitpick comments (8)
workloadgen.py (1)

72-79: Unnecessary getattr calls for always-present fields.

flat_jobs_per_hour and flat_jitter are defined with defaults in WorkloadGenConfig, so getattr is unnecessary overhead.

         if mode == "flat":
-            target = int(getattr(self.cfg, "flat_jobs_per_hour", self.cfg.max_new_jobs_per_hour))
-            jitter = int(getattr(self.cfg, "flat_jitter", 0))
+            target = int(self.cfg.flat_jobs_per_hour)
+            jitter = int(self.cfg.flat_jitter)
 
             if jitter <= 0:
                 k = target
             else:
                 k = int(rng.integers(target - jitter, target + jitter + 1))
prices_deterministic.py (1)

78-80: Remove commented-out code.

Dead code with a "remove during next PR" note should ideally be cleaned up now rather than carried forward.

         self.price_history.append(new_price)
-        #if len(self.price_history) > self.HISTORY_WINDOW:
-        #    self.price_history.pop(0)
-        # deque automatically removes oldest when maxlen exceeded | Keep the old line for now. Remove during next PR.
 
         return new_price
test_sanity_workloadgen.py (2)

31-37: Consider tightening the tolerance band.

With poisson_lambda=50.0 over 2000 samples, the standard error is approximately sqrt(50/2000) ≈ 0.16, so the mean should be within ~0.5 of 50 with high probability. The current band of [45, 55] is very loose and may miss subtle bugs.

A tighter band like [48, 52] would still pass reliably while catching more issues:

-    assert 45.0 < mean < 55.0, mean  # loose band, just to catch obvious bugs
+    assert 48.0 < mean < 52.0, mean  # tighter band, still reliable for 2000 samples
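
The quoted standard error can be checked directly; a short, self-contained calculation using the same lambda and sample count as the test:

import math

poisson_lambda = 50.0
n_samples = 2000

# A Poisson(lambda) has variance lambda, so the standard error of the sample mean
# is sqrt(lambda / n).
std_err = math.sqrt(poisson_lambda / n_samples)
print(round(std_err, 3))  # 0.158
print(round(poisson_lambda - 1.96 * std_err, 2),
      round(poisson_lambda + 1.96 * std_err, 2))  # roughly 49.69 50.31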

1-43: Consider consolidating with test_determinism_workloadgen.py.

This file and test_determinism_workloadgen.py have significant overlap in functionality (determinism checks, bounds validation). Consider consolidating into a single test module to reduce maintenance burden.

train.py (1)

124-127: Potential issue: prices is a list, not a path.

norm_path is designed for path strings, but prices (line 124) is already a list of price values (or None). Passing a list through norm_path will always return the list unchanged, but semantically this is confusing.

-                            external_prices=norm_path(prices),
+                            external_prices=prices,
                             external_durations=norm_path(job_durations_file_path),
                             external_jobs=norm_path(jobs_file_path),
                             external_hourly_jobs=norm_path(hourly_jobs_file_path),
test_determinism_workloadgen.py (1)

61-69: Consider adding a "uniform" config to test all modes.

Currently only "flat" and "poisson" configs are tested, but check_distribution_smoke has (or should have) validation for "uniform" mode as well.

 def main():
     # Example configs to test
     flat_cfg = WorkloadGenConfig(arrivals="flat", max_new_jobs_per_hour=1000)
     pois_cfg = WorkloadGenConfig(arrivals="poisson", poisson_lambda=200.0, max_new_jobs_per_hour=1000)
+    unif_cfg = WorkloadGenConfig(arrivals="uniform", max_new_jobs_per_hour=1000)
 
-    for cfg in (flat_cfg, pois_cfg):
+    for cfg in (flat_cfg, pois_cfg, unif_cfg):
         check_determinism(cfg)
         check_bounds(cfg)
         check_distribution_smoke(cfg)
environment.py (2)

619-635: Consider removing excessive int() casts.

The code contains numerous int() casts (lines 627-630, 633-635) when assigning jobs. If job_duration, job_cores_per_node, and other job attributes are already integers from the job queue, these casts are redundant and add noise.

Verify the data types:

  • If job_queue_2d is dtype=np.int32, the values are already integers and casts are unnecessary.
  • If casts are needed due to upstream type issues, consider fixing the root cause instead.
#!/bin/bash
# Check the dtype of job_queue initialization
rg -n "job_queue.*dtype" environment.py

752-752: Consider including drop_penalty in debug output.

The debug print at line 752 shows the main reward components but omits drop_penalty. For consistency and debugging purposes, consider adding it:

-        self.env_print(f"    > $$$TOTAL: {reward:.4f} = {efficiency_reward_weighted:.4f} + {price_reward_weighted:.4f} + {idle_penalty_weighted:.4f} + {job_age_penalty_weighted:.4f}")
+        self.env_print(f"    > $$$TOTAL: {reward:.4f} = {efficiency_reward_weighted:.4f} + {price_reward_weighted:.4f} + {idle_penalty_weighted:.4f} + {job_age_penalty_weighted:.4f} + {drop_penalty:.4f}")
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0ce559a and ed14541.

📒 Files selected for processing (9)
  • deter_wlgen.py (1 hunks)
  • environment.py (19 hunks)
  • prices_deterministic.py (1 hunks)
  • sampler_hourly.py (2 hunks)
  • sanity_workloadgen.py (1 hunks)
  • test_determinism_workloadgen.py (1 hunks)
  • test_sanity_workloadgen.py (1 hunks)
  • train.py (4 hunks)
  • workloadgen.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • sampler_hourly.py
  • deter_wlgen.py
  • sanity_workloadgen.py
🧰 Additional context used
📓 Path-based instructions (3)
train.py

📄 CodeRabbit inference engine (CLAUDE.md)

train.py: Implement stable-baselines3 PPO integration in the training script with tensorboard logging and model checkpointing every 100K steps
Structure evaluation output to include per-episode metrics (cost, savings, completion rate, wait time, queue size) and cumulative analysis (total savings, monthly reduction, annual projections)

Files:

  • train.py
{train.py,weights.py}

📄 CodeRabbit inference engine (CLAUDE.md)

Use weighted reward components for training: efficiency-weight (default 0.7), price-weight (default 0.2), idle-weight (default 0.1), and job-age-weight (default 0.0)

Files:

  • train.py
environment.py

📄 CodeRabbit inference engine (CLAUDE.md)

environment.py: Use Gymnasium-compatible environment design for RL simulations in the Environment module
Configure cluster parameters as: 335 nodes maximum, 96 cores per node, up to 16 nodes per job, 150W idle power, 450W active power per node
Set environment constraints as: 2-week episodes (336 hours), max 1000 jobs in queue, max 1500 new jobs per hour, max 170 hours job runtime

Files:

  • environment.py
🧠 Learnings (7)
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to {train.py,weights.py} : Use weighted reward components for training: efficiency-weight (default 0.7), price-weight (default 0.2), idle-weight (default 0.1), and job-age-weight (default 0.0)

Applied to files:

  • train.py
  • environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to sampler_hourly.py : Sample job characteristics independently (job count, duration, nodes, cores-per-node) from hour-of-day distributions in the Hourly Sampler

Applied to files:

  • workloadgen.py
  • environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to environment.py : Use Gymnasium-compatible environment design for RL simulations in the Environment module

Applied to files:

  • environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to environment.py : Set environment constraints as: 2-week episodes (336 hours), max 1000 jobs in queue, max 1500 new jobs per hour, max 170 hours job runtime

Applied to files:

  • environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to environment.py : Configure cluster parameters as: 335 nodes maximum, 96 cores per node, up to 16 nodes per job, 150W idle power, 450W active power per node

Applied to files:

  • environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to sampler_jobs.py : Use pattern-based replay of historical job batches with full characteristics in the Jobs Sampler

Applied to files:

  • environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to train.py : Structure evaluation output to include per-episode metrics (cost, savings, completion rate, wait time, queue size) and cumulative analysis (total savings, monthly reduction, annual projections)

Applied to files:

  • environment.py
🧬 Code graph analysis (5)
train.py (2)
workloadgen.py (2)
  • WorkloadGenerator (56-110)
  • WorkloadGenConfig (35-53)
environment.py (2)
  • ComputeClusterEnv (67-899)
  • render (72-73)
test_determinism_workloadgen.py (2)
workloadgen.py (3)
  • WorkloadGenerator (56-110)
  • WorkloadGenConfig (35-53)
  • sample (99-110)
sampler_hourly.py (1)
  • sample (102-142)
workloadgen.py (1)
sampler_hourly.py (1)
  • sample (102-142)
prices_deterministic.py (1)
environment.py (1)
  • reset (247-280)
test_sanity_workloadgen.py (2)
workloadgen.py (3)
  • WorkloadGenerator (56-110)
  • WorkloadGenConfig (35-53)
  • sample (99-110)
sampler_hourly.py (1)
  • sample (102-142)
🪛 Ruff (0.14.8)
train.py

117-117: Redefinition of unused norm_path from line 93

(F811)

workloadgen.py

60-60: Avoid specifying long messages outside the exception class

(TRY003)


89-89: Avoid specifying long messages outside the exception class

(TRY003)


99-99: Unused method argument: hour_idx

(ARG002)

prices_deterministic.py

23-23: Avoid specifying long messages outside the exception class

(TRY003)


54-54: Undefined name deque

(F821)

🔇 Additional comments (22)
workloadgen.py (3)

27-31: LGTM!

Immutable JobSpec dataclass is well-designed for deterministic workload representation.


34-53: LGTM!

Configuration is well-structured with sensible defaults. The frozen dataclass ensures immutability.


99-110: LGTM!

The sample method is well-implemented. The unused hour_idx parameter is documented as a placeholder for future daily pattern support, which is a reasonable design choice for API stability.

prices_deterministic.py (3)

20-23: LGTM!

The empty array guard addresses the concern from the previous review, ensuring np.min/np.percentile are never called on empty arrays.


89-93: advance_and_get_predicted_prices returns stale "next" price.

After get_next_price() advances the index and returns the current step's price, you roll the prediction window and insert that same price at the end. However, the last slot should contain the newly predicted price for price_index + PREDICTION_WINDOW - 1, not the price that was just consumed. This may cause the prediction window to lag by one step.

Please verify whether the predicted prices window is intended to represent future prices or a rolling history including the current price.


119-144: LGTM!

The plotting utility is well-implemented with appropriate percentile markers and flexible save/display options.

test_sanity_workloadgen.py (2)

5-8: LGTM!

Good helper function for validating job constraints.


10-20: LGTM!

Determinism test correctly validates that identical seeds produce identical outputs.

train.py (3)

41-44: LGTM!

Weight defaults align with coding guidelines: efficiency-weight (0.7), price-weight (0.2), idle-weight (0.1), job-age-weight (0.0). Based on coding guidelines.


49-52: LGTM!

CLI arguments for workload generator are well-designed with sensible defaults.


96-111: LGTM!

Workload generator configuration correctly uses environment constants for resource bounds, ensuring consistency between the generator and environment constraints.

test_determinism_workloadgen.py (2)

5-6: LGTM!

Clean helper for converting jobs to comparable tuples.


8-19: LGTM!

Determinism check is well-implemented with clear error messaging.

environment.py (9)

119-119: LGTM! Workload generator and RNG scaffolding integrated correctly.

The addition of the workload_gen parameter and initialization of deterministic RNG state (self.np_random, self._seed) align with the PR objectives for deterministic workload generation. The RNG will be properly seeded in the reset() method.

Also applies to: 152-156


248-252: LGTM! RNG seeding properly implemented.

The addition of super().reset(seed=seed) and self.np_random, self._seed = seeding.np_random(seed) correctly implements deterministic RNG seeding. This resolves the previous critical issue where self.np_random was None and would cause AttributeError at runtime.

Based on learnings, the environment should use Gymnasium-compatible design for RL simulations.
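
For context, a minimal sketch of the Gymnasium seeding pattern described here; the class and observation below are placeholders, not the project's actual environment:

import gymnasium as gym
from gymnasium.utils import seeding

class ExampleEnv(gym.Env):
    def reset(self, seed=None, options=None):
        super().reset(seed=seed)  # let gymnasium do its own seed bookkeeping
        # Bind a dedicated np.random.Generator so every later draw is reproducible.
        self.np_random, self._seed = seeding.np_random(seed)
        observation = self.np_random.integers(0, 10, size=3)  # placeholder obs
        return observation, {}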


363-376: Good defensive programming for queue capacity.

The addition of queue capacity checks (queue_free, max_to_generate) before calling hourly_sampler.sample() prevents generating more jobs than can be enqueued. This reduces wasted work and improves efficiency.
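
A toy recomputation of that capacity gate, using the documented constraints (1000-slot queue, 1500 new jobs per hour) and an assumed queue layout where column 0 > 0 marks an occupied slot:

import numpy as np

MAX_QUEUE_SIZE = 1000
MAX_NEW_JOBS_PER_HOUR = 1500

job_queue_2d = np.zeros((MAX_QUEUE_SIZE, 4), dtype=np.int32)  # 4 columns assumed for illustration
job_queue_2d[:950, 0] = 1  # pretend 950 slots are already occupied

queue_used = int(np.count_nonzero(job_queue_2d[:, 0] > 0))
queue_free = max(0, MAX_QUEUE_SIZE - queue_used)
max_to_generate = min(queue_free, MAX_NEW_JOBS_PER_HOUR)
print(queue_used, queue_free, max_to_generate)  # 950 50 50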


656-676: LGTM! Drop tracking logic is correct.

The drop logic correctly:

  1. Increments job age when resources are unavailable
  2. Removes jobs exceeding MAX_JOB_AGE
  3. Updates drop counters for both agent and baseline
  4. Updates next_empty_slot when clearing jobs

458-460: LGTM! Drop penalty logic is clear and correct.

The drop penalty logic has been simplified and is now correct:

  1. excess tracks how many more jobs the agent dropped vs. baseline in the episode
  2. new_excess tracks the incremental increase in excess drops this step
  3. drop_penalty = min(0, PENALTY_DROPPED_JOB * self.new_excess) ensures the penalty is always non-positive
  4. The penalty is correctly added to the total reward

This addresses the previous review comment about confusing conditional logic.

Also applies to: 740-741, 749-749
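
A small worked example of the penalty arithmetic described above (function and argument names are ours, chosen for illustration):

PENALTY_DROPPED_JOB = -5.0

def drop_penalty(agent_dropped, baseline_dropped, prev_excess):
    excess = agent_dropped - baseline_dropped              # cumulative excess drops vs baseline
    new_excess = excess - prev_excess                      # excess drops added this step
    penalty = min(0.0, PENALTY_DROPPED_JOB * new_excess)   # clamp keeps it non-positive
    return penalty, excess

print(drop_penalty(4, 1, 2))  # (-5.0, 3): one newly excess drop costs -5
print(drop_penalty(1, 4, 0))  # (0.0, -3): baseline dropped more, so no penalty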


863-864: LGTM! Drop metrics properly integrated into episode recording.

The episode recording correctly:

  1. Calculates drop rates with zero-division protection
  2. Includes comprehensive drop metrics (jobs_dropped, drop_rate, jobs_rejected_queue_full) for both agent and baseline
  3. Maintains consistency with metrics tracked throughout the episode

Also applies to: 891-897


14-14: Verify the necessity of the DurationSampler import.

The import of DurationSampler from sampler_jobs is not visibly used in the current code. If this import is not required, it should be removed to avoid clutter.

Run the following script to verify if DurationSampler is used anywhere in the file:

#!/bin/bash
# Check if DurationSampler is used in environment.py
rg -n '\bDurationSampler\b' environment.py | grep -v '^14:'

385-408: Workload generator integration and RNG usage verified.

The WorkloadGenerator API contract is correctly implemented and used:

  • WorkloadGenerator class (workloadgen.py) has a sample(hour_idx: int, rng: np.random.Generator) method that returns List[JobSpec]
  • JobSpec dataclass (workloadgen.py) contains exactly the three attributes accessed in the code: duration, nodes, and cores_per_node
  • The code correctly calls workload_gen.sample(self.current_hour - 1, self.np_random) and accesses job attributes safely
  • RNG usage with self.np_random.integers() is consistent throughout
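
Based on the API contract summarized in the bullets above, a hedged usage sketch (config values borrowed from the test files, not prescribed here) would look roughly like this:

import numpy as np
from workloadgen import WorkloadGenConfig, WorkloadGenerator

cfg = WorkloadGenConfig(arrivals="poisson", poisson_lambda=200.0, max_new_jobs_per_hour=1000)
gen = WorkloadGenerator(cfg)
rng = np.random.default_rng(42)  # the caller owns the RNG, as in environment.py

for hour in range(3):
    jobs = gen.sample(hour, rng)  # returns a list of JobSpec
    for job in jobs[:2]:          # peek at the first couple of specs
        print(hour, job.duration, job.nodes, job.cores_per_node)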

252-252: Prices API is correctly implemented.

The prices_deterministic.Prices class provides both the reset() method (line 51) and the predicted_prices attribute (initialized at line 18, populated in reset() at lines 57-63). The calls at environment.py lines 252-260 (self.prices.reset() and self.prices.predicted_prices.copy()) are valid and use the correct API.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
environment.py (1)

359-366: Inconsistent sampler usage: self.jobs_sampler vs module-level jobs_sampler.

Line 360 uses self.jobs_sampler.sample_one_hourly(), but __init__ (lines 161-169) initializes the module-level jobs_sampler (imported from sampler_jobs), not self.jobs_sampler.

If the redundant initialization in reset_state (lines 329-332) is removed, this line will fail with AttributeError: 'ComputeClusterEnv' object has no attribute 'jobs_sampler'.

Either consistently use the module-level jobs_sampler or properly initialize self.jobs_sampler once in __init__.

         if self.external_jobs:
-            jobs = self.jobs_sampler.sample_one_hourly(wrap=True)["hourly_jobs"]
+            jobs = jobs_sampler.sample_one_hourly(wrap=True)["hourly_jobs"]
♻️ Duplicate comments (1)
environment.py (1)

328-332: Redundant jobs_sampler re-initialization in reset_state.

This re-initializes self.jobs_sampler and re-parses the jobs file on every episode reset. The sampler is already initialized in __init__ (lines 161-169) and should be reused. This causes unnecessary file I/O and processing overhead each episode.

A previous review marked this as addressed, but the code still appears to contain the redundant initialization.

-        #Define jobs_sampler as a self. object, to ensure determinism
-        if self.external_jobs and not self.workload_gen:
-            self.jobs_sampler = DurationSampler()
-            self.jobs_sampler.parse_jobs(self.external_jobs, 60)
-            self.jobs_sampler.precalculate_hourly_jobs(CORES_PER_NODE, MAX_NODES_PER_JOB)
🧹 Nitpick comments (4)
test_sanity_workloadgen.py (2)

31-37: Consider widening the assertion bounds for statistical stability.

The assertion checks 48.0 < mean < 52.0 but the comment mentions "45.0–55.0". For a Poisson(50) distribution over 2000 samples, the standard error is approximately sqrt(50/2000) ≈ 0.16, so the 95% CI is roughly [49.7, 50.3]. The current bounds (±4%) are reasonable but could still fail occasionally due to random variance.

Consider using a slightly wider band or documenting the expected flakiness rate.

-    assert 48.0 < mean < 52.0, mean  # tighter band, still reliable for 2000 samples
+    assert 45.0 < mean < 55.0, f"Poisson mean {mean} outside expected range"  # ~10% band, very unlikely to flake

39-43: Consider using pytest or unittest for consistent test discovery.

Running tests via if __name__ == "__main__" works but won't be discovered by standard test runners. If other tests in the project use pytest or unittest, consider aligning this module.

deter_wlgen.py (1)

61-72: Consider consolidating with test_sanity_workloadgen.py.

This module (deter_wlgen.py) and test_sanity_workloadgen.py have overlapping functionality (determinism, bounds, distribution checks). Consider consolidating into a single test module to avoid maintenance burden.

prices_deterministic.py (1)

118-143: Consider closing figures explicitly after save.

The plot_price_histogram method calls plt.close() at the end, which is good. However, if an exception occurs between plt.figure() and plt.close(), the figure won't be closed. Consider using a try/finally or context manager pattern.
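
One way to express that, as a sketch only (the real method's signature and plotting details may differ):

import matplotlib.pyplot as plt

def plot_price_histogram_safe(prices, save_path=None):
    fig = plt.figure()
    try:
        plt.hist(prices, bins=50)
        if save_path:
            plt.savefig(save_path)
        else:
            plt.show()
    finally:
        plt.close(fig)  # released even if hist/savefig raises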

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ed14541 and 13a3b28.

📒 Files selected for processing (7)
  • deter_wlgen.py (1 hunks)
  • environment.py (19 hunks)
  • prices_deterministic.py (1 hunks)
  • sanity_workloadgen.py (1 hunks)
  • test_determinism_workloadgen.py (1 hunks)
  • test_sanity_workloadgen.py (1 hunks)
  • train.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • sanity_workloadgen.py
  • test_determinism_workloadgen.py
🧰 Additional context used
📓 Path-based instructions (3)
train.py

📄 CodeRabbit inference engine (CLAUDE.md)

train.py: Implement stable-baselines3 PPO integration in the training script with tensorboard logging and model checkpointing every 100K steps
Structure evaluation output to include per-episode metrics (cost, savings, completion rate, wait time, queue size) and cumulative analysis (total savings, monthly reduction, annual projections)

Files:

  • train.py
{train.py,weights.py}

📄 CodeRabbit inference engine (CLAUDE.md)

Use weighted reward components for training: efficiency-weight (default 0.7), price-weight (default 0.2), idle-weight (default 0.1), and job-age-weight (default 0.0)

Files:

  • train.py
environment.py

📄 CodeRabbit inference engine (CLAUDE.md)

environment.py: Use Gymnasium-compatible environment design for RL simulations in the Environment module
Configure cluster parameters as: 335 nodes maximum, 96 cores per node, up to 16 nodes per job, 150W idle power, 450W active power per node
Set environment constraints as: 2-week episodes (336 hours), max 1000 jobs in queue, max 1500 new jobs per hour, max 170 hours job runtime

Files:

  • environment.py
🧠 Learnings (6)
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to {train.py,weights.py} : Use weighted reward components for training: efficiency-weight (default 0.7), price-weight (default 0.2), idle-weight (default 0.1), and job-age-weight (default 0.0)

Applied to files:

  • train.py
  • environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to environment.py : Set environment constraints as: 2-week episodes (336 hours), max 1000 jobs in queue, max 1500 new jobs per hour, max 170 hours job runtime

Applied to files:

  • environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to environment.py : Configure cluster parameters as: 335 nodes maximum, 96 cores per node, up to 16 nodes per job, 150W idle power, 450W active power per node

Applied to files:

  • environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to sampler_hourly.py : Sample job characteristics independently (job count, duration, nodes, cores-per-node) from hour-of-day distributions in the Hourly Sampler

Applied to files:

  • environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to sampler_jobs.py : Use pattern-based replay of historical job batches with full characteristics in the Jobs Sampler

Applied to files:

  • environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to train.py : Structure evaluation output to include per-episode metrics (cost, savings, completion rate, wait time, queue size) and cumulative analysis (total savings, monthly reduction, annual projections)

Applied to files:

  • environment.py
🧬 Code graph analysis (4)
train.py (2)
workloadgen.py (2)
  • WorkloadGenerator (56-110)
  • WorkloadGenConfig (35-53)
environment.py (2)
  • ComputeClusterEnv (67-904)
  • render (72-73)
prices_deterministic.py (1)
environment.py (1)
  • reset (247-280)
deter_wlgen.py (1)
workloadgen.py (2)
  • WorkloadGenerator (56-110)
  • WorkloadGenConfig (35-53)
test_sanity_workloadgen.py (2)
workloadgen.py (3)
  • WorkloadGenerator (56-110)
  • WorkloadGenConfig (35-53)
  • sample (99-110)
sampler_hourly.py (1)
  • sample (102-142)
🪛 Ruff (0.14.8)
prices_deterministic.py

24-24: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (25)
train.py (4)

10-18: LGTM - Clean import organization for new functionality.

The imports properly bring in the WorkloadGenerator and WorkloadGenConfig from the new workloadgen module, along with environment constants needed for config initialization.


49-52: LGTM - CLI arguments align with WorkloadGenConfig.

The new arguments (--workload-gen, --wg-poisson-lambda, --wg-max-jobs-hour) properly expose the key workload generator configuration options with sensible defaults matching the WorkloadGenConfig defaults.


90-111: LGTM - Path normalization and workload generator setup.

The norm_path helper correctly normalizes empty strings to None, which addresses the inconsistent truthy/falsy evaluation mentioned in the comment. The WorkloadGenConfig is properly constructed using environment constants for resource bounds.


118-134: LGTM - Environment initialization with normalized paths and workload generator.

The norm_path function is correctly applied to the external data paths, and workload_gen is properly passed to the environment. This completes the integration wiring.

test_sanity_workloadgen.py (3)

1-9: LGTM - Well-structured test helper.

The assert_job_valid helper correctly validates job attributes against config bounds, promoting code reuse across tests.


10-20: LGTM - Determinism test correctly verifies reproducibility.

The test properly seeds two independent RNGs with the same seed and verifies that identical seeds produce identical job sequences across multiple hours.


22-29: LGTM - Constraint validation test is thorough.

The test correctly validates that all generated jobs respect the configured bounds over 200 hours of sampling.

deter_wlgen.py (3)

1-7: LGTM - Helper function for tuple conversion.

The jobs_to_tuples helper correctly extracts the comparable attributes for determinism checks.


8-19: LGTM - Determinism check is robust.

The function correctly creates two independent generators and RNGs with identical seeds, then verifies output equality across hours.


21-36: LGTM - Bounds checking is thorough.

The function validates both the job count caps and individual job attribute bounds.

prices_deterministic.py (5)

1-4: LGTM - Imports correctly include deque.

The deque import has been added as flagged in previous reviews.


12-41: LGTM - Robust initialization with empty array guard.

The empty array check at line 23-24 addresses the previous review concern. The price shift logic correctly ensures all prices are >= 1.


44-47: LGTM - Deterministic synthetic price generation.

The sinusoidal price model provides a deterministic daily pattern with a guaranteed lower bound of 1.0.
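
For illustration only, a generic daily sinusoid with a hard floor at 1.0; the constants and exact formula in prices_deterministic.py are likely different:

import math

def synthetic_price(hour, base=50.0, amplitude=20.0):
    # One full cycle per 24 hours, never allowed below 1.0.
    raw = base + amplitude * math.sin(2 * math.pi * (hour % 24) / 24.0)
    return max(1.0, raw)

print([round(synthetic_price(h), 1) for h in range(0, 24, 6)])  # e.g. [50.0, 70.0, 50.0, 30.0]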


52-67: LGTM - Reset correctly initializes deterministic state.

The reset method properly initializes price_index to PREDICTION_WINDOW and populates predicted_prices from either external data or synthetic generation.


69-92: LGTM - Stateful stepping methods are well-designed.

get_next_price correctly advances the index and maintains history, while advance_and_get_predicted_prices properly rolls the prediction window forward.

environment.py (10)

8-18: LGTM - Clean imports for deterministic components.

The imports correctly bring in the new Prices class from prices_deterministic, DurationSampler, and Gymnasium's seeding utility for deterministic RNG.


45-47: LGTM - Explicit drop penalty constant.

The PENALTY_DROPPED_JOB = -5.0 constant is clearly defined and appropriately negative for a penalty.


86-94: LGTM - Useful debugging validator.

The _validate_next_empty helper provides useful assertions for debugging queue integrity. The guard at line 89 correctly handles the case where next_empty equals the queue length.


118-155: LGTM - Constructor correctly extended for workload_gen and RNG.

The workload_gen=None parameter is properly added to the signature and stored. The RNG placeholders (np_random, _seed) are correctly initialized to None and will be seeded in reset().


247-252: LGTM - RNG correctly seeded in reset.

The seeding.np_random(seed) call properly initializes the deterministic RNG, addressing the previous review concern about AttributeError when accessing self.np_random.


368-413: LGTM - Deterministic job generation with workload_gen and non-global RNG.

The code correctly prioritizes workload_gen when available, falls back to the legacy randomizer with self.np_random instead of the global np.random, and properly handles the hourly sampler path with queue space checks.


614-682: LGTM - Drop tracking integrated into job assignment.

The assign_jobs_to_available_nodes method now correctly tracks dropped jobs when they exceed MAX_JOB_AGE, updates the appropriate counters for both agent and baseline, and properly manages next_empty_slot when jobs are dropped.


744-754: LGTM - Drop penalty correctly integrated into reward.

The drop penalty is computed as min(0, PENALTY_DROPPED_JOB * self.new_excess), ensuring it's always non-positive. This addresses the previous review concern about dead code in the conditional check.


509-510: Clarify terminated vs truncated semantics.

The code sets truncated = True and terminated = False when the episode ends due to reaching EPISODE_HOURS. The comment says "Changed distinction, so we can identify if ended due to time-limit" which is correct Gymnasium semantics (truncated = time limit, terminated = natural end). This is good practice.
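
As a compact reference for that convention (EPISODE_HOURS taken from the documented constraint; the helper name is ours):

EPISODE_HOURS = 336  # 2-week episodes

def episode_flags(current_hour, natural_end=False):
    terminated = natural_end                                          # MDP reached a real end state
    truncated = (not natural_end) and current_hour >= EPISODE_HOURS   # cut off by the time limit
    return terminated, truncated

print(episode_flags(336))  # (False, True): time-limit truncation
print(episode_flags(100))  # (False, False): episode still running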


867-903: LGTM - Episode completion recording extended with drop metrics.

The record_episode_completion method now correctly includes drop rate calculations and all drop-related metrics for both agent and baseline.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
environment.py (1)

385-409: Consider removing commented legacy code.

The workload generator integration looks correct with proper deterministic RNG usage. However, the commented-out legacy code (lines 396, 401, 405-406) could be removed to reduce clutter since the new implementation is established.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 13a3b28 and c977c61.

📒 Files selected for processing (5)
  • environment.py (19 hunks)
  • prices_deterministic.py (1 hunks)
  • test_determinism_workloadgen.py (1 hunks)
  • test_sanity_workloadgen.py (1 hunks)
  • train.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • test_sanity_workloadgen.py
🧰 Additional context used
📓 Path-based instructions (3)
train.py

📄 CodeRabbit inference engine (CLAUDE.md)

train.py: Implement stable-baselines3 PPO integration in the training script with tensorboard logging and model checkpointing every 100K steps
Structure evaluation output to include per-episode metrics (cost, savings, completion rate, wait time, queue size) and cumulative analysis (total savings, monthly reduction, annual projections)

Files:

  • train.py
{train.py,weights.py}

📄 CodeRabbit inference engine (CLAUDE.md)

Use weighted reward components for training: efficiency-weight (default 0.7), price-weight (default 0.2), idle-weight (default 0.1), and job-age-weight (default 0.0)

Files:

  • train.py
environment.py

📄 CodeRabbit inference engine (CLAUDE.md)

environment.py: Use Gymnasium-compatible environment design for RL simulations in the Environment module
Configure cluster parameters as: 335 nodes maximum, 96 cores per node, up to 16 nodes per job, 150W idle power, 450W active power per node
Set environment constraints as: 2-week episodes (336 hours), max 1000 jobs in queue, max 1500 new jobs per hour, max 170 hours job runtime

Files:

  • environment.py
🧠 Learnings (6)
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to {train.py,weights.py} : Use weighted reward components for training: efficiency-weight (default 0.7), price-weight (default 0.2), idle-weight (default 0.1), and job-age-weight (default 0.0)

Applied to files:

  • train.py
  • environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to environment.py : Set environment constraints as: 2-week episodes (336 hours), max 1000 jobs in queue, max 1500 new jobs per hour, max 170 hours job runtime

Applied to files:

  • environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to environment.py : Configure cluster parameters as: 335 nodes maximum, 96 cores per node, up to 16 nodes per job, 150W idle power, 450W active power per node

Applied to files:

  • environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to sampler_hourly.py : Sample job characteristics independently (job count, duration, nodes, cores-per-node) from hour-of-day distributions in the Hourly Sampler

Applied to files:

  • environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to sampler_jobs.py : Use pattern-based replay of historical job batches with full characteristics in the Jobs Sampler

Applied to files:

  • environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to train.py : Structure evaluation output to include per-episode metrics (cost, savings, completion rate, wait time, queue size) and cumulative analysis (total savings, monthly reduction, annual projections)

Applied to files:

  • environment.py
🧬 Code graph analysis (4)
test_determinism_workloadgen.py (2)
workloadgen.py (2)
  • WorkloadGenConfig (35-53)
  • sample (99-110)
sampler_hourly.py (1)
  • sample (102-142)
train.py (2)
workloadgen.py (2)
  • WorkloadGenerator (56-110)
  • WorkloadGenConfig (35-53)
environment.py (1)
  • render (72-73)
prices_deterministic.py (1)
environment.py (1)
  • reset (248-281)
environment.py (4)
prices_deterministic.py (3)
  • Prices (5-143)
  • reset (52-67)
  • advance_and_get_predicted_prices (88-92)
prices.py (1)
  • Prices (4-121)
sampler_jobs.py (5)
  • DurationSampler (7-444)
  • parse_jobs (18-115)
  • precalculate_hourly_jobs (364-383)
  • sample_one_hourly (416-444)
  • sample (123-168)
workloadgen.py (1)
  • sample (99-110)
🪛 Ruff (0.14.8)
prices_deterministic.py

24-24: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (31)
test_determinism_workloadgen.py (5)

1-6: LGTM!

Clean imports and concise helper function for converting job objects to comparable tuples.


8-19: LGTM!

The determinism test correctly verifies that two generators with identical seeds produce identical outputs.


21-36: LGTM!

Comprehensive bounds checking against all config limits including the optional hard_cap_jobs.


38-62: LGTM!

Distribution smoke tests correctly validate mean expectations for all three arrival modes with appropriate tolerances.


64-76: LGTM!

Main function provides good coverage by testing all three arrival modes.

prices_deterministic.py (6)

1-11: LGTM!

Clean imports and well-defined class constants for the price model.


12-41: LGTM!

Proper initialization with empty array guard and price normalization to ensure positive values.


43-50: LGTM!

Pure deterministic price functions with proper floor at 1.0.


52-67: LGTM!

Reset correctly initializes all state for deterministic episode starts with proper array copying.


69-92: LGTM!

Stateful price stepping methods correctly advance state and return copies to prevent external mutation.


94-143: LGTM!

Non-mutating utility methods for statistics and visualization with proper resource cleanup via plt.close().

train.py (4)

10-18: LGTM!

Clean imports for workload generator and environment constants needed for config alignment.


49-52: LGTM!

CLI arguments for workload generator with sensible defaults matching WorkloadGenConfig.


90-111: LGTM!

Proper path normalization and workload generator initialization with config aligned to environment constants.


113-134: LGTM!

Environment initialization correctly applies path normalization and passes workload generator.

environment.py (16)

8-18: LGTM!

Clean switch to deterministic Prices and proper import of gymnasium seeding utilities.


45-47: LGTM!

Explicit penalty constant for dropped jobs with clear negative value.


86-94: LGTM!

Useful debug validator for the next_empty_slot invariant.


117-170: LGTM!

Proper initialization of workload generator with conditional sampler setup and deferred RNG seeding following gymnasium patterns.


184-191: LGTM!

Symmetric job drop tracking for both agent and baseline paths.


248-253: LGTM!

Correct gymnasium seeding pattern ensuring deterministic episode resets.


312-328: LGTM!

Comprehensive reset of all drop-related tracking counters.


337-338: LGTM!

Correct usage of the renamed deterministic prices API.


363-383: LGTM!

Smart queue-space gating prevents generating jobs that can't be enqueued, with proper deterministic RNG usage.


413-417: LGTM!

Proper tracking of jobs rejected due to queue capacity limits.


459-461: LGTM!

Correct incremental calculation of excess drops compared to baseline for fair penalty application.


505-506: LGTM!

Correct gymnasium semantics: episode timeout is a truncation, not a termination.


610-678: LGTM!

Robust job assignment with proper aging and drop tracking, including explicit type conversions for dictionary values.


685-689: LGTM!

Symmetric rejected job tracking for baseline path.


740-750: LGTM!

Clean drop penalty calculation using min(0, ...) pattern for clarity.


863-899: LGTM!

Comprehensive drop metrics added to episode recording with proper zero-division guards.

@enlorenz enlorenz marked this pull request as ready for review December 16, 2025 13:21

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (4)
prices_deterministic.py (1)

23-24: Consider defining a custom exception class for reusability.

The inline error message triggers a style warning (TRY003). For better maintainability, consider defining a custom exception class if this validation pattern is reused elsewhere, or acknowledge this as an acceptable style deviation for a one-off validation.

environment.py (3)

365-376: Consider applying queue capacity check to workload generator path.

The hourly sampler path pre-checks available queue space to avoid generating jobs that won't fit. The workload generator path (lines 385-393) doesn't have this optimization and may generate jobs that get rejected. While add_new_jobs handles overflow correctly, applying the same pre-check would improve consistency and efficiency.

Apply this pattern to the workload generator path:

         else:
+            # Check available queue space
+            queue_used = int(np.count_nonzero(job_queue_2d[:, 0] > 0))
+            queue_free = max(0, MAX_QUEUE_SIZE - queue_used)
+            max_to_generate = min(queue_free, MAX_NEW_JOBS_PER_HOUR)
+
 #----------------------Use Workload Generator for Randomizer------------------------------------------------------------------------------------
             if self.workload_gen is not None:
-                jobs = self.workload_gen.sample(self.current_hour - 1, self.np_random)
+                if max_to_generate == 0:
+                    jobs = []
+                else:
+                    # Note: workload generator doesn't support max_jobs parameter yet
+                    # Consider adding this parameter to WorkloadGenConfig for consistency
+                    jobs = self.workload_gen.sample(self.current_hour - 1, self.np_random)
+                    jobs = jobs[:max_to_generate]  # Truncate if needed
                 new_jobs_count = len(jobs)

610-678: Consider returning drop count for consistency.

The function tracks num_dropped (line 612, incremented at line 667) but doesn't return it. Drop tracking happens through side effects (self.jobs_dropped, self.dropped_this_episode). While this works, returning (num_processed_jobs, num_dropped, next_empty_slot) would make the function's behavior more explicit and testable.

If you prefer explicit returns:

-    def assign_jobs_to_available_nodes(self, job_queue_2d, nodes, cores_available, running_jobs, next_empty_slot, is_baseline=False):
-        num_processed_jobs = 0
-        num_dropped = 0
+    def assign_jobs_to_available_nodes(self, job_queue_2d, nodes, cores_available, running_jobs, next_empty_slot, is_baseline=False):
+        num_processed_jobs = 0
+        num_dropped = 0
         
         # ... existing logic ...
         
-        return num_processed_jobs, next_empty_slot
+        return num_processed_jobs, num_dropped, next_empty_slot

Then update callers at lines 432 and 689 to unpack three values.


620-677: Excessive int() casts in job assignment loop.

Lines 628-631, 634-635, 649, 652, and 658 contain int() casts for values that are already numpy integers. These casts are unnecessary and add clutter. NumPy integer types work correctly in these contexts (indexing, arithmetic, dictionary keys).

Consider removing the unnecessary casts:

             for i in range(job_nodes):
-                node_idx = int(candidate_nodes[i])
-                cores_available[node_idx] -= int(job_cores_per_node)
-                nodes[node_idx] = max(int(nodes[node_idx]), int(job_duration))
-                job_allocation.append((node_idx, int(job_cores_per_node)))
+                node_idx = candidate_nodes[i]
+                cores_available[node_idx] -= job_cores_per_node
+                nodes[node_idx] = max(nodes[node_idx], job_duration)
+                job_allocation.append((node_idx, job_cores_per_node))

             running_jobs[self.next_job_id] = {
-                "duration": int(job_duration),
+                "duration": job_duration,
                 "allocation": job_allocation,
             }

And similar changes at lines 649, 652, 658.

📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c977c61 and e53b83a.

📒 Files selected for processing (5)
  • environment.py (19 hunks)
  • prices_deterministic.py (1 hunks)
  • test_determinism_workloadgen.py (1 hunks)
  • test_sanity_workloadgen.py (1 hunks)
  • train.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • test_determinism_workloadgen.py
🧰 Additional context used
📓 Path-based instructions (3)
environment.py

📄 CodeRabbit inference engine (CLAUDE.md)

environment.py: Use Gymnasium-compatible environment design for RL simulations in the Environment module
Configure cluster parameters as: 335 nodes maximum, 96 cores per node, up to 16 nodes per job, 150W idle power, 450W active power per node
Set environment constraints as: 2-week episodes (336 hours), max 1000 jobs in queue, max 1500 new jobs per hour, max 170 hours job runtime

Files:

  • environment.py
train.py

📄 CodeRabbit inference engine (CLAUDE.md)

train.py: Implement stable-baselines3 PPO integration in the training script with tensorboard logging and model checkpointing every 100K steps
Structure evaluation output to include per-episode metrics (cost, savings, completion rate, wait time, queue size) and cumulative analysis (total savings, monthly reduction, annual projections)

Files:

  • train.py
{train.py,weights.py}

📄 CodeRabbit inference engine (CLAUDE.md)

Use weighted reward components for training: efficiency-weight (default 0.7), price-weight (default 0.2), idle-weight (default 0.1), and job-age-weight (default 0.0)

Files:

  • train.py
🧠 Learnings (6)
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to environment.py : Set environment constraints as: 2-week episodes (336 hours), max 1000 jobs in queue, max 1500 new jobs per hour, max 170 hours job runtime

Applied to files:

  • environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to {train.py,weights.py} : Use weighted reward components for training: efficiency-weight (default 0.7), price-weight (default 0.2), idle-weight (default 0.1), and job-age-weight (default 0.0)

Applied to files:

  • environment.py
  • train.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to environment.py : Configure cluster parameters as: 335 nodes maximum, 96 cores per node, up to 16 nodes per job, 150W idle power, 450W active power per node

Applied to files:

  • environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to sampler_hourly.py : Sample job characteristics independently (job count, duration, nodes, cores-per-node) from hour-of-day distributions in the Hourly Sampler

Applied to files:

  • environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to sampler_jobs.py : Use pattern-based replay of historical job batches with full characteristics in the Jobs Sampler

Applied to files:

  • environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to train.py : Structure evaluation output to include per-episode metrics (cost, savings, completion rate, wait time, queue size) and cumulative analysis (total savings, monthly reduction, annual projections)

Applied to files:

  • environment.py
🧬 Code graph analysis (2)
prices_deterministic.py (1)
environment.py (1)
  • reset (248-281)
test_sanity_workloadgen.py (1)
workloadgen.py (3)
  • WorkloadGenerator (56-110)
  • WorkloadGenConfig (35-53)
  • sample (99-110)
🪛 Ruff (0.14.8)
prices_deterministic.py

24-24: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (5)
test_sanity_workloadgen.py (1)

1-43: LGTM! Well-structured test suite.

The test suite effectively validates the workload generator's key properties:

  • Determinism via identical RNG seeds
  • Constraint adherence for job attributes
  • Statistical sanity for Poisson arrivals
prices_deterministic.py (1)

43-143: LGTM! Clean deterministic pricing implementation.

The Prices class provides a well-structured deterministic pricing model with clear state management, proper reset behavior, and helpful statistics/visualization utilities.

train.py (1)

49-51: LGTM! Clean workload generator integration.

The workload generator integration follows best practices:

  • CLI arguments provide flexible configuration
  • Bounds are sourced from environment constants (DRY principle)
  • Conditional instantiation based on user selection
  • Properly wired into environment initialization

Also applies to: 96-111, 133-134

environment.py (2)

151-155: LGTM! Proper deterministic RNG initialization.

The RNG is correctly initialized using Gymnasium's seeding utility, ensuring deterministic behavior when a seed is provided. This addresses the previous AttributeError issue.

Also applies to: 248-250


459-461: LGTM! Drop penalty logic is correct.

The drop penalty correctly penalizes incremental excess drops each step:

  1. Calculates cumulative excess drops vs baseline
  2. Computes new excess drops this step (delta)
  3. Applies penalty proportional to new excess

The min(0, ...) clamp ensures the penalty is always non-positive.

Also applies to: 740-742


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
environment.py (1)

354-361: Missing queue capacity check for external_jobs path.

The external_hourly_jobs path (lines 364-375) and workload_gen path (lines 386-399) both enforce queue capacity limits before generating jobs. However, the external_jobs path does not apply any such limit, which could lead to generating more jobs than the queue can accept.

Consider adding a similar capacity check:

         if self.external_jobs:
-            jobs = self.jobs_sampler.sample_one_hourly(wrap=True)["hourly_jobs"]
+            # How much room is left right now?
+            queue_used = int(np.count_nonzero(job_queue_2d[:, 0] > 0))
+            queue_free = max(0, MAX_QUEUE_SIZE - queue_used)
+            max_to_generate = min(queue_free, MAX_NEW_JOBS_PER_HOUR)
+            
+            if max_to_generate == 0:
+                jobs = []
+            else:
+                jobs = self.jobs_sampler.sample_one_hourly(wrap=True)["hourly_jobs"]
+                if len(jobs) > max_to_generate:
+                    jobs = jobs[:max_to_generate]
             if len(jobs) > 0:

Based on past review discussion where this issue was acknowledged.

🧹 Nitpick comments (2)
train.py (1)

66-72: Consider using norm_path for prices_file_path earlier.

Per earlier discussion, norm_path is used here for the condition check but prices_file_path itself is passed to pd.read_csv without normalization. The current usage works correctly since the condition guards the read, but applying norm_path to prices_file_path at line 61 would be more consistent with the pattern used for other external paths.

environment.py (1)

311-327: Consolidate redundant counter initializations.

dropped_this_episode and baseline_dropped_this_episode are initialized twice: first at lines 185-189 in __init__ and again at lines 324-325 in reset_state(). The __init__ initialization is redundant since reset_state() is called from __init__ at line 192.

Consider removing the duplicate initialization in __init__ (lines 183-189) since reset_state() handles all counter resets:

-        # Job tracking metrics for agent
-        self.jobs_dropped = 0
-        self.dropped_this_episode = 0
-
-        # Job tracking metrics for baseline
-        self.baseline_jobs_dropped = 0
-        self.baseline_dropped_this_episode = 0
-

         self.reset_state()
📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e53b83a and 9f873b4.

📒 Files selected for processing (2)
  • environment.py (19 hunks)
  • train.py (4 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
train.py

📄 CodeRabbit inference engine (CLAUDE.md)

train.py: Implement stable-baselines3 PPO integration in the training script with tensorboard logging and model checkpointing every 100K steps
Structure evaluation output to include per-episode metrics (cost, savings, completion rate, wait time, queue size) and cumulative analysis (total savings, monthly reduction, annual projections)

Files:

  • train.py
{train.py,weights.py}

📄 CodeRabbit inference engine (CLAUDE.md)

Use weighted reward components for training: efficiency-weight (default 0.7), price-weight (default 0.2), idle-weight (default 0.1), and job-age-weight (default 0.0)

Files:

  • train.py
environment.py

📄 CodeRabbit inference engine (CLAUDE.md)

environment.py: Use Gymnasium-compatible environment design for RL simulations in the Environment module
Configure cluster parameters as: 335 nodes maximum, 96 cores per node, up to 16 nodes per job, 150W idle power, 450W active power per node
Set environment constraints as: 2-week episodes (336 hours), max 1000 jobs in queue, max 1500 new jobs per hour, max 170 hours job runtime

Files:

  • environment.py
🧠 Learnings (6)
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to environment.py : Set environment constraints as: 2-week episodes (336 hours), max 1000 jobs in queue, max 1500 new jobs per hour, max 170 hours job runtime

Applied to files:

  • environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to {train.py,weights.py} : Use weighted reward components for training: efficiency-weight (default 0.7), price-weight (default 0.2), idle-weight (default 0.1), and job-age-weight (default 0.0)

Applied to files:

  • environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to environment.py : Configure cluster parameters as: 335 nodes maximum, 96 cores per node, up to 16 nodes per job, 150W idle power, 450W active power per node

Applied to files:

  • environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to sampler_hourly.py : Sample job characteristics independently (job count, duration, nodes, cores-per-node) from hour-of-day distributions in the Hourly Sampler

Applied to files:

  • environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to sampler_jobs.py : Use pattern-based replay of historical job batches with full characteristics in the Jobs Sampler

Applied to files:

  • environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to train.py : Structure evaluation output to include per-episode metrics (cost, savings, completion rate, wait time, queue size) and cumulative analysis (total savings, monthly reduction, annual projections)

Applied to files:

  • environment.py
🧬 Code graph analysis (2)
train.py (2)
workloadgen.py (2)
  • WorkloadGenerator (56-110)
  • WorkloadGenConfig (35-53)
environment.py (2)
  • ComputeClusterEnv (66-912)
  • render (71-72)
environment.py (6)
prices_deterministic.py (3)
  • Prices (5-143)
  • reset (52-67)
  • advance_and_get_predicted_prices (88-92)
prices.py (1)
  • Prices (4-121)
plot.py (1)
  • plot (6-77)
sampler_jobs.py (5)
  • DurationSampler (7-444)
  • parse_jobs (18-115)
  • precalculate_hourly_jobs (364-383)
  • sample_one_hourly (416-444)
  • sample (123-168)
sampler_hourly.py (2)
  • parse_jobs (12-100)
  • sample (102-142)
workloadgen.py (1)
  • sample (99-110)
🔇 Additional comments (16)
train.py (4)

10-18: LGTM!

Clean imports for the new workload generator integration and environment constants needed for configuration.


21-24: LGTM!

Good defensive helper to normalize empty strings to None for consistent falsy handling.


55-57: LGTM!

CLI arguments for workload generator configuration are well-designed with sensible defaults matching WorkloadGenConfig.


148-169: LGTM!

Environment construction correctly passes workload_gen and uses norm_path for external data paths. The evaluation_mode and workload_gen parameters align with the updated ComputeClusterEnv signature.

environment.py (12)

44-46: LGTM!

Clear penalty constant for dropped jobs with an explicit negative value. This aligns with the reward system convention.


84-93: LGTM!

Useful debug validator for next_empty_slot invariants. The assertions check both that the slot is empty and that no holes exist before it.
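In isolation the invariant check amounts to roughly the following, assuming the queue is a 2-D array whose column 0 holds the remaining duration and 0 means empty, as in the diffs elsewhere in this thread:

```python
import numpy as np

def validate_next_empty(job_queue_2d: np.ndarray, next_empty: int) -> None:
    # The slot pointed to must itself be empty...
    assert job_queue_2d[next_empty, 0] == 0, "next_empty points at an occupied slot"
    # ...and every slot before it must be occupied, i.e. no holes.
    assert np.all(job_queue_2d[:next_empty, 0] > 0), "hole found before next_empty"
```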


117-118: LGTM!

Constructor properly accepts optional workload_gen and initializes RNG scaffolding (np_random, _seed) for deterministic seeding in reset().

Also applies to: 150-154


160-169: LGTM!

Jobs sampler initialization is correctly conditioned on not self.workload_gen, preventing redundant parsing when the workload generator is active.


247-252: LGTM!

RNG is now properly seeded in reset() using Gymnasium's seeding.np_random(seed). This addresses the previous critical issue where self.np_random was None.


384-421: LGTM!

Workload generator integration is well-structured:

  • Queue capacity is checked before sampling
  • Jobs are capped to available queue space
  • Legacy path correctly uses self.np_random instead of global RNG

471-473: LGTM!

The excess drop calculation correctly computes the incremental penalty only for new excess drops beyond baseline.


517-518: Clarify truncation vs termination semantics.

Setting truncated=True and terminated=False at episode end is a valid Gymnasium pattern for time-limit truncations. This correctly signals that the episode ended due to reaching the maximum steps rather than a terminal state, which is important for value bootstrapping in RL algorithms.
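For context, a sketch of the episode-end branch under the Gymnasium step contract (only the tail of step() is shown; everything else is elided):

```python
def step_tail(current_step, episode_length, obs, reward, info):
    # Time-limit ending: truncated, not terminated, so value bootstrapping
    # can still use V(s') at the cut-off state.
    terminated = False
    truncated = current_step >= episode_length
    return obs, reward, terminated, truncated, info
```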


622-690: LGTM!

The assign_jobs_to_available_nodes method now correctly:

  1. Tracks dropped jobs when they exceed MAX_JOB_AGE
  2. Updates per-episode and cumulative drop counters
  3. Properly updates next_empty_slot when clearing dropped jobs
  4. Returns num_processed_jobs while maintaining the slot tracking

752-762: LGTM!

Drop penalty logic is now clean: min(0, PENALTY_DROPPED_JOB * self.new_excess) ensures the penalty is always non-positive. This addresses the previous review comment about dead code.


875-911: LGTM!

Episode completion data now includes comprehensive drop metrics for both agent and baseline, enabling proper analysis of job handling performance.


8-9: prices_deterministic module is properly integrated with compatible interface.

The prices_deterministic module exists and provides all required methods matching the environment.py usage: reset(), get_price_context(), advance_and_get_predicted_prices(), get_price_stats(), and the attributes MAX_PRICE and MIN_PRICE. The interface is fully compatible with the environment implementation.

@enlorenz enlorenz marked this pull request as draft December 17, 2025 16:35
Commit description FairRootGroup#2:

Workload generator: added a new script, including sanity checks. It generates time-based distributions of incoming jobs. Direct input into the agent is not yet finished, but the pipelines are mostly in place. train.py gained argument parsers to call the generator.

Fixed: missing deque import, corrected the "flat" distribution check, removed duplicate lines.

Additional fixes: removed an unused import, added the queue capacity check for the job randomizer as well, applied the norm_path check to prices too, and removed non-essential type casts.

Hotfix triplet that occurred during the rebase.
@enlorenz enlorenz marked this pull request as ready for review December 17, 2025 16:46
@rbx
Copy link
Member

rbx commented Dec 18, 2025

I've added 3 commits:

  • added weights for the drop penalty. As long as it is still experimental, it should be possible to turn it off on master; the weight is 0 by default (in train.py)
  • excluded the baseline from the drop penalty calculation. This will require further tweaking, as the penalty will now be applied even if all nodes are busy with work. Maybe include the number of turned-off nodes in the calculation somehow, similar to the other penalty?
  • formatting: removed trailing whitespace at the end of lines

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
environment.py (1)

354-361: Apply queue capacity checks to the external_jobs path.

The external_hourly_jobs and workload_gen paths include queue capacity checks to prevent generating more jobs than can be enqueued (lines 365-375, 387-399). The external_jobs path should have the same protection to maintain consistency and prevent queue overflow.

This was previously noted by reviewer rbx.

🔎 Apply this diff to add queue capacity checks:
         if self.external_jobs:
+            # Check queue capacity before sampling
+            queue_used = int(np.count_nonzero(job_queue_2d[:, 0] > 0))
+            queue_free = max(0, MAX_QUEUE_SIZE - queue_used)
+            max_to_generate = min(queue_free, MAX_NEW_JOBS_PER_HOUR)
+            
+            if max_to_generate == 0:
+                jobs = []
+            else:
-            jobs = self.jobs_sampler.sample_one_hourly(wrap=True)["hourly_jobs"]
+                jobs = self.jobs_sampler.sample_one_hourly(wrap=True)["hourly_jobs"]
+                if len(jobs) > max_to_generate:
+                    jobs = jobs[:max_to_generate]
+            
             if len(jobs) > 0:
                 for job in jobs:
🧹 Nitpick comments (1)
environment.py (1)

8-9: Remove commented-out import.

The old import statement should be removed rather than commented out, as the code has been refactored to use the deterministic prices module.

🔎 Apply this diff to clean up the commented code:
-#from prices import Prices
 from prices_deterministic import Prices # Test re-worked prices script
📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 42a5563 and 64dbbf5.

📒 Files selected for processing (1)
  • environment.py (20 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
environment.py

📄 CodeRabbit inference engine (CLAUDE.md)

environment.py: Use Gymnasium-compatible environment design for RL simulations in the Environment module
Configure cluster parameters as: 335 nodes maximum, 96 cores per node, up to 16 nodes per job, 150W idle power, 450W active power per node
Set environment constraints as: 2-week episodes (336 hours), max 1000 jobs in queue, max 1500 new jobs per hour, max 170 hours job runtime

Files:

  • environment.py
🧠 Learnings (6)
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to environment.py : Set environment constraints as: 2-week episodes (336 hours), max 1000 jobs in queue, max 1500 new jobs per hour, max 170 hours job runtime

Applied to files:

  • environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to {train.py,weights.py} : Use weighted reward components for training: efficiency-weight (default 0.7), price-weight (default 0.2), idle-weight (default 0.1), and job-age-weight (default 0.0)

Applied to files:

  • environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to environment.py : Configure cluster parameters as: 335 nodes maximum, 96 cores per node, up to 16 nodes per job, 150W idle power, 450W active power per node

Applied to files:

  • environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to sampler_hourly.py : Sample job characteristics independently (job count, duration, nodes, cores-per-node) from hour-of-day distributions in the Hourly Sampler

Applied to files:

  • environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to sampler_jobs.py : Use pattern-based replay of historical job batches with full characteristics in the Jobs Sampler

Applied to files:

  • environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to train.py : Structure evaluation output to include per-episode metrics (cost, savings, completion rate, wait time, queue size) and cumulative analysis (total savings, monthly reduction, annual projections)

Applied to files:

  • environment.py
🧬 Code graph analysis (1)
environment.py (6)
prices_deterministic.py (3)
  • Prices (5-143)
  • reset (52-67)
  • advance_and_get_predicted_prices (88-92)
weights.py (1)
  • Weights (5-16)
sampler_jobs.py (3)
  • DurationSampler (7-444)
  • parse_jobs (18-115)
  • sample (123-168)
sampler_duration.py (1)
  • init (42-54)
sampler_hourly.py (2)
  • parse_jobs (12-100)
  • sample (102-142)
workloadgen.py (1)
  • sample (99-110)
🔇 Additional comments (19)
environment.py (19)

44-45: LGTM!

The explicit penalty constant for dropped jobs is well-defined and documented. The negative value correctly represents a penalty in the reward system.


85-93: LGTM!

The validation helper is well-structured for debugging queue state consistency. The assertions appropriately verify that next_empty points to an actual empty slot and that there are no holes before it.


117-118: LGTM!

The new parameters evaluation_mode and workload_gen extend the environment's capabilities to support deterministic workload generation and evaluation scenarios.


151-155: LGTM!

The deterministic RNG scaffolding is correctly initialized. The np_random is set to None here and properly seeded in the reset() method (line 249), which follows Gymnasium's seeding pattern.


160-169: LGTM!

The conditional loading of external jobs data correctly excludes the jobs_sampler when workload_gen is provided, preventing redundant initialization and ensuring the pluggable generator takes precedence.


183-189: LGTM!

Job drop tracking metrics are properly initialized for both agent and baseline, enabling comparison of drop behavior between the two policies.


247-261: LGTM!

The reset method properly implements Gymnasium's seeding pattern:

  • Calls super().reset(seed=seed) to handle parent class seeding
  • Seeds the deterministic RNG using seeding.np_random(seed)
  • Initializes predicted prices from the reset prices object

This ensures reproducibility across episodes.


311-327: LGTM!

The additional tracking metrics for job drops and rejections are properly initialized in reset_state(). These metrics enable comprehensive monitoring of job handling across episodes for both agent and baseline policies.


336-336: LGTM!

The use of advance_and_get_predicted_prices() aligns with the deterministic price handling refactor and ensures predicted prices are updated consistently each step.


365-375: LGTM!

The queue capacity checks before sampling prevent generating jobs that cannot be enqueued, improving robustness. The logic correctly:

  • Calculates available queue space
  • Caps generation at the minimum of queue space and MAX_NEW_JOBS_PER_HOUR
  • Handles the zero-capacity case by setting jobs to an empty list

385-421: LGTM!

The workload generator integration is well-implemented:

  • Queue capacity checks prevent overflow (lines 387-399)
  • Jobs are capped at max_to_generate to respect queue limits
  • Legacy path uses self.np_random for deterministic generation (lines 409, 414, 419-420)
  • Maintains backward compatibility with external durations

The deterministic RNG replacement across all random calls ensures reproducibility.


428-428: LGTM!

Tracking jobs_rejected_queue_full provides visibility into when the queue reaches capacity, which is useful for monitoring and tuning queue size limits.


444-444: Verify signature consistency in baseline_step.

The assign_jobs_to_available_nodes method now returns three values (num_launched_jobs, next_empty_slot, num_dropped). Ensure the baseline_step call (line 699) handles this correctly.

Based on line 699, the baseline call correctly unpacks three values with _ for the drop count, so this is consistent.


471-471: LGTM!

Storing num_dropped_this_step enables per-step drop penalties in the reward calculation.


515-516: LGTM!

Setting truncated=True and terminated=False correctly signals that the episode ended due to a time limit rather than a terminal state, which aligns with Gymnasium's truncation semantics for fixed-horizon episodes.


620-688: LGTM!

The job assignment logic now properly handles job drops:

  • Jobs exceeding MAX_JOB_AGE are cleared from the queue (lines 670-676)
  • Drop counters are correctly maintained for agent and baseline (lines 679-684)
  • The next_empty_slot is updated when jobs are dropped (lines 675-676)
  • Jobs that can't be assigned continue to age (line 686)

The three-value return signature (num_processed_jobs, next_empty_slot, num_dropped) is consistent with the call sites.


697-699: LGTM!

The baseline step correctly:

  • Tracks rejected jobs when the queue is full (line 697)
  • Unpacks the three-value return from assign_jobs_to_available_nodes (line 699)

The drop count is intentionally discarded with _ since baseline drop tracking is handled internally within assign_jobs_to_available_nodes.


751-764: LGTM!

The drop penalty implementation is clean and correct:

  • Uses min(0, PENALTY_DROPPED_JOB * self.num_dropped_this_step) to ensure the penalty is non-positive (line 752)
  • Weighted by drop_weight for configurability (line 753)
  • Penalizes only the agent's drops without dependency on baseline, addressing the previous concern about hidden baseline dependency
  • Integrated into the total reward calculation (line 761)

Based on learnings, this aligns with the weighted reward component architecture.


875-910: LGTM!

The episode completion recording properly captures drop metrics:

  • Drop rates are correctly calculated with zero-division protection (lines 875-876)
  • All relevant drop metrics are included in the episode data for both agent and baseline (lines 902-909)
  • Enables comprehensive analysis of job handling performance across episodes

Based on learnings, this extends the episode metrics structure appropriately.

Track episode/time counter in environment, pass it to prices function, and read the current price.
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (4)
prices_deterministic.py (1)

23-24: Consider extracting exception message to a constant.

Ruff suggests avoiding long exception messages directly in the raise statement (TRY003). While the current message is clear and relatively short, you could optionally extract it to a module-level constant for consistency with linting guidelines.

🔎 Optional refactor:

At module level (after imports):

_EMPTY_PRICES_ERROR = "external_prices must be a non-empty sequence"

Then update the raise statement:

             if prices.size == 0:
-                raise ValueError("external_prices must be a non-empty sequence")
+                raise ValueError(_EMPTY_PRICES_ERROR)
environment.py (3)

8-9: Remove commented-out import.

The old prices import is commented out while prices_deterministic is marked as a test. Either finalize the transition and remove the commented line, or clarify the intent with a TODO comment.

🔎 Apply this diff to clean up the import:
-#from prices import Prices
-from prices_deterministic import Prices # Test re-worked prices script
+from prices_deterministic import Prices

375-383: Consider adding queue capacity check to external_jobs path.

The external_hourly_jobs and workload_gen paths both include queue capacity checks to prevent generating more jobs than can be enqueued (lines 386-396, 407-420). The external_jobs path should also implement this check for consistency.

🔎 Apply this diff to add queue capacity check:
         if self.external_jobs:
-            jobs = self.jobs_sampler.sample_one_hourly(wrap=True)["hourly_jobs"]
+            queue_used = int(np.count_nonzero(job_queue_2d[:, 0] > 0))
+            queue_free = max(0, MAX_QUEUE_SIZE - queue_used)
+            max_to_generate = min(queue_free, MAX_NEW_JOBS_PER_HOUR)
+            
+            if max_to_generate == 0:
+                jobs = []
+            else:
+                jobs = self.jobs_sampler.sample_one_hourly(wrap=True)["hourly_jobs"]
+                if len(jobs) > max_to_generate:
+                    jobs = jobs[:max_to_generate]
+            
             if len(jobs) > 0:
                 for job in jobs:

428-442: Clean up commented legacy code.

The legacy np.random calls are commented out but retained. Since the migration to self.np_random is complete and functional, consider removing the commented lines to reduce clutter.

🔎 Apply this diff to remove commented code:
             else:
-               # new_jobs_count = np.random.randint(0, MAX_NEW_JOBS_PER_HOUR + 1)     # Keep legacy code for now
                 new_jobs_count = self.np_random.integers(0, MAX_NEW_JOBS_PER_HOUR + 1)
                 if self.external_durations:
                     new_jobs_durations = durations_sampler.sample(new_jobs_count)
                 else:
-                 #   new_jobs_durations = np.random.randint(1, MAX_JOB_DURATION + 1, size=new_jobs_count)
                     new_jobs_durations = self.np_random.integers(1, MAX_JOB_DURATION + 1, size=new_jobs_count)
                 # Generate random node and core requirements
                 for _ in range(new_jobs_count):
-                  #  new_jobs_nodes.append(np.random.randint(MIN_NODES_PER_JOB, MAX_NODES_PER_JOB + 1))
-                  #  new_jobs_cores.append(np.random.randint(MIN_CORES_PER_JOB, CORES_PER_NODE + 1))
                     new_jobs_nodes.append(self.np_random.integers(MIN_NODES_PER_JOB, MAX_NODES_PER_JOB + 1))
                     new_jobs_cores.append(self.np_random.integers(MIN_CORES_PER_JOB, CORES_PER_NODE + 1))
-#----------------------------------------------------------------------------------------------------------------------------------------------
📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 64dbbf5 and fa3895e.

📒 Files selected for processing (2)
  • environment.py (20 hunks)
  • prices_deterministic.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
environment.py

📄 CodeRabbit inference engine (CLAUDE.md)

environment.py: Use Gymnasium-compatible environment design for RL simulations in the Environment module
Configure cluster parameters as: 335 nodes maximum, 96 cores per node, up to 16 nodes per job, 150W idle power, 450W active power per node
Set environment constraints as: 2-week episodes (336 hours), max 1000 jobs in queue, max 1500 new jobs per hour, max 170 hours job runtime

Files:

  • environment.py
🧠 Learnings (7)
📓 Common learnings
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to {train.py,weights.py} : Use weighted reward components for training: efficiency-weight (default 0.7), price-weight (default 0.2), idle-weight (default 0.1), and job-age-weight (default 0.0)
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to train.py : Implement stable-baselines3 PPO integration in the training script with tensorboard logging and model checkpointing every 100K steps
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to environment.py : Set environment constraints as: 2-week episodes (336 hours), max 1000 jobs in queue, max 1500 new jobs per hour, max 170 hours job runtime

Applied to files:

  • environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to {train.py,weights.py} : Use weighted reward components for training: efficiency-weight (default 0.7), price-weight (default 0.2), idle-weight (default 0.1), and job-age-weight (default 0.0)

Applied to files:

  • environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to environment.py : Configure cluster parameters as: 335 nodes maximum, 96 cores per node, up to 16 nodes per job, 150W idle power, 450W active power per node

Applied to files:

  • environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to sampler_hourly.py : Sample job characteristics independently (job count, duration, nodes, cores-per-node) from hour-of-day distributions in the Hourly Sampler

Applied to files:

  • environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to sampler_jobs.py : Use pattern-based replay of historical job batches with full characteristics in the Jobs Sampler

Applied to files:

  • environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to train.py : Structure evaluation output to include per-episode metrics (cost, savings, completion rate, wait time, queue size) and cumulative analysis (total savings, monthly reduction, annual projections)

Applied to files:

  • environment.py
🧬 Code graph analysis (2)
environment.py (4)
prices_deterministic.py (3)
  • Prices (5-152)
  • reset (52-76)
  • advance_and_get_predicted_prices (97-101)
sampler_jobs.py (4)
  • DurationSampler (7-444)
  • parse_jobs (18-115)
  • sample_one_hourly (416-444)
  • sample (123-168)
sampler_hourly.py (2)
  • parse_jobs (12-100)
  • sample (102-142)
workloadgen.py (1)
  • sample (99-110)
prices_deterministic.py (1)
environment.py (1)
  • reset (250-301)
🪛 Ruff (0.14.8)
prices_deterministic.py

24-24: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (8)
environment.py (8)

85-93: LGTM!

The debug validator is well-structured and provides useful assertions for maintaining queue invariants during development.


151-169: LGTM!

The RNG initialization and workload generator integration are correctly structured. The conditional logic properly ensures only one job generation method is active.


250-281: LGTM!

The deterministic RNG seeding and episode-based price window selection are well-implemented, ensuring reproducible training runs.


465-465: LGTM!

The drop tracking integration and proper use of truncated/terminated flags align with Gymnasium conventions and provide accurate metrics.

Also applies to: 492-492, 536-537


641-709: LGTM!

The drop tracking logic is well-implemented. Jobs that exceed MAX_JOB_AGE are properly cleared from the queue, and separate counters for agent and baseline are correctly maintained.


718-720: LGTM!

The baseline tracking correctly mirrors the agent path, and the updated assign_jobs signature is properly handled.


896-930: LGTM!

The episode recording comprehensively captures drop metrics for both agent and baseline, enabling detailed analysis of job handling performance.


772-774: Verify drop_weight exists in Weights class.

The code references self.weights.drop_weight which is properly defined in the Weights dataclass with a default value of 0.0.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fa3895e and 2bdbdc1.

📒 Files selected for processing (2)
  • environment.py (20 hunks)
  • prices_deterministic.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
environment.py

📄 CodeRabbit inference engine (CLAUDE.md)

environment.py: Use Gymnasium-compatible environment design for RL simulations in the Environment module
Configure cluster parameters as: 335 nodes maximum, 96 cores per node, up to 16 nodes per job, 150W idle power, 450W active power per node
Set environment constraints as: 2-week episodes (336 hours), max 1000 jobs in queue, max 1500 new jobs per hour, max 170 hours job runtime

Files:

  • environment.py
🧠 Learnings (7)
📓 Common learnings
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to {train.py,weights.py} : Use weighted reward components for training: efficiency-weight (default 0.7), price-weight (default 0.2), idle-weight (default 0.1), and job-age-weight (default 0.0)
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to sampler_jobs.py : Use pattern-based replay of historical job batches with full characteristics in the Jobs Sampler
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to environment.py : Set environment constraints as: 2-week episodes (336 hours), max 1000 jobs in queue, max 1500 new jobs per hour, max 170 hours job runtime

Applied to files:

  • environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to {train.py,weights.py} : Use weighted reward components for training: efficiency-weight (default 0.7), price-weight (default 0.2), idle-weight (default 0.1), and job-age-weight (default 0.0)

Applied to files:

  • environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to environment.py : Configure cluster parameters as: 335 nodes maximum, 96 cores per node, up to 16 nodes per job, 150W idle power, 450W active power per node

Applied to files:

  • environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to sampler_hourly.py : Sample job characteristics independently (job count, duration, nodes, cores-per-node) from hour-of-day distributions in the Hourly Sampler

Applied to files:

  • environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to sampler_jobs.py : Use pattern-based replay of historical job batches with full characteristics in the Jobs Sampler

Applied to files:

  • environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to train.py : Structure evaluation output to include per-episode metrics (cost, savings, completion rate, wait time, queue size) and cumulative analysis (total savings, monthly reduction, annual projections)

Applied to files:

  • environment.py
🧬 Code graph analysis (2)
prices_deterministic.py (1)
environment.py (1)
  • reset (249-300)
environment.py (4)
prices_deterministic.py (3)
  • Prices (5-151)
  • reset (52-76)
  • advance_and_get_predicted_prices (96-100)
prices.py (1)
  • Prices (4-121)
sampler_jobs.py (5)
  • DurationSampler (7-444)
  • parse_jobs (18-115)
  • precalculate_hourly_jobs (364-383)
  • sample_one_hourly (416-444)
  • sample (123-168)
workloadgen.py (1)
  • sample (99-110)
🪛 Ruff (0.14.8)
prices_deterministic.py

24-24: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (27)
prices_deterministic.py (7)

1-10: LGTM!

The imports are complete (including deque from collections which addresses a previous concern), and the class constants are well-defined with sensible values for electricity price modeling.


12-41: Initialization logic is sound.

The empty array guard (line 23-24) and deque initialization (line 18) address previous review concerns. The price shifting logic correctly handles negative prices, and percentile-based bounds are properly computed.

The static analysis hint about TRY003 (line 24) is a style suggestion to define a custom exception class for long messages, but the current approach is acceptable for this context.
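Putting those pieces together outside the class, a shift-and-bound sketch could look like this; the percentile levels and names are placeholders, and only the non-empty guard mirrors the module's error message:

```python
import numpy as np

def shift_and_bound(external_prices, low_pct=1.0, high_pct=99.0):
    prices = np.asarray(external_prices, dtype=float)
    if prices.size == 0:
        raise ValueError("external_prices must be a non-empty sequence")
    # Shift so negative market prices map into a non-negative working range.
    shift = -prices.min() if prices.min() < 0 else 0.0
    shifted = prices + shift
    # Percentile-based bounds are robust to outliers in the raw series.
    min_price = np.percentile(shifted, low_pct)
    max_price = np.percentile(shifted, high_pct)
    return shifted, shift, min_price, max_price
```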


44-50: LGTM!

The synthetic price generation is deterministic and uses a sensible daily sinusoid pattern with a lower bound. The get_real_price method correctly reverses the price shift applied during initialization.
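Roughly, such a deterministic daily sinusoid with a floor can be written as below; the base level, amplitude, and floor are placeholders, not the module's actual constants:

```python
import numpy as np

def synthetic_price(hour: int, base=50.0, amplitude=20.0, floor=5.0) -> float:
    # One full sine cycle per 24 hours; deterministic in `hour` alone.
    price = base + amplitude * np.sin(2 * np.pi * (hour % 24) / 24.0)
    return max(floor, price)  # lower bound keeps prices from going non-physical
```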


52-77: LGTM!

The reset method properly reinitializes the deque (line 58) and handles both external and synthetic price modes correctly. The external mode uses modulo arithmetic for wrapping and correctly positions the price_index after the prediction window.


79-100: LGTM!

The stateful stepping methods are well-designed. The get_next_price method correctly uses the bounded deque (which auto-removes the oldest entry when full), get_price_context safely handles empty history, and advance_and_get_predicted_prices returns a copy to prevent external mutation.
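The deque behaviour being relied on is plain standard-library behaviour, sketched here with illustrative names:

```python
from collections import deque

history = deque(maxlen=24)   # bounded: appending beyond 24 drops the oldest entry
for price in (41.0, 39.5, 44.2):
    history.append(price)

# Safe context even while the history is still empty.
context_mean = sum(history) / len(history) if history else 0.0

predicted = [42.0, 43.5, 41.8]
window = list(predicted)     # hand out a copy so callers cannot mutate internal state
```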


103-124: LGTM!

The non-mutating utilities correctly separate stats computation from internal state. The _generated_prices_for_stats method generates a full year of synthetic prices without side effects, and get_price_stats provides comprehensive statistics with proper handling of all three price modes.


126-151: LGTM!

The histogram plotting method correctly handles all price modes, provides clear visual markers for key statistics, and properly closes the figure after use. The conditional logic for determining the price type label is correct.

environment.py (20)

8-8: LGTM!

The import changes correctly switch to the deterministic variants (prices_deterministic.Prices and sampler_jobs.DurationSampler) and add the necessary seeding utility for deterministic RNG integration, aligning with the PR's objectives.

Also applies to: 12-12, 15-16


43-44: LGTM!

The PENALTY_DROPPED_JOB constant is correctly defined as a negative value to penalize dropped jobs in the reward calculation.


84-91: LGTM!

The _validate_next_empty debugging utility correctly validates the invariants for the next_empty slot pointer: that it points to an empty slot and that there are no holes before it.


117-117: LGTM!

The addition of the workload_gen parameter and initialization of deterministic RNG fields (np_random, _seed) align with the PR's goal of supporting deterministic workload generation.

Also applies to: 150-153


159-168: LGTM!

The conditional initialization correctly avoids redundant sampler setup when workload_gen is provided. The sampler is only created when external jobs are used and no workload generator is present, which is the correct logic.


182-191: LGTM!

The drop tracking metrics for both agent and baseline are properly initialized. The episode_idx = -1 initialization is a clean approach to ensure the first reset sets it to 0.


249-272: LGTM!

The reset method correctly seeds the deterministic RNG (line 251), tracks episode indices, and implements episode-based price windowing. The calculation start_index = (self.episode_idx * episode_span) % n_prices ensures each episode starts at a different point in the external price series with proper wrapping.
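The windowing arithmetic in isolation (a sketch; episode_span stands in for the per-episode hour count plus the prediction window):

```python
def episode_window(episode_idx: int, episode_span: int, n_prices: int):
    # Each episode starts episode_span further into the series, wrapping around.
    start_index = (episode_idx * episode_span) % n_prices
    return [(start_index + offset) % n_prices for offset in range(episode_span)]

# Example: 3 episodes of span 336 over a 1000-entry price series
for ep in range(3):
    idx = episode_window(ep, 336, 1000)
    print(ep, idx[0], idx[-1])
```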


280-280: LGTM!

Using self.prices.predicted_prices.copy() correctly initializes the state with the deterministic price prediction window.


355-355: LGTM!

The change to advance_and_get_predicted_prices() correctly uses the deterministic prices module's method for advancing the price timeline.


374-374: LGTM!

The switch to sample_one_hourly(wrap=True) correctly uses the DurationSampler's API for retrieving hourly jobs.


383-394: LGTM!

The queue capacity check correctly prevents over-generation when the queue is full by computing available space and capping the maximum jobs to generate. Passing rng=self.np_random and max_jobs=max_to_generate to the sampler enables deterministic, bounded job generation.


404-440: LGTM!

The workload generator integration is well-designed:

  • Queue capacity checks prevent overflow
  • The generator is called with deterministic RNG (self.np_random)
  • Truncation handles cases where the generator produces more jobs than can be enqueued
  • Legacy path correctly replaces global np.random with self.np_random for deterministic behavior

447-447: LGTM!

The calculation correctly tracks jobs rejected due to a full queue by comparing the number of jobs generated to the number actually added.


463-463: LGTM!

The updated assign_jobs_to_available_nodes call correctly unpacks the new third return value (num_dropped_this_step), and storing it in self.num_dropped_this_step makes it available for the reward calculation.

Also applies to: 490-490


534-535: LGTM!

The change to truncated=True and terminated=False correctly follows Gymnasium semantics: episodes ending due to time limits should set truncated=True, while task completion or failure conditions set terminated=True.


639-707: LGTM!

The updated assign_jobs_to_available_nodes method correctly implements job dropping when age exceeds MAX_JOB_AGE. The logic:

  • Successfully assigned jobs are processed as before (lines 649-684)
  • Jobs that can't be assigned have their age incremented (line 687)
  • Jobs exceeding MAX_JOB_AGE are dropped, cleared from the queue, and tracked separately for agent vs baseline (lines 689-703)
  • Jobs not exceeding the age limit are aged in place (line 705)

The return signature change (line 707) correctly adds num_dropped as the third return value.


716-716: LGTM!

The baseline tracking correctly mirrors the agent's logic:

  • Line 716 tracks rejected jobs due to full queue
  • Line 718 correctly unpacks the updated return signature and ignores the launched/dropped counts (which are tracked separately via instance variables updated inside assign_jobs_to_available_nodes)

Also applies to: 718-718


780-780: LGTM!

The drop penalty is correctly added to the total reward (line 780), and the debug print statement (line 783) is updated to include the weighted drop penalty in the reward breakdown.

Also applies to: 783-783


894-928: LGTM!

The episode completion recording correctly calculates drop rates with zero-division guards and includes comprehensive drop metrics in the episode data dictionary. This enables proper tracking and analysis of job drop behavior across episodes.


770-772: The drop penalty calculation is correct, using min(0, PENALTY_DROPPED_JOB * self.num_dropped_this_step) to ensure the penalty is never positive. The drop_weight attribute is properly defined in the Weights class (as a float attribute) and is correctly referenced at line 772.

Likely an incorrect or invalid review comment.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2bdbdc1 and 8dc77fd.

📒 Files selected for processing (1)
  • environment.py (20 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
environment.py

📄 CodeRabbit inference engine (CLAUDE.md)

environment.py: Use Gymnasium-compatible environment design for RL simulations in the Environment module
Configure cluster parameters as: 335 nodes maximum, 96 cores per node, up to 16 nodes per job, 150W idle power, 450W active power per node
Set environment constraints as: 2-week episodes (336 hours), max 1000 jobs in queue, max 1500 new jobs per hour, max 170 hours job runtime

Files:

  • environment.py
🧠 Learnings (7)
📓 Common learnings
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to {train.py,weights.py} : Use weighted reward components for training: efficiency-weight (default 0.7), price-weight (default 0.2), idle-weight (default 0.1), and job-age-weight (default 0.0)
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to sampler_jobs.py : Use pattern-based replay of historical job batches with full characteristics in the Jobs Sampler
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to environment.py : Set environment constraints as: 2-week episodes (336 hours), max 1000 jobs in queue, max 1500 new jobs per hour, max 170 hours job runtime

Applied to files:

  • environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to {train.py,weights.py} : Use weighted reward components for training: efficiency-weight (default 0.7), price-weight (default 0.2), idle-weight (default 0.1), and job-age-weight (default 0.0)

Applied to files:

  • environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to environment.py : Configure cluster parameters as: 335 nodes maximum, 96 cores per node, up to 16 nodes per job, 150W idle power, 450W active power per node

Applied to files:

  • environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to sampler_hourly.py : Sample job characteristics independently (job count, duration, nodes, cores-per-node) from hour-of-day distributions in the Hourly Sampler

Applied to files:

  • environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to sampler_jobs.py : Use pattern-based replay of historical job batches with full characteristics in the Jobs Sampler

Applied to files:

  • environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to train.py : Structure evaluation output to include per-episode metrics (cost, savings, completion rate, wait time, queue size) and cumulative analysis (total savings, monthly reduction, annual projections)

Applied to files:

  • environment.py
🧬 Code graph analysis (1)
environment.py (4)
prices_deterministic.py (3)
  • Prices (5-151)
  • reset (52-76)
  • advance_and_get_predicted_prices (96-100)
weights.py (1)
  • Weights (5-16)
sampler_jobs.py (4)
  • DurationSampler (7-444)
  • parse_jobs (18-115)
  • precalculate_hourly_jobs (364-383)
  • sample (123-168)
sampler_hourly.py (2)
  • parse_jobs (12-100)
  • sample (102-142)
🔇 Additional comments (7)
environment.py (7)

1-44: Imports and constants look good.

The addition of deterministic RNG seeding and the PENALTY_DROPPED_JOB constant align well with the PR objectives.


249-300: RNG seeding and episode-aware price initialization implemented correctly.

The deterministic RNG is properly seeded using Gymnasium's seeding.np_random(), and the price series correctly wraps around based on episode index. This ensures reproducible episodes.


380-423: Capacity checks and deterministic RNG usage look good.

Both the hourly_sampler and workload_gen pathways correctly:

  • Calculate available queue capacity before generating jobs
  • Cap generation to prevent exceeding MAX_QUEUE_SIZE or MAX_NEW_JOBS_PER_HOUR
  • Use the instance-level self.np_random for deterministic sampling

This prevents queue overflow and ensures reproducibility.


533-534: Correct truncation semantics for episode termination.

Setting truncated=True and terminated=False at the episode time limit is the correct Gymnasium pattern for distinguishing time-based episode endings from goal-based termination.


638-706: Job assignment and drop logic implemented correctly.

The function properly:

  • Ages waiting jobs by 1 hour per step
  • Drops jobs exceeding MAX_JOB_AGE and tracks the count
  • Updates next_empty_slot for efficient queue management
  • Returns three values (launched, next_slot, dropped) as expected by callers

769-782: Drop penalty calculation is correct and addresses previous feedback.

The penalty correctly:

  • Uses only the agent's dropped jobs (not baseline) as requested by rbx
  • Applies PENALTY_DROPPED_JOB (-5.0) per dropped job
  • Uses min(0, ...) as defensive coding to ensure non-positive penalty
  • Is weighted by drop_weight (default 0, making it experimental)

Based on learnings, the default drop_weight is 0, which aligns with the PR description stating this feature is experimental and can be disabled.


893-928: Episode metrics tracking for dropped jobs implemented correctly.

The drop rate calculations use safe division, and all relevant metrics (dropped, rejected, rates) are recorded for both agent and baseline, enabling comprehensive episode analysis.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
environment.py (1)

256-259: Remove redundant hasattr check.

The episode_idx attribute is already initialized in __init__ (line 192), making the hasattr check unnecessary.

🔎 Apply this diff to simplify the logic:
-        # Track which episode this env instance is on
-        if not hasattr(self, "episode_idx"):
-            self.episode_idx = 0
-        else:
-            self.episode_idx += 1
+        # Track which episode this env instance is on
+        self.episode_idx += 1
📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8dc77fd and d4e7234.

📒 Files selected for processing (1)
  • environment.py (20 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
environment.py

📄 CodeRabbit inference engine (CLAUDE.md)

environment.py: Use Gymnasium-compatible environment design for RL simulations in the Environment module
Configure cluster parameters as: 335 nodes maximum, 96 cores per node, up to 16 nodes per job, 150W idle power, 450W active power per node
Set environment constraints as: 2-week episodes (336 hours), max 1000 jobs in queue, max 1500 new jobs per hour, max 170 hours job runtime

Files:

  • environment.py
🧠 Learnings (7)
📓 Common learnings
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to {train.py,weights.py} : Use weighted reward components for training: efficiency-weight (default 0.7), price-weight (default 0.2), idle-weight (default 0.1), and job-age-weight (default 0.0)
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to environment.py : Set environment constraints as: 2-week episodes (336 hours), max 1000 jobs in queue, max 1500 new jobs per hour, max 170 hours job runtime

Applied to files:

  • environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to sampler_jobs.py : Use pattern-based replay of historical job batches with full characteristics in the Jobs Sampler

Applied to files:

  • environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to {train.py,weights.py} : Use weighted reward components for training: efficiency-weight (default 0.7), price-weight (default 0.2), idle-weight (default 0.1), and job-age-weight (default 0.0)

Applied to files:

  • environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to environment.py : Configure cluster parameters as: 335 nodes maximum, 96 cores per node, up to 16 nodes per job, 150W idle power, 450W active power per node

Applied to files:

  • environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to sampler_hourly.py : Sample job characteristics independently (job count, duration, nodes, cores-per-node) from hour-of-day distributions in the Hourly Sampler

Applied to files:

  • environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to train.py : Structure evaluation output to include per-episode metrics (cost, savings, completion rate, wait time, queue size) and cumulative analysis (total savings, monthly reduction, annual projections)

Applied to files:

  • environment.py
🧬 Code graph analysis (1)
environment.py (4)
prices_deterministic.py (3)
  • Prices (5-151)
  • reset (52-76)
  • advance_and_get_predicted_prices (96-100)
prices.py (1)
  • Prices (4-121)
sampler_jobs.py (5)
  • DurationSampler (7-444)
  • parse_jobs (18-115)
  • precalculate_hourly_jobs (364-383)
  • sample_one_hourly (416-444)
  • sample (123-168)
workloadgen.py (1)
  • sample (99-110)
🔇 Additional comments (19)
environment.py (19)

8-16: LGTM!

The imports correctly integrate deterministic pricing and RNG seeding utilities, aligning with the PR's deterministic workflow objectives.


43-44: LGTM!

The explicit drop penalty constant is well-defined and aligns with the experimental feature described in the PR comments.


84-92: LGTM!

The validation helper correctly checks queue invariants for debugging purposes.


116-117: LGTM!

The new constructor parameters properly support evaluation mode and workload generator integration.


150-154: LGTM!

The RNG initialization is correct; np_random is properly seeded in the reset() method (lines 252-253).


159-168: LGTM!

The jobs_sampler initialization condition now correctly matches its usage in step() (line 374).


182-193: LGTM!

Job tracking metrics are properly initialized in __init__, addressing previous concerns about attribute availability.


282-282: LGTM!

The predicted prices initialization correctly uses the deterministic Prices object.


333-347: LGTM!

Job tracking metrics are properly reset without redundant initializations.


356-356: LGTM!

The predicted prices are correctly updated using the deterministic price advancement method.


374-441: LGTM!

Job generation properly implements queue capacity checks across all paths and uses deterministic RNG throughout.


448-448: LGTM!

The queue rejection tracking correctly captures jobs that couldn't be enqueued.


464-464: LGTM!

The call correctly unpacks the updated return values from assign_jobs_to_available_nodes.


491-491: LGTM!

The per-step drop count is correctly tracked for reward calculation.


535-536: LGTM!

The truncated/terminated distinction correctly follows Gymnasium conventions for time-limited episodes.


640-708: LGTM!

The job assignment and dropping logic is well-structured, correctly tracking drops for both agent and baseline.


717-719: LGTM!

Baseline step correctly handles the updated method signature.


771-784: LGTM!

The drop penalty calculation is clear and correctly ensures a non-positive penalty value.


895-929: LGTM!

Episode completion recording properly captures all drop-related metrics for both agent and baseline.

@rbx rbx merged commit b18321c into FairRootGroup:master Dec 18, 2025
1 check passed