
data embedding run scripts #1300

Merged

bseidlit merged 4 commits into sPHENIX-Collaboration:master from bseidlit:master on Feb 26, 2026

Conversation

@bseidlit
Contributor

@bseidlit bseidlit commented Feb 25, 2026

Data Embedding Run Scripts

Motivation / Context

Provide scripts and updated macros to orchestrate production-style embedding of simulated events into real sPHENIX DST data for reconstruction validation and systematic studies. The aim is to support distributed execution (HTCondor-style job indexing) by preparing data slices, running the embedding, and producing per-job outputs.

Key Changes

  • DataOverlayProduction/genList.sh
    • Adds a PostgreSQL FileCatalog query to produce test.list for a given segment (select filenames where segment=${1}, filename LIKE 'DST_TRIG%', runnumber=54404).
  • DataOverlayProduction/run.sh
    • New orchestration script that takes a job index (IDX), computes SEGMENT and NEVENTS_SKIP from NEVENT_GEN and NEVENTS_PREPARED_DATA, queries FileCatalog to create test.list for the computed segment, calls Fun4All_PrepDataFitting.C, reads prepData.txt, and runs Fun4All_G4_Embed_data.C to produce embed_test.root (with several hard-coded parameters).
  • DataOverlayProduction/Fun4All_PrepDataFitting.C
    • API changed from a single nEvents argument to signature Fun4All_PrepDataFitting(int nEvents_gen = 2, int nEvents_prepared_data = 4, int nEvents_skip = 0, const std::string& inlist = "test.list", ...).
    • Processing flow now uses tskim->set_accept_max(nEvents_gen), performs se->skip(nEvents_skip) then se->run(nEvents_prepared_data), and writes the prepared DST filename to prepData.txt.
    • Input file handling adjusted (se->fileopen usage) and overall run logic simplified.
  • DataOverlayProduction/Fun4All_G4_Embed_data.C
    • Swaps the input node ordering for CombineTowerInfo (changes pairing order between SEPD and EPD calibration inputs), affecting combined tower output.
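
The job-index arithmetic described for run.sh can be illustrated with a small worked example. The NEVENTS_SEGMENT value below is a placeholder assumption for illustration; in the real script it comes from a FileCatalog query.

```shell
# Hypothetical worked example of run.sh's job-index math (values illustrative).
IDX=3                                         # HTCondor-style job index
NEVENT_GEN=100                                # simulated events per job
NEVENTS_PREPARED_DATA=$(( NEVENT_GEN * 2 ))   # 200: padding for skipped events
NEVENTS_SEGMENT=250                           # assumed events per DST segment

# which segment this job's slice starts in, and how far into it to skip
SEGMENT=$(( ( IDX * NEVENTS_PREPARED_DATA ) / NEVENTS_SEGMENT ))
NEVENTS_SKIP=$(( ( IDX * NEVENTS_PREPARED_DATA ) % NEVENTS_SEGMENT ))

echo "job ${IDX}: segment ${SEGMENT}, skip ${NEVENTS_SKIP} events"
```

For IDX=3 this yields segment 2 with 100 events skipped, i.e. job slices tile the segments contiguously.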

Potential Risk Areas

  • IO and format assumptions
    • run.sh expects prepData.txt to exist and contain the prepared filename. A reviewer noted that prepData.txt may be missing if genList.sh or the invoked macro does not produce it; verify that Fun4All_PrepDataFitting.C reliably writes prepData.txt and that run.sh reads the correct filename.
    • Hard-coded run number (54404) and filename patterns (DST_TRIG*) tie the scripts to a specific dataset and FileCatalog schema.
  • Reconstruction / physics behavior
    • Fun4All_PrepDataFitting API and internal behavior changed (accept_max, skip/run semantics, event counts): this will change which events are selected and how many are processed, with potential impacts on physics selection, QA, and downstream analysis.
    • CombineTowerInfo input swap can alter calorimeter tower combining for SEPD vs EPD and thus affect calorimeter-dependent observables (jets, energy sums).
  • Operational robustness & concurrency
    • Minimal error handling in run.sh and macros (no strict checks for DB query success, file existence, or macro failures).
    • Shared filenames (prepData.txt, test.list, embed_test.root) risk collisions in parallel jobs; use per-job working dirs or unique filenames.
  • Performance
    • NEVENTS_PREPARED_DATA = 2*NEVENT_GEN and segment math may increase IO; ensure this scales for large distributed runs.
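
The per-job isolation mentioned above could be sketched as follows. The `job_<IDX>` directory layout and the index-tagged output name are hypothetical choices, not part of this PR.

```shell
# Sketch: give each batch job its own scratch directory so the shared
# filenames (test.list, prepData.txt, embed_test.root) cannot collide.
IDX="${1:-0}"                 # job index, as passed by the batch system
WORKDIR="job_${IDX}"          # hypothetical per-job directory name
mkdir -p "${WORKDIR}"
cd "${WORKDIR}" || exit 1

# ... generate test.list and run the macros here, exactly as run.sh does ...

cd ..
# optionally tag the output with the job index before collecting it:
# mv "${WORKDIR}/embed_test.root" "embed_${IDX}.root"
```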

Possible Future Improvements

  • Parameterize run.sh/genList.sh and macros (run number, NEVENT_GEN, DB filters, output names, CDB tags) via CLI flags or config files.
  • Add robust error checking (set -e, explicit return-code checks), validate existence/format of prepData.txt and test.list before proceeding.
  • Use unique per-job working directories or timestamped filenames to avoid collisions.
  • Add logging, dry-run mode, and preflight checks for DB query results.
  • Add tests or a small-data CI job to verify behavioral changes (especially CombineTowerInfo swap).
  • Document FileCatalog schema/expected filename patterns and the exact contents/format of prepData.txt.
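
A minimal sketch of the error-checking items above (assumed additions for illustration, not part of the merged scripts):

```shell
# Sketch of a hardened run.sh preamble; the checks are proposals only.
set -euo pipefail

IDX="${1:-0}"   # job index; defaults to 0 so the sketch runs standalone
[[ "${IDX}" =~ ^[0-9]+$ ]] || { echo "IDX must be a non-negative integer" >&2; exit 2; }

# refuse to continue when an intermediate file is missing or empty,
# e.g. check_nonempty test.list after the DB query,
#      check_nonempty prepData.txt after the prep macro
check_nonempty() {
  [[ -s "$1" ]] || { echo "$1 is missing or empty" >&2; return 1; }
}
```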

Note: This summary was AI-assisted and may contain inaccuracies; please validate the scripts, the presence/format of prepData.txt, and the macro parameter semantics with a test run and domain knowledge before large-scale deployment.

@coderabbitai
Contributor

coderabbitai bot commented Feb 25, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 22297bd and 2788c3c.

📒 Files selected for processing (2)
  • DataOverlayProduction/Fun4All_PrepDataFitting.C
  • DataOverlayProduction/run.sh

📝 Walkthrough

Adds DB-backed list generation and a run orchestration script, updates the prep-data macro to a multi-argument interface with explicit skip/run semantics, and swaps input nodes in the embedding macro's CombineTowerInfo instantiation.

Changes

  • Workflow scripts (DataOverlayProduction/genList.sh, DataOverlayProduction/run.sh):
    Adds a genList.sh DB query to produce test.list for a given segment/run, and a new run.sh that computes segment/skip indices, generates test.list, runs Fun4All_PrepDataFitting.C, reads prepData.txt, and runs Fun4All_G4_Embed_data.C.
  • Prep-data macro (DataOverlayProduction/Fun4All_PrepDataFitting.C):
    Replaces the single-argument interface with Fun4All_PrepDataFitting(int nEvents_gen, int nEvents_prepared_data, int nEvents_skip, ...). Sets the skim accept max to nEvents_gen, calls se->skip(nEvents_skip) then se->run(nEvents_prepared_data), removes the prior doubled-run/early-return logic, and adjusts file-open calls. Review callers for the updated signature.
  • Embedding macro (DataOverlayProduction/Fun4All_G4_Embed_data.C):
    Swaps the CombineTowerInfo input nodes: inputNodeA and inputNodeB are exchanged between TOWERINFO_CALIB_EPD and TOWERINFO_CALIB_SEPD_data, changing the combine ordering; output/registration is unchanged.

Possibly related PRs

  • Calo data overlay macro update #1288 — modifies the same embedding and prep-data macros (Fun4All_PrepDataFitting.C, Fun4All_G4_Embed_data.C), indicating overlapping interface and node-order changes.


@sphenix-jenkins-ci

For repository maintainers, please start the CI check manually (feedback)

This is an automatic message to assist in manually starting the CI check for this pull request, commit d69b3fbeb94b9c96b3098340ef1b5b4d69c59c19. macros pull requests require a manual start for CI checks, in particular selecting which coresoftware and calibrations versions to check against this macros pull request.

sPHENIX software maintainers: please make your input here and start the Build:


Note:

  1. if needed, fill in the pull request ID for the coresoftware pull request in sha_coresoftware, e.g. origin/pr/1697/merge for PR#1697. The default is to check against the master branch.
  2. click the Build button at the end of the long web page to start the test

Automatically generated by sPHENIX Jenkins continuous integration

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🧹 Nitpick comments (1)
DataOverlayProduction/run.sh (1)

8-8: Consider making BASEDIR configurable.

At line 8, the path is hardcoded to a single user workspace. For collaboration/production jobs, prefer an environment-variable override (with the current value as the default) to keep the script portable.


ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9143e75 and d69b3fb.

📒 Files selected for processing (2)
  • DataOverlayProduction/genList.sh
  • DataOverlayProduction/run.sh

@@ -0,0 +1 @@
psql FileCatalog -t -A -c "select filename from datasets where segment=${1} and filename like 'DST_TRIG%' and runnumber=54404;" > test.list
Contributor

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

fd "genList.sh" --type f

🏁 Script executed:

head -20 DataOverlayProduction/genList.sh


Add an explicit shebang for deterministic shell behavior.

The file lacks a shebang, so the executing shell is environment-dependent. Add #!/usr/bin/env bash as the first line to ensure consistent behavior across different execution contexts.

🧰 Tools
🪛 Shellcheck (0.11.0)

[error] 1-1: Tips depend on target shell and yours is unknown. Add a shebang or a 'shell' directive.

(SC2148)


⚠️ Potential issue | 🟠 Major

❓ Verification inconclusive

Script executed:

# First, locate the file in the repository
find . -name "genList.sh" -type f 2>/dev/null | head -20

Repository: sPHENIX-Collaboration/macros
Exit code: 0

stdout:

./DataOverlayProduction/genList.sh

Script executed:

# Also check if DataOverlayProduction directory exists
ls -la DataOverlayProduction/ 2>/dev/null || echo "Directory not found"

Repository: sPHENIX-Collaboration/macros
Exit code: 0

stdout:

total 44
drwxr-xr-x  2 jailuser jailuser   120 Feb 25 04:51 .
drwxr-xr-x 23 jailuser jailuser   540 Feb 25 04:51 ..
-rw-r--r--  1 jailuser jailuser 25838 Feb 25 04:51 Fun4All_G4_Embed_data.C
-rw-r--r--  1 jailuser jailuser  7324 Feb 25 04:51 Fun4All_PrepDataFitting.C
-rw-r--r--  1 jailuser jailuser   141 Feb 25 04:51 genList.sh
-rwxr-xr-x  1 jailuser jailuser   645 Feb 25 04:51 run.sh

Script executed:

# List files in repository root to understand structure
git ls-files | head -30

Repository: sPHENIX-Collaboration/macros
Exit code: 0

stdout:

.clang-format
.clang-tidy
.gitignore
BcoQA/Fun4All_Gl1QA.C
BcoQA/Fun4All_TpcQA.C
BcoQA/condor/Fun4All_Gl1QA.C
BcoQA/condor/Fun4All_InttQA.C
BcoQA/condor/Fun4All_TpcQA.C
BcoQA/condor/README
BcoQA/condor/condor.job
BcoQA/condor/create_condorlist.pl
BcoQA/condor/gl1_makelist.sh
BcoQA/condor/intt_makelist.sh
BcoQA/condor/make_list.sh
BcoQA/condor/mvtx_makelist.sh
BcoQA/condor/run_gl1qa.sh
BcoQA/condor/run_inttqa.sh
BcoQA/condor/run_tpcqa.sh
BcoQA/condor/tpc_makelist.sh
BcoQA/condor/tpot_makelist.sh
CDBTest/README
CDBTest/TestCDBHistos.C
CDBTest/TestCDBInsert.C
CDBTest/TestCDBRead.C
CDBTest/TestCDBSubsysReco.C
CDBTest/TestCDBTTree.C
CDBTest/mycdb.json
CaloEmbedding/Fun4All_G4_Embed_data.C
CaloEmbedding/G4Setup_sPHENIX.C
CaloProduction/CreateListFiles.pl

Script executed:

# Read the actual contents of genList.sh
cat -n DataOverlayProduction/genList.sh

Repository: sPHENIX-Collaboration/macros
Exit code: 0

stdout:

     1	psql FileCatalog -t -A -c  "select filename from datasets where segment=${1} and filename like 'DST_TRIG%' and runnumber=54404;" > test.list

Script executed:

# Verify psql variable syntax - check documentation or test behavior
# Let's check if there are other examples in the repo using psql variables
grep -r "\-v " --include="*.sh" . | head -10

Repository: sPHENIX-Collaboration/macros
Exit code: 0

stdout:

./calibrations/calo/emcal_calib_year1/base/genCalib.sh:   cp -v $calib .
./calibrations/calo/emcal_calib_year1/base/genCalib.sh:cp -v OUT*.root $submitDir
./calibrations/calo/emcal_calib_year1/base/run.sh:    cp -v "$PWD"/../../CondorRun.sh CondorRunJob$li.sh
./calibrations/calo/emcal_calib_year1/base/run.sh:    cp -v "$PWD"/../../local_calib_copy.root . 
./calibrations/calo/emcal_calib_year1/track_calib/condor/run_condor.sh:    cp -v "$PWD"/../../CondorRun.sh CondorRunJob$li.sh
./calibrations/calo/emcal_calib_year1/track_calib/condor/run_condor.sh:    cp -v "$PWD"/../../../Fun4All_EMCal.C .
./calibrations/calo/emcal_calib_year2/base/hadd_condor.sh:    cp -v "$PWD"/../../CondorHADD.sh CondorRunJob$li.sh
./calibrations/calo/hcal_calib_year2/condor_histmaker/hadd_condor.sh:    cp -v "$PWD"/../../CondorHADD.sh CondorHaddJob$li.sh
./calibrations/calo/hcal_tsc_cos/condor/run_condor.sh:    cp -v "$PWD"/../../CondorRun.sh CondorRunJob$li.sh
./calibrations/intt/old/shell.sh:	printf -v LIST ${INTT_LIST} ${RUN_NUM} ${FLX_SVR}

Script executed:

# Also search for psql usage patterns in the repository
grep -r "psql.*-c" --include="*.sh" . | head -20

Repository: sPHENIX-Collaboration/macros
Exit code: 0

stdout:

./DataOverlayProduction/genList.sh:psql FileCatalog -t -A -c  "select filename from datasets where segment=${1} and filename like 'DST_TRIG%' and runnumber=54404;" > test.list
./calibrations/mbd/SLEW/make_dst_uncal_noslew.sh:    echo  psql -h sphnxdaqdbreplica.sdcc.bnl.gov -Uphnxrc -w daq -c "select filename,firstevent,lastevent from filelist where filename ~ '$prdf';" 
./calibrations/mbd/SLEW/make_dst_uncal_noslew.sh:    exec psql -h sphnxdaqdbreplica.sdcc.bnl.gov -Uphnxrc -w daq -c "select filename,firstevent,lastevent from filelist where filename ~ '$prdf';" | grep $prdf > $tmpfile
./calibrations/mbd/get_files2calib.sh:psql -U phnxrc -h sphnxdaqdbreplica.sdcc.bnl.gov -c "select runnumber,eventsinrun,ertimestamp-brtimestamp from run where runnumber>= $min_runnumber and runnumber <= $max_runnumber and runtype='physics' and eventsinrun >= 100000 and EXTRACT(EPOCH FROM (ertimestamp-brtimestamp)) > 300 order by runnumber" daq | tail -n +3 | grep '|' | sort -n -k1 > $tmp_runsfile
./calibrations/mbd/get_files2calib.sh:psql -c 'select NOW();' FileCatalog | awk 'NR==3' | sed 's/^ //' > ${tstamp_file}
./calibrations/mbd/get_files2calib.sh:echo psql -c \"$pcmd\" FileCatalog
./calibrations/mbd/get_files2calib.sh:psql -c "$pcmd" FileCatalog > $tmpfile

Script executed:

# Search for psql documentation or verify the -v variable syntax
# Check if there are any psql examples using -v with variable substitution
grep -r "psql.*-v" --include="*.sh" . 2>/dev/null | head -10

Repository: sPHENIX-Collaboration/macros
Exit code: 0


Script executed:

# Check online documentation or search for psql variable substitution examples in scripts
# Let's also check if the colon syntax is used anywhere in the repository
grep -r ":[a-zA-Z_]" --include="*.sh" . 2>/dev/null | head -20

Repository: sPHENIX-Collaboration/macros
Exit code: 0

stdout:

./calibrations/mbd/run_mbdcal.sh:PARSED=$(getopt -o "r:b:d:p:n:h" -l "outbase:,outdir:,logbase:,logdir:,run:,build:,dbtag:,pass0dir:,nevents:,help" -- "$@")

psql -v variable syntax SQL query parameter substitution


psql variables are client-side text substitutions set via -v/--set (or \set) and referenced in SQL with colon syntax. (access.crunchydata.com)

Set a variable

psql -v myval=123 -f script.sql
# or inside psql:
\set myval 123

(caryrobbins.com)

Use it in SQL (3 forms)

  • Raw substitution (no quoting added):
    SELECT :myval;
  • SQL literal-quoted (safe for values; psql adds proper SQL string escaping):
    SELECT :'myval';
  • Identifier-quoted (for table/column names):
    SELECT * FROM :"mytable";

(access.crunchydata.com)

Notes:

-c limitation (common gotcha)

psql -c "..." only accepts input “parsable by the server” (or a single backslash command), so psql-specific features like :myval substitution won’t work there. Use -f or pipe via stdin instead: (stackoverflow.com)

echo "select * from t where id = :'myval';" | psql -v myval=123


Validate segment parameter to prevent SQL injection.

The ${1} value is directly interpolated into SQL, creating a SQL injection risk. Input validation is essential. However, the proposed patch has a critical flaw: psql's -v variable substitution (:variable syntax) does not work with the -c flag—that feature only functions with script files (-f) or stdin input.

To fix this properly:

  • Validate ${1} as numeric: [[ "${1}" =~ ^[0-9]+$ ]] || exit 2
  • Pass the query via stdin instead of -c so variable substitution works:
    echo "select filename from datasets where segment = :segment and filename like 'DST_TRIG%' and runnumber = 54404;" | \
      psql FileCatalog -v ON_ERROR_STOP=1 -v segment="${1}" -t -A > test.list

Comment on lines +20 to +24
FILE=$(cat prepData.txt)


# Run the Fun4All_G4_Embed_data.C script with the provided parameters
root -l -b -q "Fun4All_G4_Embed_data.C(${NEVENT},0,\"${FILE}\", \"./\" , \"embed_test.root\", \"Jet30\", \"MDC2\")"
Contributor

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Scripts executed (outputs truncated):

find . -name "run.sh" -path "*DataOverlayProduction*" | head -20
git ls-files | grep -i "dataoverlay" | head -20
git ls-files DataOverlayProduction/ 2>/dev/null || find . -type d -name "DataOverlayProduction" 2>/dev/null
wc -l DataOverlayProduction/run.sh
cat -n DataOverlayProduction/run.sh | sed -n '1,35p'
cat -n DataOverlayProduction/genList.sh
cat -n DataOverlayProduction/Fun4All_PrepDataFitting.C | head -50
grep -r "prepData.txt" DataOverlayProduction/ --include="*.sh" --include="*.C"
cat -n DataOverlayProduction/Fun4All_PrepDataFitting.C | grep -A 30 "prepData.txt"
grep -E "return|exit" DataOverlayProduction/run.sh
head -n 20 DataOverlayProduction/run.sh | grep -E "set -"
cat -n DataOverlayProduction/Fun4All_PrepDataFitting.C | sed -n '235,250p'

Add validation before reading prepData.txt to fail fast on missing or empty files.

At lines 20–24, the script reads prepData.txt without checking whether it exists or contains data. If Fun4All_PrepDataFitting.C fails silently or produces incomplete output, the subsequent ROOT invocation receives an empty argument and fails with unclear error messages, complicating diagnostics in a production batch environment.

Proposed guard
-# get prepared data filename
-FILE=$(cat prepData.txt)
+# get prepared data filename
+[[ -s prepData.txt ]] || { echo "prepData.txt is missing or empty" >&2; exit 1; }
+read -r FILE < prepData.txt

@pinkenburg
Contributor

run.sh contains this line:
FILE=$(cat prepData.txt)
There is no prepData.txt. If this file is the output of genList.sh, genList.sh needs to be changed to write the proper filename.

@sphenix-jenkins-ci
For repository maintainers, please start the CI check manually (feedback)

This is an automatic message to assist in manually starting the CI check for this pull request, commit 22297bd4d9b9a305f1b004f0f151107b4d62e2b6 (same instructions as the CI message above).

Automatically generated by sPHENIX Jenkins continuous integration

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2


ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d69b3fb and 22297bd.

📒 Files selected for processing (3)
  • DataOverlayProduction/Fun4All_G4_Embed_data.C
  • DataOverlayProduction/Fun4All_PrepDataFitting.C
  • DataOverlayProduction/run.sh

Comment on lines +4 to +24
IDX="$1" # same as $(Process)
NEVENT_GEN=100
DATARUNNUMBER=54404

NEVENTS_PREPARED_DATA=$(( NEVENT_GEN * 2 )) #padding for possible skipping of events

echo "=== run.sh inputs ==="
echo " IDX (job index) = ${IDX}"
echo " NEVENT_GEN = ${NEVENT_GEN}"
echo " NEVENTS_PREPARED_DATA = ${NEVENTS_PREPARED_DATA}"
echo " DATARUNNUMBER = ${DATARUNNUMBER}"

# check number of events in a setment
NEVENTS_SEGMENT=$(psql FileCatalog -t -A -c "select events from datasets where segment=0 and filename like 'DST_TRIGGERED_EVENT_seb07%' and runnumber=${DATARUNNUMBER};")


#calculate which segment to run
SEGMENT=$(( ( IDX * NEVENTS_PREPARED_DATA ) / NEVENTS_SEGMENT))

# calculate how many events to skip
NEVENTS_SKIP=$(( ( IDX * NEVENTS_PREPARED_DATA ) % NEVENTS_SEGMENT))
Contributor

⚠️ Potential issue | 🟠 Major

Validate IDX and NEVENTS_SEGMENT before segment math.

Line 21 and Line 24 perform arithmetic on unchecked values. If DB output is empty/invalid/zero, this can fail or compute the wrong segment.

Suggested hardening
 #!/usr/bin/bash
+set -euo pipefail
 
 IDX="$1" # same as $(Process)
+[[ "${IDX}" =~ ^[0-9]+$ ]] || { echo "IDX must be a non-negative integer" >&2; exit 1; }
 NEVENT_GEN=100
 DATARUNNUMBER=54404
@@
-NEVENTS_SEGMENT=$(psql FileCatalog -t -A -c  "select events from datasets where segment=0 and filename like 'DST_TRIGGERED_EVENT_seb07%' and runnumber=${DATARUNNUMBER};")
+NEVENTS_SEGMENT=$(psql FileCatalog -v ON_ERROR_STOP=1 -t -A -c \
+  "select events from datasets where segment=0 and filename like 'DST_TRIGGERED_EVENT_seb07%' and runnumber=${DATARUNNUMBER} limit 1;")
+NEVENTS_SEGMENT="${NEVENTS_SEGMENT//[[:space:]]/}"
+[[ "${NEVENTS_SEGMENT}" =~ ^[1-9][0-9]*$ ]] || {
+  echo "Invalid NEVENTS_SEGMENT='${NEVENTS_SEGMENT}'" >&2
+  exit 1
+}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
IDX="$1" # same as $(Process)
NEVENT_GEN=100
DATARUNNUMBER=54404
NEVENTS_PREPARED_DATA=$(( NEVENT_GEN * 2 )) #padding for possible skipping of events
echo "=== run.sh inputs ==="
echo " IDX (job index) = ${IDX}"
echo " NEVENT_GEN = ${NEVENT_GEN}"
echo " NEVENTS_PREPARED_DATA = ${NEVENTS_PREPARED_DATA}"
echo " DATARUNNUMBER = ${DATARUNNUMBER}"
# check number of events in a setment
NEVENTS_SEGMENT=$(psql FileCatalog -t -A -c "select events from datasets where segment=0 and filename like 'DST_TRIGGERED_EVENT_seb07%' and runnumber=${DATARUNNUMBER};")
#calculate which segment to run
SEGMENT=$(( ( IDX * NEVENTS_PREPARED_DATA ) / NEVENTS_SEGMENT))
# calculate how many events to skip
NEVENTS_SKIP=$(( ( IDX * NEVENTS_PREPARED_DATA ) % NEVENTS_SEGMENT))
#!/usr/bin/bash
set -euo pipefail
IDX="$1" # same as $(Process)
[[ "${IDX}" =~ ^[0-9]+$ ]] || { echo "IDX must be a non-negative integer" >&2; exit 1; }
NEVENT_GEN=100
DATARUNNUMBER=54404
NEVENTS_PREPARED_DATA=$(( NEVENT_GEN * 2 )) #padding for possible skipping of events
echo "=== run.sh inputs ==="
echo " IDX (job index) = ${IDX}"
echo " NEVENT_GEN = ${NEVENT_GEN}"
echo " NEVENTS_PREPARED_DATA = ${NEVENTS_PREPARED_DATA}"
echo " DATARUNNUMBER = ${DATARUNNUMBER}"
# check number of events in a setment
NEVENTS_SEGMENT=$(psql FileCatalog -v ON_ERROR_STOP=1 -t -A -c \
  "select events from datasets where segment=0 and filename like 'DST_TRIGGERED_EVENT_seb07%' and runnumber=${DATARUNNUMBER} limit 1;")
NEVENTS_SEGMENT="${NEVENTS_SEGMENT//[[:space:]]/}"
[[ "${NEVENTS_SEGMENT}" =~ ^[1-9][0-9]*$ ]] || {
  echo "Invalid NEVENTS_SEGMENT='${NEVENTS_SEGMENT}'" >&2
  exit 1
}
#calculate which segment to run
SEGMENT=$(( ( IDX * NEVENTS_PREPARED_DATA ) / NEVENTS_SEGMENT))
# calculate how many events to skip
NEVENTS_SKIP=$(( ( IDX * NEVENTS_PREPARED_DATA ) % NEVENTS_SEGMENT))

Comment on lines +35 to +40
psql FileCatalog -t -A -c "select filename from datasets where segment=${SEGMENT} and filename like 'DST_TRIG%' and runnumber=${DATARUNNUMBER};" > test.list

# add the next segment to the list in case we go over the number of events in the segment
psql FileCatalog -t -A -c "select filename from datasets where segment=$(( SEGMENT + 1 )) and filename like 'DST_TRIG%' and runnumber=${DATARUNNUMBER};" >> test.list

root -l -b -q "Fun4All_PrepDataFitting.C(${NEVENT_GEN},${NEVENTS_PREPARED_DATA},${NEVENTS_SKIP},\"test.list\")"
Contributor

⚠️ Potential issue | 🟠 Major

Fail fast when test.list is empty.

If the queries return no files, the prep macro runs with no usable input and later failures are hard to diagnose.

Suggested guard
 psql FileCatalog -t -A -c  "select filename from datasets where segment=${SEGMENT} and filename like 'DST_TRIG%' and runnumber=${DATARUNNUMBER};" > test.list
@@
 psql FileCatalog -t -A -c  "select filename from datasets where segment=$(( SEGMENT + 1 )) and filename like 'DST_TRIG%' and runnumber=${DATARUNNUMBER};" >> test.list
+
+grep -q '[^[:space:]]' test.list || {
+  echo "No input files found for run=${DATARUNNUMBER}, segment=${SEGMENT}" >&2
+  exit 1
+}

@sphenix-jenkins-ci
For repository maintainers, please start the CI check manually (feedback)

This is an automatic message to assist in manually starting the CI check for this pull request, commit 2788c3cc8ec417a749501c861b35d4b6f93f8311 (same instructions as the CI message above).

Automatically generated by sPHENIX Jenkins continuous integration

@bseidlit bseidlit merged commit 427c287 into sPHENIX-Collaboration:master Feb 26, 2026
4 checks passed
