data embedding run scripts#1300
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review infoConfiguration used: Organization UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughAdds DB-backed list generation and a run orchestration script, updates the prep-data macro to a multi-argument interface with explicit skip/run semantics, and swaps input nodes in the embedding macro's CombineTowerInfo instantiation. Changes
Possibly related PRs
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
For repository maintainers, please start the CI check manually (feedback)This is an automatic message to assist manually starting CI check for this pull request, commit d69b3fbeb94b9c96b3098340ef1b5b4d69c59c19. sPHENIX software maintainers: please make your input here and start the Build: Note:
Automatically generated by sPHENIX Jenkins continuous integration |
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (1)
DataOverlayProduction/run.sh (1)
8-8: Consider makingBASEDIRconfigurable.At Line [8], the path is hardcoded to a single user workspace. For collaboration/production jobs, prefer an env override (with current value as default) to keep the script portable.
| @@ -0,0 +1 @@ | |||
| psql FileCatalog -t -A -c "select filename from datasets where segment=${1} and filename like 'DST_TRIG%' and runnumber=54404;" > test.list | |||
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
fd "genList.sh" --type fRepository: sPHENIX-Collaboration/macros
Length of output: 104
🏁 Script executed:
head -20 DataOverlayProduction/genList.shRepository: sPHENIX-Collaboration/macros
Length of output: 212
Add an explicit shebang for deterministic shell behavior.
The file lacks a shebang, so the executing shell is environment-dependent. Add #!/usr/bin/env bash as the first line to ensure consistent behavior across different execution contexts.
🧰 Tools
🪛 Shellcheck (0.11.0)
[error] 1-1: Tips depend on target shell and yours is unknown. Add a shebang or a 'shell' directive.
(SC2148)
❓ Verification inconclusive
Script executed:
# First, locate the file in the repository
find . -name "genList.sh" -type f 2>/dev/null | head -20Repository: sPHENIX-Collaboration/macros
Repository: sPHENIX-Collaboration/macros
Exit code: 0
stdout:
./DataOverlayProduction/genList.sh
Script executed:
# Also check if DataOverlayProduction directory exists
ls -la DataOverlayProduction/ 2>/dev/null || echo "Directory not found"Repository: sPHENIX-Collaboration/macros
Repository: sPHENIX-Collaboration/macros
Exit code: 0
stdout:
total 44
drwxr-xr-x 2 jailuser jailuser 120 Feb 25 04:51 .
drwxr-xr-x 23 jailuser jailuser 540 Feb 25 04:51 ..
-rw-r--r-- 1 jailuser jailuser 25838 Feb 25 04:51 Fun4All_G4_Embed_data.C
-rw-r--r-- 1 jailuser jailuser 7324 Feb 25 04:51 Fun4All_PrepDataFitting.C
-rw-r--r-- 1 jailuser jailuser 141 Feb 25 04:51 genList.sh
-rwxr-xr-x 1 jailuser jailuser 645 Feb 25 04:51 run.sh
Script executed:
# List files in repository root to understand structure
git ls-files | head -30Repository: sPHENIX-Collaboration/macros
Repository: sPHENIX-Collaboration/macros
Exit code: 0
stdout:
.clang-format
.clang-tidy
.gitignore
BcoQA/Fun4All_Gl1QA.C
BcoQA/Fun4All_TpcQA.C
BcoQA/condor/Fun4All_Gl1QA.C
BcoQA/condor/Fun4All_InttQA.C
BcoQA/condor/Fun4All_TpcQA.C
BcoQA/condor/README
BcoQA/condor/condor.job
BcoQA/condor/create_condorlist.pl
BcoQA/condor/gl1_makelist.sh
BcoQA/condor/intt_makelist.sh
BcoQA/condor/make_list.sh
BcoQA/condor/mvtx_makelist.sh
BcoQA/condor/run_gl1qa.sh
BcoQA/condor/run_inttqa.sh
BcoQA/condor/run_tpcqa.sh
BcoQA/condor/tpc_makelist.sh
BcoQA/condor/tpot_makelist.sh
CDBTest/README
CDBTest/TestCDBHistos.C
CDBTest/TestCDBInsert.C
CDBTest/TestCDBRead.C
CDBTest/TestCDBSubsysReco.C
CDBTest/TestCDBTTree.C
CDBTest/mycdb.json
CaloEmbedding/Fun4All_G4_Embed_data.C
CaloEmbedding/G4Setup_sPHENIX.C
CaloProduction/CreateListFiles.pl
Script executed:
# Read the actual contents of genList.sh
cat -n DataOverlayProduction/genList.shRepository: sPHENIX-Collaboration/macros
Repository: sPHENIX-Collaboration/macros
Exit code: 0
stdout:
1 psql FileCatalog -t -A -c "select filename from datasets where segment=${1} and filename like 'DST_TRIG%' and runnumber=54404;" > test.list
Script executed:
# Verify psql variable syntax - check documentation or test behavior
# Let's check if there are other examples in the repo using psql variables
grep -r "\-v " --include="*.sh" . | head -10Repository: sPHENIX-Collaboration/macros
Repository: sPHENIX-Collaboration/macros
Exit code: 0
stdout:
./calibrations/calo/emcal_calib_year1/base/genCalib.sh: cp -v $calib .
./calibrations/calo/emcal_calib_year1/base/genCalib.sh:cp -v OUT*.root $submitDir
./calibrations/calo/emcal_calib_year1/base/run.sh: cp -v "$PWD"/../../CondorRun.sh CondorRunJob$li.sh
./calibrations/calo/emcal_calib_year1/base/run.sh: cp -v "$PWD"/../../local_calib_copy.root .
./calibrations/calo/emcal_calib_year1/track_calib/condor/run_condor.sh: cp -v "$PWD"/../../CondorRun.sh CondorRunJob$li.sh
./calibrations/calo/emcal_calib_year1/track_calib/condor/run_condor.sh: cp -v "$PWD"/../../../Fun4All_EMCal.C .
./calibrations/calo/emcal_calib_year2/base/hadd_condor.sh: cp -v "$PWD"/../../CondorHADD.sh CondorRunJob$li.sh
./calibrations/calo/hcal_calib_year2/condor_histmaker/hadd_condor.sh: cp -v "$PWD"/../../CondorHADD.sh CondorHaddJob$li.sh
./calibrations/calo/hcal_tsc_cos/condor/run_condor.sh: cp -v "$PWD"/../../CondorRun.sh CondorRunJob$li.sh
./calibrations/intt/old/shell.sh: printf -v LIST ${INTT_LIST} ${RUN_NUM} ${FLX_SVR}
Script executed:
# Also search for psql usage patterns in the repository
grep -r "psql.*-c" --include="*.sh" . | head -20Repository: sPHENIX-Collaboration/macros
Repository: sPHENIX-Collaboration/macros
Exit code: 0
stdout:
./DataOverlayProduction/genList.sh:psql FileCatalog -t -A -c "select filename from datasets where segment=${1} and filename like 'DST_TRIG%' and runnumber=54404;" > test.list
./calibrations/mbd/SLEW/make_dst_uncal_noslew.sh: echo psql -h sphnxdaqdbreplica.sdcc.bnl.gov -Uphnxrc -w daq -c "select filename,firstevent,lastevent from filelist where filename ~ '$prdf';"
./calibrations/mbd/SLEW/make_dst_uncal_noslew.sh: exec psql -h sphnxdaqdbreplica.sdcc.bnl.gov -Uphnxrc -w daq -c "select filename,firstevent,lastevent from filelist where filename ~ '$prdf';" | grep $prdf > $tmpfile
./calibrations/mbd/get_files2calib.sh:psql -U phnxrc -h sphnxdaqdbreplica.sdcc.bnl.gov -c "select runnumber,eventsinrun,ertimestamp-brtimestamp from run where runnumber>= $min_runnumber and runnumber <= $max_runnumber and runtype='physics' and eventsinrun >= 100000 and EXTRACT(EPOCH FROM (ertimestamp-brtimestamp)) > 300 order by runnumber" daq | tail -n +3 | grep '|' | sort -n -k1 > $tmp_runsfile
./calibrations/mbd/get_files2calib.sh:psql -c 'select NOW();' FileCatalog | awk 'NR==3' | sed 's/^ //' > ${tstamp_file}
./calibrations/mbd/get_files2calib.sh:echo psql -c \"$pcmd\" FileCatalog
./calibrations/mbd/get_files2calib.sh:psql -c "$pcmd" FileCatalog > $tmpfile
Script executed:
# Search for psql documentation or verify the -v variable syntax
# Check if there are any psql examples using -v with variable substitution
grep -r "psql.*-v" --include="*.sh" . 2>/dev/null | head -10Repository: sPHENIX-Collaboration/macros
Repository: sPHENIX-Collaboration/macros
Exit code: 0
Script executed:
# Check online documentation or search for psql variable substitution examples in scripts
# Let's also check if the colon syntax is used anywhere in the repository
grep -r ":[a-zA-Z_]" --include="*.sh" . 2>/dev/null | head -20Repository: sPHENIX-Collaboration/macros
Repository: sPHENIX-Collaboration/macros
Exit code: 0
stdout:
./calibrations/mbd/run_mbdcal.sh:PARSED=$(getopt -o "r:b:d:p:n:h" -l "outbase:,outdir:,logbase:,logdir:,run:,build:,dbtag:,pass0dir:,nevents:,help" -- "$@")
psql -v variable syntax SQL query parameter substitution
psql variables are client-side text substitutions set via -v/--set (or \set) and referenced in SQL with colon syntax. (access.crunchydata.com)
Set a variable
psql -v myval=123 -f script.sql
# or inside psql:
\set myval 123Use it in SQL (3 forms)
- Raw substitution (no quoting added):
SELECT :myval; - SQL literal-quoted (safe for values;
psqladds proper SQL string escaping):SELECT :'myval';
- Identifier-quoted (for table/column names):
SELECT * FROM :"mytable";
Notes:
- Substitution does not happen inside SQL quoted strings (so
':myval'will stay literal text). (access.crunchydata.com) - If needed, escape a colon as
\:to prevent substitution. (access.crunchydata.com)
-c limitation (common gotcha)
psql -c "..." only accepts input “parsable by the server” (or a single backslash command), so psql-specific features like :myval substitution won’t work there. Use -f or pipe via stdin instead: (stackoverflow.com)
echo "select * from t where id = :'myval';" | psql -v myval=123Citations:
- 1: https://access.crunchydata.com/documentation/postgresql11/11.22/app-psql.html?utm_source=openai
- 2: https://caryrobbins.com/dev/postgres-scripting/?utm_source=openai
- 3: https://access.crunchydata.com/documentation/postgresql11/11.22/app-psql.html?utm_source=openai
- 4: https://access.crunchydata.com/documentation/postgresql11/11.22/app-psql.html?utm_source=openai
- 5: https://access.crunchydata.com/documentation/postgresql11/11.22/app-psql.html?utm_source=openai
- 6: https://stackoverflow.com/questions/45193279/using-variables-when-executing-single-command-in-psql
Validate segment parameter to prevent SQL injection.
The ${1} value is directly interpolated into SQL, creating a SQL injection risk. Input validation is essential. However, the proposed patch has a critical flaw: psql's -v variable substitution (:variable syntax) does not work with the -c flag—that feature only functions with script files (-f) or stdin input.
To fix this properly:
- Validate
${1}as numeric:[[ "${1}" =~ ^[0-9]+$ ]] || exit 2 - Pass the query via stdin instead of
-cso variable substitution works:echo "select filename from datasets where segment = :segment and filename like 'DST_TRIG%' and runnumber = 54404;" | \ psql FileCatalog -v ON_ERROR_STOP=1 -v segment="${1}" -t -A > test.list
🧰 Tools
🪛 Shellcheck (0.11.0)
[error] 1-1: Tips depend on target shell and yours is unknown. Add a shebang or a 'shell' directive.
(SC2148)
DataOverlayProduction/run.sh
Outdated
| FILE=$(cat prepData.txt) | ||
|
|
||
|
|
||
| # Run the Fun4All_G4_Embed_data.C script with the provided parameters | ||
| root -l -b -q "Fun4All_G4_Embed_data.C(${NEVENT},0,\"${FILE}\", \"./\" , \"embed_test.root\", \"Jet30\", \"MDC2\")" |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
find . -name "run.sh" -path "*DataOverlayProduction*" | head -20Repository: sPHENIX-Collaboration/macros
Length of output: 102
🏁 Script executed:
git ls-files | grep -i "dataoverlay" | head -20Repository: sPHENIX-Collaboration/macros
Length of output: 227
🏁 Script executed:
git ls-files DataOverlayProduction/ 2>/dev/null || find . -type d -name "DataOverlayProduction" 2>/dev/nullRepository: sPHENIX-Collaboration/macros
Length of output: 227
🏁 Script executed:
wc -l DataOverlayProduction/run.shRepository: sPHENIX-Collaboration/macros
Length of output: 103
🏁 Script executed:
cat -n DataOverlayProduction/run.sh | sed -n '1,35p'Repository: sPHENIX-Collaboration/macros
Length of output: 882
🏁 Script executed:
cat -n DataOverlayProduction/genList.shRepository: sPHENIX-Collaboration/macros
Length of output: 219
🏁 Script executed:
cat -n DataOverlayProduction/Fun4All_PrepDataFitting.C | head -50Repository: sPHENIX-Collaboration/macros
Length of output: 1609
🏁 Script executed:
grep -r "prepData.txt" DataOverlayProduction/ --include="*.sh" --include="*.C"Repository: sPHENIX-Collaboration/macros
Length of output: 214
🏁 Script executed:
cat -n DataOverlayProduction/Fun4All_PrepDataFitting.C | grep -A 30 "prepData.txt"Repository: sPHENIX-Collaboration/macros
Length of output: 871
🏁 Script executed:
grep -E "return|exit" DataOverlayProduction/run.shRepository: sPHENIX-Collaboration/macros
Length of output: 54
🏁 Script executed:
head -n 20 DataOverlayProduction/run.sh | grep -E "set -"Repository: sPHENIX-Collaboration/macros
Length of output: 54
🏁 Script executed:
cat -n DataOverlayProduction/Fun4All_PrepDataFitting.C | sed -n '235,250p'Repository: sPHENIX-Collaboration/macros
Length of output: 564
Add validation before reading prepData.txt to fail fast on missing or empty files.
At lines 20–24, the script reads prepData.txt without checking whether it exists or contains data. If Fun4All_PrepDataFitting.C fails silently or produces incomplete output, the subsequent ROOT invocation receives an empty argument and fails with unclear error messages, complicating diagnostics in a production batch environment.
Proposed guard
-# get prepared data filename
-FILE=$(cat prepData.txt)
+# get prepared data filename
+[[ -s prepData.txt ]] || { echo "prepData.txt is missing or empty" >&2; exit 1; }
+read -r FILE < prepData.txt|
run.sh contains this line: |
For repository maintainers, please start the CI check manually (feedback)This is an automatic message to assist manually starting CI check for this pull request, commit 22297bd4d9b9a305f1b004f0f151107b4d62e2b6. sPHENIX software maintainers: please make your input here and start the Build: Note:
Automatically generated by sPHENIX Jenkins continuous integration |
There was a problem hiding this comment.
Actionable comments posted: 2
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
DataOverlayProduction/Fun4All_G4_Embed_data.CDataOverlayProduction/Fun4All_PrepDataFitting.CDataOverlayProduction/run.sh
| IDX="$1" # same as $(Process) | ||
| NEVENT_GEN=100 | ||
| DATARUNNUMBER=54404 | ||
|
|
||
| NEVENTS_PREPARED_DATA=$(( NEVENT_GEN * 2 )) #padding for possible skipping of events | ||
|
|
||
| echo "=== run.sh inputs ===" | ||
| echo " IDX (job index) = ${IDX}" | ||
| echo " NEVENT_GEN = ${NEVENT_GEN}" | ||
| echo " NEVENTS_PREPARED_DATA = ${NEVENTS_PREPARED_DATA}" | ||
| echo " DATARUNNUMBER = ${DATARUNNUMBER}" | ||
|
|
||
| # check number of events in a setment | ||
| NEVENTS_SEGMENT=$(psql FileCatalog -t -A -c "select events from datasets where segment=0 and filename like 'DST_TRIGGERED_EVENT_seb07%' and runnumber=${DATARUNNUMBER};") | ||
|
|
||
|
|
||
| #calculate which segment to run | ||
| SEGMENT=$(( ( IDX * NEVENTS_PREPARED_DATA ) / NEVENTS_SEGMENT)) | ||
|
|
||
| # calculate how many events to skip | ||
| NEVENTS_SKIP=$(( ( IDX * NEVENTS_PREPARED_DATA ) % NEVENTS_SEGMENT)) |
There was a problem hiding this comment.
Validate IDX and NEVENTS_SEGMENT before segment math.
Line 21 and Line 24 perform arithmetic on unchecked values. If DB output is empty/invalid/zero, this can fail or compute the wrong segment.
Suggested hardening
#!/usr/bin/bash
+set -euo pipefail
IDX="$1" # same as $(Process)
+[[ "${IDX}" =~ ^[0-9]+$ ]] || { echo "IDX must be a non-negative integer" >&2; exit 1; }
NEVENT_GEN=100
DATARUNNUMBER=54404
@@
-NEVENTS_SEGMENT=$(psql FileCatalog -t -A -c "select events from datasets where segment=0 and filename like 'DST_TRIGGERED_EVENT_seb07%' and runnumber=${DATARUNNUMBER};")
+NEVENTS_SEGMENT=$(psql FileCatalog -v ON_ERROR_STOP=1 -t -A -c \
+ "select events from datasets where segment=0 and filename like 'DST_TRIGGERED_EVENT_seb07%' and runnumber=${DATARUNNUMBER} limit 1;")
+NEVENTS_SEGMENT="${NEVENTS_SEGMENT//[[:space:]]/}"
+[[ "${NEVENTS_SEGMENT}" =~ ^[1-9][0-9]*$ ]] || {
+ echo "Invalid NEVENTS_SEGMENT='${NEVENTS_SEGMENT}'" >&2
+ exit 1
+}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| IDX="$1" # same as $(Process) | |
| NEVENT_GEN=100 | |
| DATARUNNUMBER=54404 | |
| NEVENTS_PREPARED_DATA=$(( NEVENT_GEN * 2 )) #padding for possible skipping of events | |
| echo "=== run.sh inputs ===" | |
| echo " IDX (job index) = ${IDX}" | |
| echo " NEVENT_GEN = ${NEVENT_GEN}" | |
| echo " NEVENTS_PREPARED_DATA = ${NEVENTS_PREPARED_DATA}" | |
| echo " DATARUNNUMBER = ${DATARUNNUMBER}" | |
| # check number of events in a setment | |
| NEVENTS_SEGMENT=$(psql FileCatalog -t -A -c "select events from datasets where segment=0 and filename like 'DST_TRIGGERED_EVENT_seb07%' and runnumber=${DATARUNNUMBER};") | |
| #calculate which segment to run | |
| SEGMENT=$(( ( IDX * NEVENTS_PREPARED_DATA ) / NEVENTS_SEGMENT)) | |
| # calculate how many events to skip | |
| NEVENTS_SKIP=$(( ( IDX * NEVENTS_PREPARED_DATA ) % NEVENTS_SEGMENT)) | |
| #!/usr/bin/bash | |
| set -euo pipefail | |
| IDX="$1" # same as $(Process) | |
| [[ "${IDX}" =~ ^[0-9]+$ ]] || { echo "IDX must be a non-negative integer" >&2; exit 1; } | |
| NEVENT_GEN=100 | |
| DATARUNNUMBER=54404 | |
| NEVENTS_PREPARED_DATA=$(( NEVENT_GEN * 2 )) `#padding` for possible skipping of events | |
| echo "=== run.sh inputs ===" | |
| echo " IDX (job index) = ${IDX}" | |
| echo " NEVENT_GEN = ${NEVENT_GEN}" | |
| echo " NEVENTS_PREPARED_DATA = ${NEVENTS_PREPARED_DATA}" | |
| echo " DATARUNNUMBER = ${DATARUNNUMBER}" | |
| # check number of events in a setment | |
| NEVENTS_SEGMENT=$(psql FileCatalog -v ON_ERROR_STOP=1 -t -A -c \ | |
| "select events from datasets where segment=0 and filename like 'DST_TRIGGERED_EVENT_seb07%' and runnumber=${DATARUNNUMBER} limit 1;") | |
| NEVENTS_SEGMENT="${NEVENTS_SEGMENT//[[:space:]]/}" | |
| [[ "${NEVENTS_SEGMENT}" =~ ^[1-9][0-9]*$ ]] || { | |
| echo "Invalid NEVENTS_SEGMENT='${NEVENTS_SEGMENT}'" >&2 | |
| exit 1 | |
| } | |
| `#calculate` which segment to run | |
| SEGMENT=$(( ( IDX * NEVENTS_PREPARED_DATA ) / NEVENTS_SEGMENT)) | |
| # calculate how many events to skip | |
| NEVENTS_SKIP=$(( ( IDX * NEVENTS_PREPARED_DATA ) % NEVENTS_SEGMENT)) |
| psql FileCatalog -t -A -c "select filename from datasets where segment=${SEGMENT} and filename like 'DST_TRIG%' and runnumber=${DATARUNNUMBER};" > test.list | ||
|
|
||
| # add the next segment to the list in case we go over the number of events in the segment | ||
| psql FileCatalog -t -A -c "select filename from datasets where segment=$(( SEGMENT + 1 )) and filename like 'DST_TRIG%' and runnumber=${DATARUNNUMBER};" >> test.list | ||
|
|
||
| root -l -b -q "Fun4All_PrepDataFitting.C(${NEVENT_GEN},${NEVENTS_PREPARED_DATA},${NEVENTS_SKIP},\"test.list\")" |
There was a problem hiding this comment.
Fail fast when test.list is empty.
If the queries return no files, the prep macro runs with no usable input and later failures are hard to diagnose.
Suggested guard
psql FileCatalog -t -A -c "select filename from datasets where segment=${SEGMENT} and filename like 'DST_TRIG%' and runnumber=${DATARUNNUMBER};" > test.list
@@
psql FileCatalog -t -A -c "select filename from datasets where segment=$(( SEGMENT + 1 )) and filename like 'DST_TRIG%' and runnumber=${DATARUNNUMBER};" >> test.list
+
+grep -q '[^[:space:]]' test.list || {
+ echo "No input files found for run=${DATARUNNUMBER}, segment=${SEGMENT}" >&2
+ exit 1
+}
For repository maintainers, please start the CI check manually (feedback)This is an automatic message to assist manually starting CI check for this pull request, commit 2788c3cc8ec417a749501c861b35d4b6f93f8311. sPHENIX software maintainers: please make your input here and start the Build: Note:
Automatically generated by sPHENIX Jenkins continuous integration |


Data Embedding Run Scripts
Motivation / Context
Provide scripts and updated macros to orchestrate production-style embedding of simulated events into real sPHENIX DST data for reconstruction validation and systematic studies. Aim is to support distributed execution (HTCondor-like indexing) by preparing data slices, running embedding, and producing per-job outputs.
Key Changes
Potential Risk Areas
Possible Future Improvements
Note: This summary was AI-assisted and may contain inaccuracies—please validate the scripts, the presence/format of prepData.txt, and macro parameter semantics with a test run and domain knowledge before large-scale deployment.