Add analysis component #44

Open
Tedsmith100 wants to merge 39 commits into nu-ZOO:main from Tedsmith100:add-waveform-averager

@Tedsmith100

Adds waveform-averaging code in `ana.py` and moves some functions around for cleanliness.

Member

@jwaiton jwaiton left a comment

Nice PR! It's a good function. I've mostly added documentation requests and some changes to the IO. Once those have been resolved, I'll move on to requesting tests (we can discuss these in person at some point 😸)

Comment on lines 40 to 49
```python
if verbose > 1:
    plt.plot(time, wf_data)
    plt.xlabel('Time (ns)')
    plt.ylabel('ADCs')
    plt.yscale('log')
    plt.axhline(second_peak, 0, 2e5, c = 'r', ls = '--')
    plt.axvline(WINDOW_END, c = 'r', ls = '--')
    plt.title(f'Event {event_number} subtracted waveform')
    plt.show()
print(f'Event {event_number} excluded due to large secondary peak')
```
Member

The docstring suggests that setting `verbose = 0` will suppress the print statement, but the code prints it unconditionally.

I dislike the use of a verbosity argument, but there isn't a good alternative implemented within MULE at the moment (logging will be implemented in the future).
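A minimal sketch of one way to make the message respect the docstring, assuming the intended convention is `verbose >= 1` for messages and `verbose > 1` for plots (plotting omitted here). `report_rejection` and `log_rejection` are hypothetical names, and the second function uses Python's standard `logging` module as a stand-in for the logging MULE does not have yet.

```python
import logging
from typing import Optional

# Hypothetical logger name; MULE does not currently set up logging.
logger = logging.getLogger("mule.ana")


def report_rejection(event_number: int, verbose: int) -> Optional[str]:
    """Print (and return) the rejection message only when verbose >= 1."""
    if verbose < 1:
        return None  # verbose = 0: stay silent, as the docstring promises
    message = f"Event {event_number} excluded due to large secondary peak"
    print(message)
    return message


def log_rejection(event_number: int) -> None:
    """Logging-based alternative: visibility is controlled by the configured log level."""
    logger.info("Event %d excluded due to large secondary peak", event_number)
```

With the logging approach, `logging.basicConfig(level=logging.INFO)` would show the message while the default `WARNING` level hides it, so `verbose` would not need to be passed through every function.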

```python
if os.path.exists(filepath):
    print(f"Processing file: {filepath}")

    x = io.load_rwf_info(filepath, samples=2)
```
Member

It's preferable to load the data lazily; you can use the appropriate functions (found here), and there's an example of them in use here.

It would also be good to then modify `cook_data()` to expect single waveforms instead.
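A minimal sketch of the lazy, one-waveform-at-a-time pattern, assuming the waveforms live in an array-like object that supports `len()` and row indexing (an `h5py` dataset behaves this way and reads only the indexed row from disk). `iter_waveforms`, `cook_waveform` and `lazy_average` are hypothetical names, not MULE's loading functions; `cook_waveform` illustrates the single-waveform signature suggested for `cook_data()`, with a plain median baseline subtraction standing in for the real cooking.

```python
from typing import Iterator, Optional

import numpy as np


def iter_waveforms(dataset) -> Iterator[np.ndarray]:
    """Yield one waveform (row) at a time instead of loading the whole file."""
    for idx in range(len(dataset)):
        yield np.asarray(dataset[idx], dtype=float)


def cook_waveform(wf: np.ndarray, negative: bool = True) -> Optional[np.ndarray]:
    """Hypothetical single-waveform cook: median baseline subtraction plus polarity flip."""
    if wf.size == 0:
        return None  # treat an empty row as rejected
    cooked = wf - np.median(wf)
    return -cooked if negative else cooked


def lazy_average(dataset, negative: bool = True) -> np.ndarray:
    """Running sum over accepted waveforms; only one raw waveform is in memory at a time."""
    total, count = None, 0
    for wf in iter_waveforms(dataset):
        cooked = cook_waveform(wf, negative)
        if cooked is None:
            continue  # rejected waveforms contribute nothing
        total = cooked.copy() if total is None else total + cooked
        count += 1
    if count == 0:
        raise ValueError("No waveforms survived cooking")
    return total / count
```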

Member

This is the part you were avoiding, I see :)

This can be addressed in a future PR if you like (one that you don't necessarily have to do).

Comment on lines +197 to +200
```python
# Process the data in chunks to avoid memory overload; the data is also cooked chunk by chunk
for start_idx in range(0, total_waveforms, chunk_size):
    end_idx = min(start_idx + chunk_size, total_waveforms)
    waveform_chunk = waveforms[start_idx:end_idx]
```
Member

Lazy loading would avoid this roughness, although chunking may well be the quicker method!
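For comparison, a sketch of the chunked approach written as a running sum, so the final average does not depend on how the data is split, and a chunk in which every waveform is rejected simply contributes nothing. `chunked_average` and the `reject` callable are hypothetical; the rejection rules in the checks are illustrative only.

```python
from typing import Callable

import numpy as np


def chunked_average(waveforms: np.ndarray, chunk_size: int,
                    reject: Callable[[np.ndarray], np.ndarray]) -> np.ndarray:
    """Average the waveforms not flagged by `reject`, chunk_size rows at a time.

    `reject` takes a 2D chunk and returns a boolean mask (True = drop that row).
    """
    n_total, n_samples = waveforms.shape
    running_sum = np.zeros(n_samples)
    n_kept = 0
    for start_idx in range(0, n_total, chunk_size):
        end_idx = min(start_idx + chunk_size, n_total)
        chunk = waveforms[start_idx:end_idx]
        kept = chunk[~reject(chunk)]
        if kept.shape[0] == 0:
            continue  # whole chunk rejected: skip it instead of dividing by zero
        running_sum += kept.sum(axis=0)
        n_kept += kept.shape[0]
    if n_kept == 0:
        raise ValueError("Every waveform was rejected")
    return running_sum / n_kept
```

Because only sums and counts are carried between chunks, the result equals a single `np.mean` over the accepted waveforms for any `chunk_size`.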

Comment on lines 1 to 19
```ini
[required]

files = ['run19.h5']

window_args = {'WINDOW_START' : 4e2,
               'WINDOW_END' : 3e4,
               'BASELINE_POINT_1' : 1e6,
               'BASELINE_POINT_2' : 1.5e6,
               'BASELINE_RANGE_1' : 40e3,
               'BASELINE_RANGE_2' : 40e3}

bin_size = 4
chunk_size = 5
negative = True
baseline_mode = 'median'
verbose = 1
peak_threshold = 1000

save_path = 'test.csv'
```
Member

Could any of these be considered optional? You could make `save_path` optional if the output were written to an h5 file; it would then be stored in the same h5 file the data is read from.
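A sketch of how an `[optional]` section with defaults could sit alongside `[required]`, assuming the config is read with `configparser` and each value is a Python literal parsed with `ast.literal_eval`. MULE's actual parser may differ; the `[optional]` section name, `OPTIONAL_DEFAULTS` and `load_config` are hypothetical.

```python
import ast
import configparser

# Hypothetical defaults for keys the user would no longer have to set.
OPTIONAL_DEFAULTS = {
    "chunk_size": 5,
    "verbose": 0,
    "save_path": None,  # None could mean "write into an h5 file instead of a CSV"
}


def load_config(text: str) -> dict:
    """Merge defaults, then [optional] (if present), then [required] values."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep key case; configparser lowercases keys by default
    parser.read_string(text)
    config = dict(OPTIONAL_DEFAULTS)
    if parser.has_section("optional"):
        config.update({k: ast.literal_eval(v) for k, v in parser["optional"].items()})
    # Indented continuation lines (e.g. a multi-line dict) arrive joined by newlines,
    # which literal_eval accepts inside brackets.
    config.update({k: ast.literal_eval(v) for k, v in parser["required"].items()})
    return config
```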

Member

This still applies :)

Author

It's difficult to write to the same h5 file the data comes from, since we can average over multiple h5 input files. The output is just a single waveform, so I'd assume an h5 file is overkill. Is there a file type you would recommend?

Member

@jwaiton jwaiton Feb 15, 2026

> It's difficult to write to the same h5 that the data comes from as we can average over multiple h5 file inputs. The output is just a single waveform so I would assume that using an h5 file is overkill, is there a good file type you would recommend?

For internal consistency within MULE, writing to an h5 file would be the nicest approach. Doing it with `writer()` would take about five lines of code. This can be pushed to a later PR, though.

@Tedsmith100 Tedsmith100 requested a review from a team February 3, 2026 17:23
Member

@jwaiton jwaiton left a comment

Mostly small comments, related to typos and a few other things.

The tests need a bit more thought, but we can figure that out together at some point.

```python
# Construct waveforms that are likely to be rejected
# (huge secondary peaks)
waveforms = np.full((n_waveforms, n_samples), 1e6)
```
Member

This doesn't create a secondary peak; it fills every sample with the same value (no variation).
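One way to build fixtures that really contain a secondary peak is to start from a flat baseline, put a modest primary pulse inside the window and a much larger pulse after `WINDOW_END`. A sketch, assuming positive-going pulses and sample-index units for `window_end`; `make_secondary_peak_waveforms` and its parameters are hypothetical, and whether these waveforms get rejected depends on how the PR compares the secondary peak with `peak_threshold`.

```python
import numpy as np


def make_secondary_peak_waveforms(n_waveforms: int, n_samples: int,
                                  window_end: int, primary_height: float = 100.0,
                                  secondary_height: float = 1e6) -> np.ndarray:
    """Flat zero baseline, a primary pulse before window_end and a huge pulse after it."""
    if not 0 < window_end < n_samples - 1:
        raise ValueError("window_end must leave room for a secondary peak")
    waveforms = np.zeros((n_waveforms, n_samples))
    waveforms[:, window_end // 2] = primary_height  # primary peak inside the window
    waveforms[:, (window_end + n_samples) // 2] = secondary_height  # secondary peak after it
    return waveforms
```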

Two functions that check that the window args don't overlap or exceed the waveform length. Also comment changes for clarity.
@Tedsmith100 Tedsmith100 force-pushed the add-waveform-averager branch from dec6950 to d8b0034 February 9, 2026 15:29

```python
case _:
    raise ValueError(
        f"Invalid sub_type '{sub_type}'. Expected 'mean' or 'median'."
    )
```
Member

Since you've added the new `none` case, you should include it in the expected options listed in the error message :)

```python
    expected = np.mean(waveforms, axis=0)
    np.testing.assert_allclose(avg, expected, rtol=1e-6)

def test_empty_chunk_handling(): #Tests for when a whole chunk gets rejected by cook data
```
Member

You removed the comment, but I think it was a good one.

Comment on lines 308 to 334
```python
def test_wf_window_mismatch(tmp_path): # checks that skipping mismatch works, wf1 should be ignored in cook
    wf1 = np.linspace(0,10,10).reshape(1, -1)
    wf2 = np.linspace(0,30,30).reshape(1, -1)
    f1 = make_temp_h5(tmp_path, wf1)
    f2 = make_temp_h5(tmp_path, wf2, "test_waveforms2.h5")

    window_args = {
        "WINDOW_START": 1,
        "WINDOW_END": 5,
        "BASELINE_POINT_1": 10,
        "BASELINE_POINT_2": 15,
        "BASELINE_RANGE_1": 2,
        "BASELINE_RANGE_2": 2,
    }

    x = average_waveforms(
        files= [f1,f2],
        bin_size=1,
        window_args=window_args,
        chunk_size=1,
        negative=False,
        baseline_mode="median",
        verbose=0,
        peak_threshold=1000, # very high so no rejection
        suppression_threshold=0, # nothing should be suppressed
    )
    y = average_waveforms(
```
Member

@jwaiton jwaiton Feb 10, 2026

Is the standard behaviour to ignore rejected waveforms? If so, is there a way to make this stop the averaging, or at least output something to the terminal?

Author

I'll make it print something to the terminal.
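A sketch of a window check that can either warn and skip or stop the averaging, depending on a `strict` flag. It assumes each baseline region spans `BASELINE_POINT_n` to `BASELINE_POINT_n + BASELINE_RANGE_n` and that all values are in samples; that interpretation, `check_window_args` and `strict` are hypothetical, so the PR's own two validation functions may define things differently.

```python
import warnings
from itertools import combinations


def check_window_args(window_args: dict, n_samples: int, strict: bool = False) -> bool:
    """Return True if the window and baseline regions fit and don't overlap.

    On failure, raise ValueError when strict=True; otherwise warn and return
    False so the caller can skip this waveform/file and keep going.
    """
    regions = {
        "window": (window_args["WINDOW_START"], window_args["WINDOW_END"]),
        "baseline_1": (window_args["BASELINE_POINT_1"],
                       window_args["BASELINE_POINT_1"] + window_args["BASELINE_RANGE_1"]),
        "baseline_2": (window_args["BASELINE_POINT_2"],
                       window_args["BASELINE_POINT_2"] + window_args["BASELINE_RANGE_2"]),
    }
    problems = [f"{name} [{lo}, {hi}) is empty or outside 0-{n_samples}"
                for name, (lo, hi) in regions.items()
                if lo >= hi or lo < 0 or hi > n_samples]
    # Half-open intervals [a, b) and [c, d) overlap iff a < d and c < b.
    for (name_a, (lo_a, hi_a)), (name_b, (lo_b, hi_b)) in combinations(regions.items(), 2):
        if lo_a < hi_b and lo_b < hi_a:
            problems.append(f"{name_a} overlaps {name_b}")
    if not problems:
        return True
    message = "; ".join(problems)
    if strict:
        raise ValueError(message)  # stop the averaging outright
    warnings.warn(f"Skipping: {message}")  # visible in the terminal, averaging continues
    return False
```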

```python
)
'''

event_number = i + chunk_size * chunk_number
```
Member

@jwaiton jwaiton Feb 15, 2026

Where does `i` come from here? This line is repeated just a few lines down; I'm assuming it's not meant to be here?

This still applies; perhaps I'm missing something here.
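If the intent is a global event index built from the chunk number and the position inside the chunk, the line only makes sense where `i` is defined by the inner loop. A sketch, assuming chunks of `chunk_size` consecutive waveforms numbered from zero; `event_numbers` is a hypothetical helper used only to show where the line belongs.

```python
from typing import List


def event_numbers(total_waveforms: int, chunk_size: int) -> List[int]:
    """Global event number of every waveform, visited chunk by chunk (hypothetical helper)."""
    numbers = []
    for chunk_number, start_idx in enumerate(range(0, total_waveforms, chunk_size)):
        end_idx = min(start_idx + chunk_size, total_waveforms)
        for i in range(end_idx - start_idx):  # i is the position inside this chunk
            event_number = i + chunk_size * chunk_number  # defined once, where i exists
            numbers.append(event_number)
    return numbers
```

Since `start_idx == chunk_size * chunk_number`, `event_number` is simply `start_idx + i`, so one definition inside the inner loop is enough.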
