reader() needed to allow a file to be written to while it is also being read. writer() did not handle 'Overwrite = False' correctly when fixed_size was False.
packs/ana/analysis_utils.py
    if verbose > 1:
        plt.plot(time, wf_data)
        plt.xlabel('Time (ns)')
        plt.ylabel('ADCs')
        plt.yscale('log')
        plt.axhline(second_peak, 0, 2e5, c = 'r', ls = '--')
        plt.axvline(WINDOW_END, c = 'r', ls = '--')
        plt.title(f'Event {event_number} subtracted waveform')
        plt.show()
    print(f'Event {event_number} excluded due to large secondary peak')
The docstring suggests that setting verbose = 0 suppresses the print statement, but the code doesn't do this.
I dislike using a verbosity argument, but there isn't a good alternative in MULE at the moment (logging is to be implemented in the future).
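A minimal sketch of gating the rejection message on the verbosity level, assuming the convention that verbose >= 1 enables text output and verbose > 1 additionally enables plots. The helper names `report_rejection` and `report_rejection_logging` and the logger name are hypothetical, not existing MULE code; the second function shows how Python's standard `logging` module could take over once logging is added.

```python
import logging

# Hypothetical logger name, for illustration only
logger = logging.getLogger("mule.average_waveform")


def report_rejection(event_number: int, verbose: int) -> bool:
    """Print the rejection message only when verbose >= 1.

    Returns True if a message was printed, so callers (and tests) can check it.
    """
    if verbose >= 1:
        print(f'Event {event_number} excluded due to large secondary peak')
        return True
    return False


def report_rejection_logging(event_number: int) -> None:
    """Same message via logging; the configured log level decides visibility."""
    logger.info('Event %d excluded due to large secondary peak', event_number)
```

With the logging variant, calling `logging.basicConfig(level=logging.INFO)` once at start-up would show the message, while the default WARNING level hides it, which roughly corresponds to verbose = 0.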
    if os.path.exists(filepath):
        print(f"Processing file: {filepath}")

        x = io.load_rwf_info(filepath, samples=2)
This is the part you were avoiding, I see :)
This can be addressed in a future PR if you like (which you don't necessarily have to do).
    # Process the data in chunks to avoid memory overload, cooks data in chunks also
    for start_idx in range(0, total_waveforms, chunk_size):
        end_idx = min(start_idx + chunk_size, total_waveforms)
        waveform_chunk = waveforms[start_idx:end_idx]
Lazy loading would avoid this awkward manual chunking, although the current approach may be quicker!
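As a rough illustration of the lazy-loading idea, here is a generator that slices the source one chunk at a time, so only the current chunk needs to be in memory. This is a sketch, assuming the waveforms sit in something sliceable such as an h5py dataset or a memory-mapped array; `iter_chunks` is a hypothetical helper, not part of MULE.

```python
from typing import Iterator, Tuple

import numpy as np


def iter_chunks(dataset, chunk_size: int) -> Iterator[Tuple[int, np.ndarray]]:
    """Yield (start_idx, chunk) pairs, slicing `dataset` one chunk at a time.

    `dataset` can be anything supporting len() and row slicing: an in-memory
    NumPy array, a np.memmap, or an h5py.Dataset (where each slice is read
    from disk only when it is requested).
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be a positive integer")
    total = len(dataset)
    for start_idx in range(0, total, chunk_size):
        end_idx = min(start_idx + chunk_size, total)
        # np.asarray materialises only this slice when the backend is lazy
        yield start_idx, np.asarray(dataset[start_idx:end_idx])
```

The caller's loop then stays the same whether the data is an array already in memory or a dataset opened from file, e.g. `for start_idx, chunk in iter_chunks(h5file[dataset_name], chunk_size): ...`, where `h5file` and `dataset_name` are placeholders.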
packs/configs/average_waveform.conf
    [required]

    files = ['run19.h5']

    window_args = {'WINDOW_START'     : 4e2,
                   'WINDOW_END'       : 3e4,
                   'BASELINE_POINT_1' : 1e6,
                   'BASELINE_POINT_2' : 1.5e6,
                   'BASELINE_RANGE_1' : 40e3,
                   'BASELINE_RANGE_2' : 40e3}

    bin_size = 4
    chunk_size = 5
    negative = True
    baseline_mode = 'median'
    verbose = 1
    peak_threshold = 1000

    save_path = 'test.csv'
Would any of these be considered optional? save_path could be made optional if the output were written to an h5 file, since it could then be stored in the same h5 file the data is read from.
It's difficult to write to the same h5 file the data comes from, as we can average over multiple h5 input files. The output is just a single waveform, so I assume an h5 file would be overkill. Is there a file type you would recommend?
For internal consistency within MULE, writing to an h5 file would be the nicest approach. Doing it with writer() would take about five lines of code. This can be pushed to a later PR, though.
jwaiton left a comment:
Mostly small comments, related to typos and the like.
The tests need a bit more thought, but we can figure that out together at some point.
packs/tests/avgtests.py
    # Construct waveforms that are likely to be rejected
    # (huge secondary peaks)
    waveforms = np.full((n_waveforms, n_samples), 1e6)
This doesn't create a secondary peak; it fills every sample with the same value (no variation).
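One way to build a fixture that actually contains a secondary peak is to start from a flat baseline, add a modest primary pulse and then a much larger spike later in the trace. This sketch assumes positive pulses are signal (as with `negative=False` in the tests); the function name, positions and amplitudes are hypothetical and would need matching to the window_args and peak_threshold used in the test.

```python
import numpy as np


def make_secondary_peak_waveforms(n_waveforms: int, n_samples: int,
                                  primary_idx: int, secondary_idx: int,
                                  primary_amp: float = 100.0,
                                  secondary_amp: float = 1e6,
                                  baseline: float = 0.0) -> np.ndarray:
    """Return waveforms with a flat baseline, a primary pulse and a large secondary peak."""
    if not (0 <= primary_idx < n_samples and 0 <= secondary_idx < n_samples):
        raise ValueError("peak indices must lie inside the waveform")
    waveforms = np.full((n_waveforms, n_samples), baseline, dtype=float)
    waveforms[:, primary_idx] += primary_amp      # expected signal inside the window
    waveforms[:, secondary_idx] += secondary_amp  # huge late peak meant to trigger rejection
    return waveforms
```

Placing `secondary_idx` after WINDOW_END (and outside the baseline regions) keeps the baseline estimate clean, so a rejection can only come from the secondary-peak check.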
Add two functions that check that the window args don't overlap or exceed the waveform length. Also change some comments for clarity.
dec6950 to d8b0034
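A hedged sketch of what the two window_args checks described in the commit above might look like. It assumes every value is already expressed in samples (values given in time units would need converting first) and that each baseline region runs from BASELINE_POINT_n to BASELINE_POINT_n + BASELINE_RANGE_n; both assumptions, and the function names, are hypothetical rather than taken from the PR.

```python
from typing import Dict, List, Tuple


def _intervals(window_args: Dict[str, float]) -> List[Tuple[str, float, float]]:
    """Convert window_args into (name, start, end) half-open intervals.

    ASSUMPTION: baseline regions run from BASELINE_POINT_n to
    BASELINE_POINT_n + BASELINE_RANGE_n, in the same units as WINDOW_START/END.
    """
    return [
        ("window", window_args["WINDOW_START"], window_args["WINDOW_END"]),
        ("baseline_1", window_args["BASELINE_POINT_1"],
         window_args["BASELINE_POINT_1"] + window_args["BASELINE_RANGE_1"]),
        ("baseline_2", window_args["BASELINE_POINT_2"],
         window_args["BASELINE_POINT_2"] + window_args["BASELINE_RANGE_2"]),
    ]


def check_window_overlap(window_args: Dict[str, float]) -> None:
    """Raise ValueError if a region is empty/reversed or any two regions overlap."""
    intervals = sorted(_intervals(window_args), key=lambda iv: iv[1])
    for name, start, end in intervals:
        if end <= start:
            raise ValueError(f"{name}: end ({end}) must be greater than start ({start})")
    # After sorting by start, any overlap shows up between neighbouring intervals
    for (n1, _, e1), (n2, s2, _) in zip(intervals, intervals[1:]):
        if s2 < e1:
            raise ValueError(f"{n1} and {n2} overlap")


def check_window_length(window_args: Dict[str, float], n_samples: int) -> bool:
    """Return True if every region fits inside a waveform of n_samples samples."""
    return all(start >= 0 and end <= n_samples for _, start, end in _intervals(window_args))
```

Under these assumptions, the window_args from test_wf_window_mismatch pass the overlap check, and the length check fails for the 10-sample wf1 but passes for the 30-sample wf2, which matches the test's expectation that wf1 is skipped.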
packs/core/waveform_utils.py
    case _:
        raise ValueError(
            f"Invalid sub_type '{sub_type}'. Expected 'mean' or 'median'."
As you've added the new `none` case, you should include it in the list of expected options in the error message :)
packs/tests/avgtests.py
        expected = np.mean(waveforms, axis=0)
        np.testing.assert_allclose(avg, expected, rtol=1e-6)

    def test_empty_chunk_handling(): #Tests for when a whole chunk gets rejected by cook data
You removed this comment, but I think it was a good one.
    def test_wf_window_mismatch(tmp_path): # checks that skipping mismatch works, wf1 should be ignored in cook
        wf1 = np.linspace(0,10,10).reshape(1, -1)
        wf2 = np.linspace(0,30,30).reshape(1, -1)
        f1 = make_temp_h5(tmp_path, wf1)
        f2 = make_temp_h5(tmp_path, wf2, "test_waveforms2.h5")

        window_args = {
            "WINDOW_START": 1,
            "WINDOW_END": 5,
            "BASELINE_POINT_1": 10,
            "BASELINE_POINT_2": 15,
            "BASELINE_RANGE_1": 2,
            "BASELINE_RANGE_2": 2,
        }

        x = average_waveforms(
            files= [f1,f2],
            bin_size=1,
            window_args=window_args,
            chunk_size=1,
            negative=False,
            baseline_mode="median",
            verbose=0,
            peak_threshold=1000, # very high so no rejection
            suppression_threshold=0, # nothing should be suppressed
        )
        y = average_waveforms(
Is the standard behaviour to silently ignore rejected waveforms? If so, could there be an option to stop the averaging, or at least print something to the terminal?
I'll make it print something to the terminal.
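A minimal sketch of the behaviour discussed here: count the rejected waveforms, print a summary, and optionally stop instead of continuing silently. The `average_accepted` name, the boolean `accept_mask` input and the `strict` flag are hypothetical and not part of average_waveforms().

```python
import numpy as np


def average_accepted(waveforms: np.ndarray, accept_mask: np.ndarray,
                     strict: bool = False) -> np.ndarray:
    """Average the accepted rows and report how many were rejected.

    accept_mask is a boolean array with one entry per waveform. If strict is
    True, raise as soon as any waveform is rejected; in every case, raise if
    nothing is left to average.
    """
    n_total = len(waveforms)
    n_rejected = int(np.count_nonzero(~accept_mask))
    if n_rejected:
        msg = f"{n_rejected} of {n_total} waveforms rejected"
        if strict:
            raise RuntimeError(msg)
        print(f"Warning: {msg}")
    if n_rejected == n_total:
        raise RuntimeError("All waveforms were rejected; nothing to average")
    return waveforms[accept_mask].mean(axis=0)
```

Printing a count once per call, rather than one line per rejected event, keeps the terminal readable on large runs; per-event details could stay behind the verbosity setting.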
packs/ana/analysis_utils.py
    )
    '''

    event_number = i + chunk_size * chunk_number
Where does `i` come from here? This line is repeated just a few lines further down, so I'm assuming it isn't meant to be here?
This still applies; perhaps I'm missing something.
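For reference, `event_number = i + chunk_size * chunk_number` only makes sense where `i` is the index within the current chunk, for example from `enumerate` inside the per-chunk loop. A sketch of that structure with hypothetical names; it is not the actual loop in analysis_utils.py, which isn't shown here.

```python
from typing import Iterator, Tuple

import numpy as np


def iter_events(waveforms: np.ndarray, chunk_size: int) -> Iterator[Tuple[int, np.ndarray]]:
    """Yield (event_number, waveform) pairs with a global event index across chunks."""
    n_chunks = -(-len(waveforms) // chunk_size)  # ceiling division
    for chunk_number in range(n_chunks):
        chunk = waveforms[chunk_number * chunk_size:(chunk_number + 1) * chunk_size]
        for i, wf_data in enumerate(chunk):
            # i is the position inside the current chunk, so offset it by the chunk's start
            event_number = i + chunk_size * chunk_number
            yield event_number, wf_data
```

Computing event_number once, inside the inner loop where `i` is defined, also avoids the duplicated line flagged above.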
Adds the waveform averaging code in ana.py and moves some functions around for tidiness.