Add analysis component by Tedsmith100 · Pull Request #44 · nu-ZOO/MULE

Tedsmith100 · 2025-10-08T10:55:31Z

Includes waveform averaging code in the form of ana.py, moves some functions around for cleanliness.

reader() needed to allow a file to be written to while it is also being read. writer() was not allowing 'Overwrite = False' to function correctly when fixed_size was False.

jwaiton

Nice PR! It's a good function. I've mostly added some documentation requests and some changes to the IO. After those have been resolved, I'll move on to requesting tests (we can discuss these in person at some point 😸 )

jwaiton · 2025-10-17T16:19:59Z

packs/ana/analysis_utils.py

+        if verbose > 1: 
+            plt.plot(time, wf_data)
+            plt.xlabel('Time (ns)')
+            plt.ylabel('ADCs')
+            plt.yscale('log')
+            plt.axhline(second_peak, 0, 2e5, c = 'r', ls = '--')
+            plt.axvline(WINDOW_END, c = 'r', ls = '--')
+            plt.title(f'Event {event_number} subtracted waveform')
+            plt.show()
+        print(f'Event {event_number} excluded due to large secondary peak')


The docstring suggests that putting verbose = 0 will omit the print statement, but this isn't the case in the code.

I dislike the use of a verbosity argument, but there isn't a good alternative implemented within MULE at the moment (logging to be implemented in the future).
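
A minimal sketch of the gating the docstring describes, assuming the rejection message should only appear for `verbose >= 1`. The helper name `report_rejection` is hypothetical and is not part of MULE:

```python
from typing import Optional


def report_rejection(event_number: int, verbose: int = 0) -> Optional[str]:
    """Announce a rejected event only when verbosity is enabled.

    Hypothetical helper: returns the message that was printed,
    or None when verbose = 0 and nothing was printed.
    """
    if verbose < 1:
        # verbose = 0: stay silent, matching what the docstring promises
        return None
    message = f'Event {event_number} excluded due to large secondary peak'
    print(message)
    return message
```

Keeping the decision in one place would also make a later switch to the `logging` module a small change.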

jwaiton · 2025-10-17T16:55:27Z

packs/ana/analysis_utils.py

+        if os.path.exists(filepath):
+            print(f"Processing file: {filepath}")
+
+            x = io.load_rwf_info(filepath, samples=2)


It's preferable to load the data in lazily. You can use the appropriate functions (found here); there is an example of them being used here.

It would also be good to then modify cook_data() to instead expect singular waveforms.

This is the part you were avoiding I see :)

This can be addressed in a future PR if you like (one that you don't necessarily have to do).

jwaiton · 2025-10-17T16:55:55Z

packs/ana/analysis_utils.py

+            # Process the data in chunks to avoid memory overload, cooks data in chunks also
+            for start_idx in range(0, total_waveforms, chunk_size):
+                end_idx = min(start_idx + chunk_size, total_waveforms)
+                waveform_chunk = waveforms[start_idx:end_idx]


Lazy loading would avoid this roughness, although this may be a quicker method!
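
As a rough illustration of the lazy alternative, the sketch below yields one waveform at a time and keeps a running sum, so no explicit chunk bookkeeping is needed. It assumes `source` is any row-indexable container (an in-memory array here, standing in for an h5 dataset or MULE's lazy reader, whose actual interface is not shown in this thread). `iter_waveforms` and `running_average` are hypothetical names:

```python
from typing import Iterable, Iterator

import numpy as np


def iter_waveforms(source, n_waveforms: int) -> Iterator[np.ndarray]:
    """Yield one waveform per step; only row i is materialised at a time."""
    for i in range(n_waveforms):
        yield np.asarray(source[i], dtype=float)


def running_average(waveforms: Iterable[np.ndarray]) -> np.ndarray:
    """Average waveforms from any iterable without holding them all in memory."""
    total = None
    count = 0
    for wf in waveforms:
        total = wf.copy() if total is None else total + wf
        count += 1
    if count == 0:
        raise ValueError('no waveforms to average')
    return total / count


# Stand-in data: 4 waveforms of 3 samples each
data = np.arange(12, dtype=float).reshape(4, 3)
avg = running_average(iter_waveforms(data, len(data)))
```

Whether this is faster than slicing chunks depends on the reader and file layout, so it would need timing against the chunked version.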

jwaiton · 2025-10-17T16:57:08Z

packs/configs/average_waveform.conf

+[required]
+
+files = ['run19.h5']
+
+window_args = {'WINDOW_START'     : 4e2,
+    'WINDOW_END'       : 3e4,
+    'BASELINE_POINT_1' : 1e6,
+    'BASELINE_POINT_2' :  1.5e6,
+    'BASELINE_RANGE_1'  : 40e3,
+    'BASELINE_RANGE_2'  : 40e3}
+
+bin_size = 4 
+chunk_size = 5 
+negative = True 
+baseline_mode = 'median'
+verbose = 1 
+peak_threshold = 1000
+
+save_path = 'test.csv'


Would any of these be considered optional? You could set the save_path to be optional if it wrote out to a h5 file, and it would be stored within the same h5 from which it takes the data.

This still applies :)

It's difficult to write to the same h5 that the data comes from as we can average over multiple h5 file inputs. The output is just a single waveform so I would assume that using an h5 file is overkill, is there a good file type you would recommend?

For internal consistency within MULE, writing to a h5 would be the nicest method. Writing it with the writer() would be about 5 lines of code. This can be pushed to a later PR though.
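
On the earlier question of which settings could be optional, one possible layout is sketched below using only the standard library. The `[optional]` section, the default values, and `load_config` are assumptions made for illustration; MULE's own config reader is not shown here and may work differently:

```python
import ast
import configparser

# Hypothetical config text: an [optional] section alongside [required]
CONFIG_TEXT = """
[required]
files = ['run19.h5']
bin_size = 4

[optional]
verbose = 1
"""


def load_config(text: str) -> dict:
    """Parse required settings and fill optional ones from defaults."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # A missing [required] section raises KeyError here
    required = {k: ast.literal_eval(v) for k, v in parser['required'].items()}
    # Assumed defaults, used when an optional key is not given
    optional = {'verbose': 0, 'save_path': 'average_waveform.h5'}
    if parser.has_section('optional'):
        optional.update(
            {k: ast.literal_eval(v) for k, v in parser['optional'].items()}
        )
    return {**required, **optional}


config = load_config(CONFIG_TEXT)
```

Here `save_path` falls back to a default because the config text does not set it, while `verbose` is overridden by the `[optional]` section.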

jwaiton

Small comments mostly, related to typos and other things.

The tests need a bit more thought, but we can figure that out together at some point.

jwaiton · 2026-02-06T12:24:08Z

packs/ana/analysis_utils.py

+        if os.path.exists(filepath):
+            print(f"Processing file: {filepath}")
+
+            x = io.load_rwf_info(filepath, samples=2)


This is the part you were avoiding I see :)

This can be addressed in a future PR if you like (one that you don't necessarily have to do).

jwaiton · 2026-02-06T12:50:45Z

packs/tests/avgtests.py

+
+    # Construct waveforms that are likely to be rejected
+    # (huge secondary peaks)
+    waveforms = np.full((n_waveforms, n_samples), 1e6)


This does not create a secondary peak; it fills the whole sample with one value (no variation).
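
One way the test input could contain a real secondary peak is sketched below: a flat baseline with a primary pulse and a larger, later pulse. The helper name, sample positions, and heights are all hypothetical; in the actual test they would need to be chosen relative to `WINDOW_END` and `peak_threshold`:

```python
import numpy as np


def make_waveform_with_secondary_peak(
    n_samples: int = 200,
    primary_at: int = 20,
    secondary_at: int = 120,
    primary_height: float = 500.0,
    secondary_height: float = 2000.0,
) -> np.ndarray:
    """Flat zero baseline with two single-sample pulses (hypothetical shape)."""
    wf = np.zeros(n_samples)
    wf[primary_at] = primary_height      # pulse inside the signal window
    wf[secondary_at] = secondary_height  # later pulse meant to trigger rejection
    return wf


n_waveforms = 5
# Every row carries the same primary and secondary pulses
waveforms = np.tile(make_waveform_with_secondary_peak(), (n_waveforms, 1))
```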

jwaiton · 2026-02-10T14:39:09Z

packs/core/waveform_utils.py


+        case _:
+            raise ValueError(
+                f"Invalid sub_type '{sub_type}'. Expected 'mean' or 'median'."


As you've added the new case none, you should include that in the options :)

jwaiton · 2026-02-10T15:04:02Z

packs/tests/avgtests.py

+    expected = np.mean(waveforms, axis=0)
+    np.testing.assert_allclose(avg, expected, rtol=1e-6)

-def test_empty_chunk_handling(): #Tests for when a whole chunk gets rejected by cook data


you removed the comment, but I think it was a good one

jwaiton · 2026-02-10T15:07:07Z

packs/tests/avgtests.py

+def test_wf_window_mismatch(tmp_path): # checks that skipping mismatch works, wf1 should be ignored in cook
+    wf1 = np.linspace(0,10,10).reshape(1, -1)
+    wf2 = np.linspace(0,30,30).reshape(1, -1)
+    f1 = make_temp_h5(tmp_path, wf1)
+    f2 = make_temp_h5(tmp_path, wf2, "test_waveforms2.h5")

+    window_args = {
+        "WINDOW_START": 1,
+        "WINDOW_END": 5,
+        "BASELINE_POINT_1": 10,
+        "BASELINE_POINT_2": 15,
+        "BASELINE_RANGE_1": 2,
+        "BASELINE_RANGE_2": 2,
+    }
+
+    x = average_waveforms(
+        files= [f1,f2],
+        bin_size=1,
+        window_args=window_args,
+        chunk_size=1,
+        negative=False,
+        baseline_mode="median",
+        verbose=0,
+        peak_threshold=1000,        # very high so no rejection
+        suppression_threshold=0,    # nothing should be suppressed
+    )
+    y = average_waveforms(


Is the standard behaviour to ignore waveforms that are rejected? If so, is there a way to allow this to force the averaging to stop/output something to the terminal?

I will make it print something in the terminal
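
A sketch of both behaviours raised in the question above: report skipped waveforms in the terminal by default, or stop entirely when a `strict` flag is set. The function name, the `strict` flag, and the `min_length` rule are assumptions for illustration, not the behaviour of `average_waveforms`:

```python
import numpy as np


def select_usable(waveforms, min_length: int, strict: bool = False):
    """Split waveforms into kept ones and the indices of skipped ones."""
    kept, skipped = [], []
    for idx, wf in enumerate(waveforms):
        if len(wf) < min_length:
            detail = f'waveform {idx} has {len(wf)} samples, need at least {min_length}'
            if strict:
                # Force the averaging to stop instead of silently skipping
                raise ValueError(f'Window mismatch: {detail}')
            print(f'Skipping: {detail}')
            skipped.append(idx)
        else:
            kept.append(wf)
    return kept, skipped


# Same shapes as the test above: 10 and 30 samples.
# Assumed rule: BASELINE_POINT_2 + BASELINE_RANGE_2 = 17 samples are needed.
wf1 = np.linspace(0, 10, 10)
wf2 = np.linspace(0, 30, 30)
kept, skipped = select_usable([wf1, wf2], min_length=17)
```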

jwaiton · 2026-02-15T10:49:33Z

packs/configs/average_waveform.conf

+[required]
+
+files = ['run19.h5']
+
+window_args = {'WINDOW_START'     : 4e2,
+    'WINDOW_END'       : 3e4,
+    'BASELINE_POINT_1' : 1e6,
+    'BASELINE_POINT_2' :  1.5e6,
+    'BASELINE_RANGE_1'  : 40e3,
+    'BASELINE_RANGE_2'  : 40e3}
+
+bin_size = 4 
+chunk_size = 5 
+negative = True 
+baseline_mode = 'median'
+verbose = 1 
+peak_threshold = 1000
+
+save_path = 'test.csv'


It's difficult to write to the same h5 that the data comes from as we can average over multiple h5 file inputs. The output is just a single waveform so I would assume that using an h5 file is overkill, is there a good file type you would recommend?

For internal consistency within MULE, writing to a h5 would be the nicest method. Writing it with the writer() would be about 5 lines of code. This can be pushed to a later PR though.

jwaiton · 2026-02-15T10:50:32Z

packs/ana/analysis_utils.py

+        )
+    '''
+
+    event_number = i + chunk_size * chunk_number


Where does i come from here? This line is repeated just a few lines down, so I'm assuming it's not meant to be here?

This still applies, perhaps I'm missing something here.
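
For reference, the expression only makes sense inside a loop that defines `i` as the index within the current chunk. A minimal sketch, assuming `chunk_number` counts chunks from zero (the function name is hypothetical):

```python
def global_event_numbers(total_waveforms: int, chunk_size: int) -> list:
    """Map (chunk_number, i) pairs back to a global event number."""
    numbers = []
    starts = range(0, total_waveforms, chunk_size)
    for chunk_number, start_idx in enumerate(starts):
        end_idx = min(start_idx + chunk_size, total_waveforms)
        # i is the position within this chunk and only exists inside this loop
        for i in range(end_idx - start_idx):
            event_number = i + chunk_size * chunk_number
            numbers.append(event_number)
    return numbers
```

Placed before such a loop, the same line would either raise a `NameError` or reuse a stale `i` from an earlier scope.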

jwaiton and others added 23 commits April 16, 2025 18:39

include missing config check and test

3171663

add lazy reader and writer

dafdd4c

add reader and writer test

d67b88e

add WD1 rwf type

a68cf2e

add MalformedHeaderError

7f062af

add lazy WD1 processing

ea71fc8

include test for process_event_lazy_WD1

8627d35

add WD1 processing

c0b9f39

add tests

27d632d

include calibration config

914a0f6

alter reader and writer to fix bugs

b059786

reader() needed to allow writing while also being read. writer() was not allowing 'Overwrite = False' to function correctly when fixed_size was False

include calibration info type

8b6ca5c

include functions related to waveform processing

2ea8290

include charge and height calculations into processing

e0ea1d2

add backwards compatibility with chunked data

d804343

alter logic for fixed_size, decreasing runtime by half

4cf4c90

cosmetics

81588db

rearrange scripts for reusability

86a56b4

include a waveform averaging function

cda5ba2

add inclusion of window args as a dictionary

8496aad

add a config for analysis

34beda7

add ana function for analysis

1c5984f

edit mule to include ana

88184a3

jwaiton requested changes Oct 17, 2025

Tedsmith100 added 6 commits October 23, 2025 16:10

Add argument types

6bbec1e

Add arg types, make PR changes

dfde694

add tests

59dcce7

clean functions

f99cf6b

add supression thershold to config

8e0c600

write tests for averaging functions

02c1975

fix problems with averaging function

83ff8bf

Tedsmith100 requested a review from a team February 3, 2026 17:23

jwaiton requested changes Feb 6, 2026

Tedsmith100 added 4 commits February 9, 2026 15:24

add waveform window argument checks and implement

f466822

two functions which check that window args dont overlap or exceed waveform length. Also comment changes made for clarity.

add raise error for wrong baseline mode

f7e6d07

edit tests and add window arg tests

e7e84e9

add window args check

d8b0034

Tedsmith100 force-pushed the add-waveform-averager branch from dec6950 to d8b0034 Compare February 9, 2026 15:29

jwaiton requested changes Feb 15, 2026

Tedsmith100 added 5 commits February 16, 2026 11:56

change avgwf save to h5 with no overwriting

e233e8d

change grammar error and add comment for wf sum

5a96ac9

change save path to h5 in config example

7d24235

include none in options for bl subtraction type

9a2143f

Add description to empty chunk test

ee5302c

2 participants
