Create processing_utils_lecroy and add first parser #45

MattZur wants to merge 3 commits into nu-ZOO:main from
Conversation
jwaiton left a comment
Good looking PR! Just needs a bit more documentation and some alterations to the readability in due time.
```python
"""


def parse_lecroy_segmented(lines):
    # Line 1 has to have: Segments,1000,SegmentSize,5002
```
Add documentation explaining what the function does, the input parameters and the expected output.
An example can be seen here
```python
This file holds all the relevant functions for the processing of data from csv files to h5.
"""


def parse_lecroy_segmented(lines):
```
Best practice (one that I want to implement somewhat retroactively) is to include type hints for all functions. In your case that would look like:

```python
def parse_lecroy_segmented(lines: str) -> Tuple[pd.DataFrame, pd.DataFrame]:
```

if `lines` is a string; otherwise use the correct type. You'll have to import `Tuple` from `typing`:

```python
from typing import Tuple
```
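Put together, the annotated signature plus the import might look like the sketch below. This is an illustration only: the body is a placeholder, not the PR's implementation, and I've assumed `lines` is a list of split CSV rows rather than a plain string.

```python
from typing import List, Tuple

import pandas as pd


def parse_lecroy_segmented(lines: List[List[str]]) -> Tuple[pd.DataFrame, pd.DataFrame]:
    """Placeholder body: illustrates the annotation style only."""
    value_df, header_df = pd.DataFrame(), pd.DataFrame()
    return value_df, header_df
```

With hints in place, tools like mypy can check call sites against the declared return type.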
```python
segments = int(lines[1][1])
seg_size = int(lines[1][3])
```
Perhaps:

```python
segments, seg_size = int(lines[1][1]), int(lines[1][3])
```
but this is picky, and perhaps a bit less readable. The choice is yours 🐱
```python
# Find the "Time,Ampl" line
for i, line in enumerate(lines):
    if line[0].strip() == "Time":
        data_start = i + 1
        break
```
Is the "Time,Ampl" line inconsistent within the data you use? Like, does it sometimes occur 5 lines in, and other times 10 lines?
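If the header position does vary from file to file, pulling the scan into a small helper makes the intent explicit and fails loudly when the header is missing (otherwise `data_start` would be left undefined). A sketch; the helper name is mine, not the PR's:

```python
def find_data_start(lines):
    """Return the index of the first data row: the line right
    after the 'Time,Ampl' header, wherever it appears."""
    for i, line in enumerate(lines):
        if line[0].strip() == "Time":
            return i + 1
    raise ValueError("No 'Time,Ampl' header line found")


# Works whether the header is 3 lines in or 1 line in
assert find_data_start([["a"], ["b"], ["Time", "Ampl"], ["0.0", "1.0"]]) == 3
```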
```python
for k in range(seg_size):
    x = j * seg_size + k
    if x >= len(raw_data):  # x = line in the file
        segment_data.append(None)
```
This means that a segment will be lost if it is a bit shorter than expected, right? Can you quantify how many events you lose this way, either through a test or otherwise?
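One way to quantify the padding without rerunning the whole parser (a sketch: `raw_data_len` stands in for `len(raw_data)`, and the helper name is hypothetical):

```python
def count_padded_segments(raw_data_len: int, segments: int, seg_size: int) -> int:
    """Count how many segments would be padded with None because the
    file ends before the expected segments * seg_size data rows."""
    missing = max(0, segments * seg_size - raw_data_len)
    # ceil-divide the shortfall by the segment size
    return -(-missing // seg_size)


# e.g. a file 4 rows short of 2 * 5002 rows touches exactly 1 segment
print(count_padded_segments(10000, 2, 5002))
```

Logging this count per file would answer the question directly.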
```python
        value_list.append(segment_data)

    value_df = pd.DataFrame(value_list)
    return value_df, header_df
```
A sensible test for this function would be to take a small input file (you can save it within the repository) and ensure the output of said file is as expected. On the second revision I'll think of a nicer way to format the main 'work loop' here.
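As a concrete starting point, here is a self-contained sketch of such a test. The parser body is a simplified stand-in reconstructed from the diff snippets above, not the PR's actual code, and the fixture is inlined here rather than saved in the repository:

```python
import pandas as pd


def parse_lecroy_segmented(lines):
    """Simplified stand-in for the PR's parser (illustration only):
    line 1 carries 'Segments,N,SegmentSize,M'; data rows follow
    the 'Time,Ampl' header line."""
    segments = int(lines[1][1])
    seg_size = int(lines[1][3])
    data_start = next(i + 1 for i, line in enumerate(lines)
                      if line[0].strip() == "Time")
    raw_data = [float(line[1]) for line in lines[data_start:]]
    value_list = []
    for j in range(segments):
        segment_data = []
        for k in range(seg_size):
            x = j * seg_size + k
            segment_data.append(raw_data[x] if x < len(raw_data) else None)
        value_list.append(segment_data)
    header_df = pd.DataFrame({"segments": [segments], "seg_size": [seg_size]})
    return pd.DataFrame(value_list), header_df


# Tiny fixture: 2 segments of 3 samples each
csv_text = """LECROYWR,header
Segments,2,SegmentSize,3
Time,Ampl
0.0,1.0
0.1,2.0
0.2,3.0
0.3,4.0
0.4,5.0
0.5,6.0
"""
lines = [row.split(",") for row in csv_text.strip().splitlines()]
value_df, header_df = parse_lecroy_segmented(lines)
assert value_df.shape == (2, 3)
```

In the real test, the fixture would live as a small CSV under something like `tests/data/` and be read from disk instead.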
On top of this, I'd perhaps specify more explicitly in the PR or the function definition what it does. It's harder to gauge whether the code is correct without a good understanding of the input/output 😸
Add processing_utils_lecroy.py to the package so it can be used within proc.py, and add the first parser needed for the processing.