Create processing_utils_lecroy and add first parser #45

MattZur wants to merge 3 commits into nu-ZOO:main from
Conversation
jwaiton left a comment
Good looking PR! Just needs a bit more documentation and some alterations to the readability in due time.
```python
"""


def parse_lecroy_segmented(lines):
    # Line 1 has to have: Segments,1000,SegmentSize,5002
```
Add documentation explaining what the function does, the input parameters and the expected output.
An example can be seen here
```python
This file holds all the relevant functions for the processing of data from csv files to h5.
"""


def parse_lecroy_segmented(lines):
```
Best practice (one that I want to implement somewhat retroactively) is to include type hints for all functions. In your case that would look like:

```python
def parse_lecroy_segmented(lines: str) -> Tuple[pd.DataFrame, pd.DataFrame]:
```

if `lines` is a string; otherwise use the correct type. You'll have to import `Tuple` from `typing`:

```python
from typing import Tuple
```
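Put together, the annotated signature plus the import might look like the sketch below. This is an illustration only: the body is a placeholder, not the PR's implementation, and I've assumed `lines` is a list of split CSV rows rather than a plain string.

```python
from typing import List, Tuple

import pandas as pd


def parse_lecroy_segmented(lines: List[List[str]]) -> Tuple[pd.DataFrame, pd.DataFrame]:
    """Placeholder body: illustrates the annotation style only."""
    value_df, header_df = pd.DataFrame(), pd.DataFrame()
    return value_df, header_df
```

With hints in place, tools like mypy can check call sites against the declared return type.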
```python
segments = int(lines[1][1])
seg_size = int(lines[1][3])
```
Perhaps:

```python
segments, seg_size = int(lines[1][1]), int(lines[1][3])
```
but this is picky, and perhaps a bit less readable. The choice is yours 🐱
```python
# Find the "Time,Ampl" line
for i, line in enumerate(lines):
    if line[0].strip() == "Time":
        data_start = i + 1
        break
```
Is the "Time,Ampl" line inconsistent within the data you use? Like, does it sometimes occur 5 lines in, and other times 10 lines?
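If the header position does vary from file to file, pulling the scan into a small helper makes the intent explicit and fails loudly when the header is missing (otherwise `data_start` would be left undefined). A sketch; the helper name is mine, not the PR's:

```python
def find_data_start(lines):
    """Return the index of the first data row: the line right
    after the 'Time,Ampl' header, wherever it appears."""
    for i, line in enumerate(lines):
        if line[0].strip() == "Time":
            return i + 1
    raise ValueError("No 'Time,Ampl' header line found")


# Works whether the header is 3 lines in or 1 line in
assert find_data_start([["a"], ["b"], ["Time", "Ampl"], ["0.0", "1.0"]]) == 3
```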
```python
for k in range(seg_size):
    x = j * seg_size + k
    if x >= len(raw_data):  # x = line in the file
        segment_data.append(None)
```
This means that a segment will be lost if it is a bit shorter than expected, right? Can you quantify how many events you lose this way, either through a test or otherwise?
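One way to quantify the padding without rerunning the whole parser (a sketch: `raw_data_len` stands in for `len(raw_data)`, and the helper name is hypothetical):

```python
def count_padded_segments(raw_data_len: int, segments: int, seg_size: int) -> int:
    """Count how many segments would be padded with None because the
    file ends before the expected segments * seg_size data rows."""
    missing = max(0, segments * seg_size - raw_data_len)
    # ceil-divide the shortfall by the segment size
    return -(-missing // seg_size)


# e.g. a file 4 rows short of 2 * 5002 rows touches exactly 1 segment
print(count_padded_segments(10000, 2, 5002))
```

Logging this count per file would answer the question directly.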
```python
        value_list.append(segment_data)

    value_df = pd.DataFrame(value_list)
    return value_df, header_df
```
A sensible test for this function would be to take a small input file (you can save it within the repository) and ensure the output of said file is as expected. On the second revision I'll think of a nicer way to format the main 'work loop' here.
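As a concrete starting point, here is a self-contained sketch of such a test. The parser body is a simplified stand-in reconstructed from the diff snippets above, not the PR's actual code, and the fixture is inlined here rather than saved in the repository:

```python
import pandas as pd


def parse_lecroy_segmented(lines):
    """Simplified stand-in for the PR's parser (illustration only):
    line 1 carries 'Segments,N,SegmentSize,M'; data rows follow
    the 'Time,Ampl' header line."""
    segments = int(lines[1][1])
    seg_size = int(lines[1][3])
    data_start = next(i + 1 for i, line in enumerate(lines)
                      if line[0].strip() == "Time")
    raw_data = [float(line[1]) for line in lines[data_start:]]
    value_list = []
    for j in range(segments):
        segment_data = []
        for k in range(seg_size):
            x = j * seg_size + k
            segment_data.append(raw_data[x] if x < len(raw_data) else None)
        value_list.append(segment_data)
    header_df = pd.DataFrame({"segments": [segments], "seg_size": [seg_size]})
    return pd.DataFrame(value_list), header_df


# Tiny fixture: 2 segments of 3 samples each
csv_text = """LECROYWR,header
Segments,2,SegmentSize,3
Time,Ampl
0.0,1.0
0.1,2.0
0.2,3.0
0.3,4.0
0.4,5.0
0.5,6.0
"""
lines = [row.split(",") for row in csv_text.strip().splitlines()]
value_df, header_df = parse_lecroy_segmented(lines)
assert value_df.shape == (2, 3)
```

In the real test, the fixture would live as a small CSV under something like `tests/data/` and be read from disk instead.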
On top of this, I'd perhaps specify more explicitly in the PR or the function definition what it does. It's harder to gauge whether the code is correct without a good understanding of the input/output 😸
Add processing_utils_lecroy.py to the package so it can be used within proc.py, and add the first parser needed for the processing.