Skip to content

Conversation

@reinecke
Copy link
Member

@reinecke reinecke commented Aug 2, 2023

The header parser used a blank line as its exit condition for the end of the section. This meant that if there was no blank line between the end of the Header section and the Column section, these ALEs would fail with the error Invalid Heading line: Column.

The new parser stops header processing when it detects a new section.

This PR also functionally decomposes the main processing loop a bit more in an attempt to make it more readable.

@reinecke reinecke force-pushed the fix/ales-without-blank-lines-beteen-sections branch from 71aa618 to b93bad4 Compare August 2, 2023 22:24
raise ALEParseError("Invalid Heading line: " + line)

segments = line.split("\t")
while len(segments) >= 2:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Segments beyond the 2nd are lost here. Should that be an error condition, or should those be part of val?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the check on line 206 handles that behavior, right?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It only gets to line 207 and raises an ALEParseError if there's an odd number of values in the row. Otherwise it handles even pairs fine and they all get added to the header dictionary, no key value pairs lost.

WACK_HEADER = "FIELD_DELIM\tTABS\n\nVIDEO_FORMAT\t108\tBANANA\tAPPLE\tGRAPE\tJUICE\n\nAUDIO_FORMAT\t48kh\tCESSNA\tDIAMOND\n\nFPS\t25\tWRONGFPS\t23"
lines = WACK_HEADER.splitlines()
header = _read_heading_lines(lines)
header
>>> {'FIELD_DELIM': 'TABS', 'VIDEO_FORMAT': '108', 'BANANA': 'APPLE', 'GRAPE': 'JUICE', 'AUDIO_FORMAT': '48kh', 'CESSNA': 'DIAMOND', 'FPS': '25', 'WRONGFPS': '23'}

@reinecke reinecke force-pushed the fix/ales-without-blank-lines-beteen-sections branch from b93bad4 to 1b1ebd0 Compare November 21, 2025 03:15
reinecke and others added 4 commits November 21, 2025 16:18
… explicit error if FPS is missing in header rather than letting an implicit NameError come up later

Signed-off-by: Eric Reinecke <[email protected]>
Signed-off-by: Eric Reinecke <[email protected]>
@reinecke reinecke force-pushed the fix/ales-without-blank-lines-beteen-sections branch from 3cbcc2b to bee0789 Compare November 22, 2025 00:19
Signed-off-by: Eric Reinecke <[email protected]>
Copy link

@shidarin shidarin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit, would love a comment about skipping blank lines in the column func, because I didn't read ahead and didn't grok to what you were trying to do and thought I really forgot how to code- how would while not "text" enter the while?!?!

# read until we run out of lines or the next section starts, not consuming
# the next section designator
while len(lines) and lines[0] not in ALE_SECTION_DESIGNATORS:
line = lines.pop(0)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a huge fan of mutating input arguments, but it's a private func, we're fine.

header = {}
# read until we run out of lines or the next section starts, not consuming
# the next section designator
while len(lines) and lines[0] not in ALE_SECTION_DESIGNATORS:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lines[0] not in ALE_SECTION_DESIGNATORS

Good, this means we skip over as many blank lines at the end as we need.

raise ALEParseError("Invalid Heading line: " + line)

segments = line.split("\t")
while len(segments) >= 2:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It only gets to line 207 and raises an ALEParseError if there's an odd number of values in the row. Otherwise it handles even pairs fine and they all get added to the header dictionary, no key value pairs lost.

WACK_HEADER = "FIELD_DELIM\tTABS\n\nVIDEO_FORMAT\t108\tBANANA\tAPPLE\tGRAPE\tJUICE\n\nAUDIO_FORMAT\t48kh\tCESSNA\tDIAMOND\n\nFPS\t25\tWRONGFPS\t23"
lines = WACK_HEADER.splitlines()
header = _read_heading_lines(lines)
header
>>> {'FIELD_DELIM': 'TABS', 'VIDEO_FORMAT': '108', 'BANANA': 'APPLE', 'GRAPE': 'JUICE', 'AUDIO_FORMAT': '48kh', 'CESSNA': 'DIAMOND', 'FPS': '25', 'WRONGFPS': '23'}

"""
try:
line = lines.pop(0)
while not line.strip():

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a "skip blank lines" comment here for readability.

clip_generator = _read_data(
lines, columns, fps, ale_name_column_key
)
collection.extend(clip_generator)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't actually need to create the Collection until here, since we're no longer appending to it. Do we even need to extend it or can we just generate it in one line?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants