
fix(commit0): evaluation harness fails to score cachetools and parsel due to test command issues#539

Open
VascoSch92 wants to merge 4 commits into main from vasco/fix-issue-526

@VascoSch92 VascoSch92 commented Mar 19, 2026

The PR solves issue #526

I tested here: https://openhands-ai.slack.com/archives/C09QGUDQVTL/p1773907384946099

Here are the results:

cachetools

  • Before (bug): 0/0 — ImportError: No module named 'cachetools' due to missing PYTHONPATH=src
  • Now: 215/215 passed (100%) — exit code 0, all tests pass, report.json generated correctly

parsel

  • Before (bug): 0/0 — exit code 127 ("command not found") because bare pytest wasn't on $PATH, no report.json generated
  • Now: 155/208 passed (74.5%) — exit code 1 (some test failures, but the harness correctly ran tests and captured results)

The parsel failures (51 failed, 155 passed) are genuine agent implementation gaps, not harness bugs: the test command executed successfully and produced a proper report.

@VascoSch92 VascoSch92 changed the title from "fix(commit0):" to "fix(commit0): evaluation harness fails to score cachetools and parsel due to test command issues" Mar 19, 2026

@all-hands-bot all-hands-bot left a comment

🟡 Acceptable - Solves real problems (ImportError and PATH issues) but has some fragile heuristics that could break edge cases.

@VascoSch92 VascoSch92 marked this pull request as ready for review March 19, 2026 08:36

@all-hands-bot all-hands-bot left a comment

🟡 Acceptable - Solves real production problems (cachetools, parsel) and tests are comprehensive. Two minor code clarity issues that don't block merging but would improve maintainability.

def normalize_pytest_cmd(test_cmd: str) -> str:
    """Replace bare pytest/pytest3 with python -m pytest to avoid PATH/permission issues."""
    if re.match(r"pytest\d?$", test_cmd.strip()) and "python -m pytest" not in test_cmd:
        test_cmd = re.sub(r"\bpytest(\d?)", r"python -m pytest\1", test_cmd, count=1)

🟡 Suggestion: Logic redundancy - lacks "good taste"

You check if the stripped command is exactly pytest or pytest3 with re.match(r"pytest\d?$", ...), but then use word boundaries \b in the substitution. If you've already verified it's exactly "pytest", word boundaries are redundant.

Simpler approach:

def normalize_pytest_cmd(test_cmd: str) -> str:
    """Replace bare pytest/pytest3 with python -m pytest to avoid PATH/permission issues."""
    stripped = test_cmd.strip()
    if stripped in ("pytest", "pytest3") and "python -m pytest" not in test_cmd:
        return test_cmd.replace(stripped, f"python -m {stripped}", 1)
    return test_cmd

Or if you need the regex flexibility:

if re.match(r"^pytest\d?$", test_cmd.strip()):
    return re.sub(r"^pytest(\d?)$", r"python -m pytest\1", test_cmd.strip())
return test_cmd

Current code works but mixes two different validation strategies unnecessarily.

Comment on lines +73 to +76
def get_pythonpath_prefix(src_dir: str) -> str:
    """Return PYTHONPATH env prefix for src-layout repos."""
    if src_dir and src_dir.startswith("src"):
        return "PYTHONPATH=src:$PYTHONPATH "

🟡 Suggestion: Misleading function signature

The function takes src_dir as a parameter but ignores its actual value and returns a hardcoded "PYTHONPATH=src:$PYTHONPATH ". This makes it look like the function would use different paths based on src_dir, but it only uses it as a boolean check.

Clearer alternatives:

  1. Inline it (simplest):
env_prefix = "PYTHONPATH=src:$PYTHONPATH " if src_dir and src_dir.startswith("src") else ""
  2. Rename to reflect actual behavior:
def is_src_layout(src_dir: str) -> bool:
    """Check if repo uses src-layout requiring PYTHONPATH."""
    return bool(src_dir and src_dir.startswith("src"))

# Then:
env_prefix = "PYTHONPATH=src:$PYTHONPATH " if is_src_layout(src_dir) else ""

The current name get_pythonpath_prefix suggests it derives a path from the parameter, but it doesn't.
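
To make the boolean-only use of src_dir concrete, here is a minimal sketch of how the prefix might be combined with a test command. is_src_layout is taken from the suggestion above; build_test_cmd and the "pkg/" value are hypothetical, and the harness may assemble its command differently.

```python
def is_src_layout(src_dir: str) -> bool:
    """Check if repo uses src-layout requiring PYTHONPATH."""
    return bool(src_dir and src_dir.startswith("src"))


def build_test_cmd(test_cmd: str, src_dir: str) -> str:
    """Hypothetical helper: prepend the PYTHONPATH prefix for src-layout repos."""
    # src_dir only decides whether the prefix is added; the prefix itself is fixed.
    env_prefix = "PYTHONPATH=src:$PYTHONPATH " if is_src_layout(src_dir) else ""
    return f"{env_prefix}{test_cmd}"


print(build_test_cmd("python -m pytest", "src/cachetools/"))
print(build_test_cmd("python -m pytest", "pkg/"))
```

Note that startswith("src") also accepts values such as "srcutils/", so a stricter check on the first path component may be worth considering.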

VascoSch92 and others added 2 commits March 19, 2026 11:42
  The regex anchor $ prevented matching 'pytest --assert=plain ...' (parsel).
  Changed to (\s|$) to match pytest followed by flags or end-of-string.

  Fixes parsel exit code 127 in commit0 evaluation.
  Ref: #526
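
The patched function is not shown in this thread, so the following is only a sketch of what the (\s|$) change described above could look like. The pattern name _BARE_PYTEST is hypothetical, and the substitution keeps the optional digit suffix as the originally reviewed code does.

```python
import re

# Hypothetical reconstruction; the real patch may differ.
# Matches "pytest" or "pytest3" at the start, followed by whitespace or end-of-string.
_BARE_PYTEST = re.compile(r"pytest\d?(\s|$)")


def normalize_pytest_cmd(test_cmd: str) -> str:
    """Rewrite a leading bare pytest as `python -m pytest`, keeping any flags."""
    stripped = test_cmd.strip()
    if _BARE_PYTEST.match(stripped):
        return re.sub(r"^pytest(\d?)", r"python -m pytest\1", stripped, count=1)
    return test_cmd


print(normalize_pytest_cmd("pytest"))
print(normalize_pytest_cmd("pytest --assert=plain tests/"))
print(normalize_pytest_cmd("python -m pytest tests/"))
```

Because re.match anchors at the start of the string, commands that already begin with python -m pytest are left unchanged without a separate substring check.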
The src_dir field is a top-level field in the dataset:
  {"test": {"test_cmd": "pytest", "test_dir": "tests/"}, "src_dir": "src/cachetools/"}

The previous code read from instance.data["test"].get("src_dir", "")
which always returned "" because src_dir is not nested under "test".
This caused the PYTHONPATH=src prefix to never be applied for cachetools.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
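
The difference between the two lookups can be illustrated with the record from the commit message. This is a sketch only: instance_data is a plain-dict stand-in for instance.data, and the variable names are hypothetical.

```python
# Record shape copied from the commit message; a plain dict stands in for instance.data.
instance_data = {
    "test": {"test_cmd": "pytest", "test_dir": "tests/"},
    "src_dir": "src/cachetools/",
}

# Previous lookup: "src_dir" is not nested under "test", so the default "" comes back.
nested = instance_data["test"].get("src_dir", "")

# Corrected lookup: read the top-level field.
top_level = instance_data.get("src_dir", "")

print(repr(nested))     # ''
print(repr(top_level))  # 'src/cachetools/'
```

With the nested lookup, the startswith("src") check always sees an empty string, which is why the PYTHONPATH prefix was never applied.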