fix(commit0): evaluation harness fails to score cachetools and parsel due to test command issues#539
fix(commit0): evaluation harness fails to score cachetools and parsel due to test command issues#539VascoSch92 wants to merge 4 commits intomainfrom
Conversation
all-hands-bot
left a comment
There was a problem hiding this comment.
🟡 Acceptable - Solves real problems (ImportError and PATH issues) but has some fragile heuristics that could break edge cases.
all-hands-bot
left a comment
There was a problem hiding this comment.
🟡 Acceptable - Solves real production problems (cachetools, parsel) and tests are comprehensive. Two minor code clarity issues that don't block merging but would improve maintainability.
| def normalize_pytest_cmd(test_cmd: str) -> str: | ||
| """Replace bare pytest/pytest3 with python -m pytest to avoid PATH/permission issues.""" | ||
| if re.match(r"pytest\d?$", test_cmd.strip()) and "python -m pytest" not in test_cmd: | ||
| test_cmd = re.sub(r"\bpytest(\d?)", r"python -m pytest\1", test_cmd, count=1) |
There was a problem hiding this comment.
🟡 Suggestion: Logic redundancy - lacks "good taste"
You check if the stripped command is exactly pytest or pytest3 with re.match(r"pytest\d?$", ...), but then use word boundaries \b in the substitution. If you've already verified it's exactly "pytest", word boundaries are redundant.
Simpler approach:
def normalize_pytest_cmd(test_cmd: str) -> str:
"""Replace bare pytest/pytest3 with python -m pytest to avoid PATH/permission issues."""
stripped = test_cmd.strip()
if stripped in ("pytest", "pytest3") and "python -m pytest" not in test_cmd:
return test_cmd.replace(stripped, f"python -m {stripped}", 1)
return test_cmdOr if you need the regex flexibility:
if re.match(r"^pytest\d?$", test_cmd.strip()):
return re.sub(r"^pytest(\d?)$", r"python -m pytest\1", test_cmd.strip())
return test_cmdCurrent code works but mixes two different validation strategies unnecessarily.
| def get_pythonpath_prefix(src_dir: str) -> str: | ||
| """Return PYTHONPATH env prefix for src-layout repos.""" | ||
| if src_dir and src_dir.startswith("src"): | ||
| return "PYTHONPATH=src:$PYTHONPATH " |
There was a problem hiding this comment.
🟡 Suggestion: Misleading function signature
The function takes src_dir as a parameter but ignores its actual value and returns a hardcoded "PYTHONPATH=src:$PYTHONPATH ". This makes it look like the function would use different paths based on src_dir, but it only uses it as a boolean check.
Clearer alternatives:
- Inline it (simplest):
env_prefix = "PYTHONPATH=src:$PYTHONPATH " if src_dir and src_dir.startswith("src") else ""- Rename to reflect actual behavior:
def is_src_layout(src_dir: str) -> bool:
"""Check if repo uses src-layout requiring PYTHONPATH."""
return bool(src_dir and src_dir.startswith("src"))
# Then:
env_prefix = "PYTHONPATH=src:$PYTHONPATH " if is_src_layout(src_dir) else ""The current name get_pythonpath_prefix suggests it derives a path from the parameter, but it doesn't.
The regex anchor $ prevented matching 'pytest --assert=plain ...' (parsel). Changed to (\s|$) to match pytest followed by flags or end-of-string. Fixes parsel exit code 127 in commit0 evaluation. Ref: #526
The src_dir field is a top-level field in the dataset:
{"test": {"test_cmd": "pytest", "test_dir": "tests/"}, "src_dir": "src/cachetools/"}
The previous code read from instance.data["test"].get("src_dir", "")
which always returned "" because src_dir is not nested under "test".
This caused the PYTHONPATH=src prefix to never be applied for cachetools.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The PR solves issue #526
I tested here: https://openhands-ai.slack.com/archives/C09QGUDQVTL/p1773907384946099
Here are the results:
cachetools
parsel
The parsel failures (51 failed, 155 passed) are genuine agent implementation gaps, not harness bugs, the test command executed successfully and produced a proper report.