Currently, `./script/test` is a light wrapper around individual container tests.
This is fine for development or debugging, but it doesn't actually run all
tests, since we now have cram-based test suites. This script should be the
entrypoint for running every available test suite. What it executes today is
essentially test plumbing, not the actual test suite. We can provide an
argument to call that plumbing directly, or the user can run `./test/container/run-test.sh` manually instead.
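A minimal sketch of that dispatch, assuming a Python entrypoint (the real `./script/test` may well be a shell script) and a hypothetical `--container` flag for the passthrough. The cram path `test/cram` is also an assumption; `./test/container/run-test.sh` and `scripts/tests/` come from this issue. The sketch only plans the commands, so the behavior is easy to check without running anything.

```python
from typing import List

# Taken from this issue: the existing per-container test plumbing.
CONTAINER_RUNNER = "./test/container/run-test.sh"

# Hypothetical default suite commands; "test/cram" is an assumed location,
# "scripts/tests" is where the issue says the pytest tests live.
DEFAULT_SUITES: List[List[str]] = [
    ["cram", "test/cram"],
    ["python", "-m", "pytest", "scripts/tests"],
]


def plan(argv: List[str]) -> List[List[str]]:
    """Return the commands ./script/test would run for the given arguments."""
    if argv and argv[0] == "--container":
        # Hypothetical escape hatch: forward everything after the flag
        # straight to the container test plumbing.
        return [[CONTAINER_RUNNER, *argv[1:]]]
    if argv:
        raise ValueError(f"unknown arguments: {argv}")
    # No arguments: run every available suite, not just the plumbing.
    return [list(cmd) for cmd in DEFAULT_SUITES]
```

Calling `plan([])` yields both suite commands, while `plan(["--container", "some-arg"])` yields only the `run-test.sh` invocation with `some-arg` forwarded.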
Design note
The redesign should account for more than the current cram-based test suites.
It should be designed to accommodate multiple testing backends and mechanisms,
including:

- cram-based shell test suites
- the potential agent skill evals being added in Integrate agent skill evals into testing #107, even if those are
  ultimately wrapped by cram or pytest rather than having their own standalone
  runner shape
- the existing pytest-based tests under `scripts/tests/` for Python script unit
  coverage
Issue #108 does not depend on #107 landing first, but the `./script/test`
design should leave room for that class of test backend rather than being
shaped only around the current container and cram flows.
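One way to leave that room, sketched under assumptions: each backend is a named entry in a registry, the runner iterates over whichever backends are selected, and the overall exit status is non-zero if any backend fails. The `Backend` type, `register`, `run_backends`, `overall_status`, and the `test/cram` path are all hypothetical. An evals backend for #107 would be one more `register` call, whether it shells out to cram, pytest, or its own runner.

```python
import subprocess
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence


@dataclass(frozen=True)
class Backend:
    """One test mechanism that ./script/test knows how to drive (hypothetical)."""
    name: str
    command: List[str]


# Hypothetical registry of the backends that exist today.
BACKENDS: Dict[str, Backend] = {
    "cram": Backend("cram", ["cram", "test/cram"]),  # assumed cram suite path
    "pytest": Backend("pytest", ["python", "-m", "pytest", "scripts/tests"]),
}


def register(backend: Backend) -> None:
    """Add or replace a backend, e.g. a future evals runner."""
    BACKENDS[backend.name] = backend


def run_backends(
    selected: Optional[Sequence[str]] = None,
    runner: Callable[[List[str]], int] = subprocess.call,
) -> Dict[str, int]:
    """Run the selected backends (all of them by default); map name -> exit code."""
    names = list(selected) if selected else list(BACKENDS)
    unknown = [n for n in names if n not in BACKENDS]
    if unknown:
        raise ValueError(f"unknown backends: {unknown}")
    return {name: runner(BACKENDS[name].command) for name in names}


def overall_status(results: Dict[str, int]) -> int:
    """Return 0 only if every backend passed, so CI sees a single status."""
    return 0 if all(code == 0 for code in results.values()) else 1
```

Injecting `runner` keeps the dispatch logic testable without invoking cram or pytest; the default `subprocess.call` runs each command and returns its exit code.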