Skip to content

refactor: remove testify/suite from conformance tests for full parallelism#91

Merged
weeco merged 3 commits intomainfrom
ms/remove-testify-suite
Mar 31, 2026
Merged

refactor: remove testify/suite from conformance tests for full parallelism#91
weeco merged 3 commits intomainfrom
ms/remove-testify-suite

Conversation

@weeco
Copy link
Copy Markdown
Contributor

@weeco weeco commented Mar 27, 2026

Summary

Replace testify/suite with a custom internal/testsuite runner that discovers Test*(t *testing.T) methods via reflection and runs them as parallel subtests. testify/suite fundamentally doesn't support t.Parallel() (open since 2015).

Changes

  • Add internal/testsuite package — reflection-based test runner with parallel execution
  • Refactor Fixture interface: StandardModel()/ReasoningModel()NewStandardModel(t)/NewReasoningModel(t) factory methods
  • Convert all Suite methods to delegate to unexported test functions
  • Update all 8 provider/agent conformance test files to use testsuite.Run()
  • Wrap all fixture models with retry.WrapModel() consistently (was missing on Bedrock, OpenAICompat, DeepSeek)
  • Fix cache tests (Google, OpenAI): remove assertions on opportunistic caching behavior, add deterministic unit test for response mapper instead
  • Increase Anthropic provider timeout to 5 min for reasoning models

Results

  • OpenAI: 350s → 132s
  • Anthropic: 186s → 88s
  • Google: 686s → 66s
  • Zero test failures across full integration suite

Test plan

  • go build ./... compiles
  • task lint clean (no new issues)
  • go test -short ./... unit tests pass
  • Full integration test run: 22/22 packages pass

…elism

Replace testify/suite with a custom internal/testsuite runner that
discovers Test*(t *testing.T) methods via reflection and runs them as
parallel subtests. testify/suite fundamentally doesn't support
t.Parallel() (open issue since 2015), and we use none of its advantages.

Changes:
- Add internal/testsuite package with reflection-based test runner
- Refactor Suite structs: methods now take *testing.T parameter and
  delegate to unexported implementation functions
- Change Fixture interface: StandardModel()/ReasoningModel() become
  NewStandardModel(t)/NewReasoningModel(t) factory methods that can
  call t.Skip() directly
- Update all 8 provider/agent test files to use testsuite.Run()
- Add t.Parallel() to all nested subtests
- Wrap all fixture models with retry.WrapModel() consistently
  (was missing on Bedrock, OpenAICompat, DeepSeek)
- Fix Google cache test: use soft assertion (assert vs require) since
  implicit caching is opportunistic, increase prompt to 3000 tokens
- Increase Anthropic provider timeout to 5 minutes for reasoning models
@weeco weeco force-pushed the ms/remove-testify-suite branch from d56ca93 to 6147d0f Compare March 27, 2026 14:32
@weeco weeco requested review from birdayz and simon0191 March 27, 2026 15:21
@weeco
Copy link
Copy Markdown
Contributor Author

weeco commented Mar 27, 2026

@claude review

@claude
Copy link
Copy Markdown

claude bot commented Mar 27, 2026

Claude finished @weeco's task in 3m 23s —— View job


Code Review

  • Read key changed files
  • Analyze internal/testsuite package
  • Review conformance suite changes
  • Review fixture interface changes
  • Review provider-specific test files
  • Check cache test changes
  • Post review findings

This is a well-executed refactor. The speed improvements are impressive (Google 90%, OpenAI 62%, Anthropic 53%), the design is minimal, and the removal of testify/suite eliminates a long-standing blocker for proper parallelism. Below are my observations.

internal/testsuite/testsuite.go

Go 1.23 iterator dependency: for method := range rt.Methods() uses reflect.Type.Methods() as an iterator (added in Go 1.23). This is fine but worth making explicit in the module's go.mod minimum version if not already there.

No NumOut() check: discoverMethods accepts any method matching Test*(t *testing.T) regardless of return type. A method like TestFoo(t *testing.T) error would be discovered and its return value silently ignored. Low risk in practice since Go test conventions don't return values, but a stricter guard would be safer:

if method.Type.NumIn() != 2 || method.Type.NumIn() != 0 || method.Type.In(1) != testingTType {

Fix this →

Test coverage: TestRunExecutesAllTests verifies the count but not that each individual method ran exactly once. The atomic.Int32 proves correctness here — no issue, just noting it.

providers/conformance/suite.go

Inner subtests not parallel: The table-driven cases inside testGenerate, testGenerateEvents, etc. don't call t.Parallel(). This is likely intentional (avoiding a burst of API calls per top-level test) but creates an asymmetry — the top-level Test* methods run in parallel across each other, while their internal cases run serially. A brief comment on that decision would clarify intent.

testGenerateWithReasoning retry loop (pre-existing, not introduced here): The manual retry loop at lines 437–478 could use retry.WrapModel consistently instead of hand-rolling retries inside a conformance test. But this is pre-existing and out of scope.

providers/google/google_conformance_test.go

Unused ctx field on GoogleFixture: The ctx context.Context field (line 34) is stored on the fixture struct but never used in any test method — the test functions all use t.Context(). It appears to be leftover from before the refactor. Fix this →

Double model creation in NewStandardModel: When the model has reasoning capabilities, the method calls provider.NewModel twice — once to probe Capabilities(), then again with WithThinking. This is a minor inefficiency (two allocations) but functionally correct.

providers/google/google_conformance_test.goTestGoogleConformance_Integration

The test now iterates over multiple model names including ModelGemini31ProPreview. The variable is modelsToTest — worth double-checking that ModelGemini31ProPreview is the correct constant name (the PR description elsewhere mentions ModelGemini3ProPreview). If those are different constants, one of them may be unintended.

providers/openaicompat/deepseek_conformance_test.go

NewReasoningModel never skips: Unlike all other fixtures where an unavailable reasoning model calls t.Skipf, DeepSeekFixture.NewReasoningModel always returns a model and would fail (not skip) if the API rejects it. This is intentional (DeepSeek reasoning model is always expected to be available), but it's asymmetric with the other fixtures. A comment would help.

Cache tests — google/cache_test.go, openai/cache_test.go

The move away from asserting cache hits (opportunistic, non-deterministic) toward asserting CachedTokens >= 0 plus a deterministic TestResponseMapper_CachedTokens unit test is the right call. The new TestResponseMapper_CachedTokens in google_test.go is clean and well-structured.

Minor nit

sort.Slice in testsuite.go is fine. slices.SortFunc (Go 1.21+) would be more idiomatic given the module likely already requires that version, but this is a trivial style point.


Overall: LGTM with the unused ctx field being the only thing worth fixing before merge. The NumOut() guard is a nice-to-have. Everything else is minor.

// A suite must be a pointer to a struct whose test methods have the signature:
//
// func (s *Suite) TestXxx(t *testing.T)
func Run(t *testing.T, suite any) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting! there's nothing standard out there that already does this?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

testify suite does exactly this but it doesnt have support for parallel test runs, thats why we rebuilt it basically

@weeco weeco requested a review from simon0191 March 30, 2026 12:54
@blacksmith-sh

This comment has been minimized.

The validate functions were discarding the error with `_ error`, causing
misleading "Expected value not to be nil" failures when the actual issue
was an API timeout or other error.
@weeco weeco merged commit 4e1497f into main Mar 31, 2026
4 checks passed
@weeco weeco deleted the ms/remove-testify-suite branch March 31, 2026 13:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants