refactor: remove testify/suite from conformance tests for full parallelism by weeco · Pull Request #91 · redpanda-data/ai-sdk-go

weeco · 2026-03-27T14:29:42Z

Summary

Replace testify/suite with a custom internal/testsuite runner that discovers Test*(t *testing.T) methods via reflection and runs them as parallel subtests. testify/suite fundamentally doesn't support t.Parallel() (open since 2015).

Changes

Add internal/testsuite package — reflection-based test runner with parallel execution
Refactor Fixture interface: StandardModel()/ReasoningModel() → NewStandardModel(t)/NewReasoningModel(t) factory methods
Convert all Suite methods to delegate to unexported test functions
Update all 8 provider/agent conformance test files to use testsuite.Run()
Wrap all fixture models with retry.WrapModel() consistently (was missing on Bedrock, OpenAICompat, DeepSeek)
Fix cache tests (Google, OpenAI): remove assertions on opportunistic caching behavior, add deterministic unit test for response mapper instead
Increase Anthropic provider timeout to 5 min for reasoning models

Results

OpenAI: 350s → 132s
Anthropic: 186s → 88s
Google: 686s → 66s
Zero test failures across full integration suite

Test plan

go build ./... compiles
task lint clean (no new issues)
go test -short ./... unit tests pass
Full integration test run: 22/22 packages pass

…elism Replace testify/suite with a custom internal/testsuite runner that discovers Test*(t *testing.T) methods via reflection and runs them as parallel subtests. testify/suite fundamentally doesn't support t.Parallel() (open issue since 2015), and we use none of its advantages. Changes: - Add internal/testsuite package with reflection-based test runner - Refactor Suite structs: methods now take *testing.T parameter and delegate to unexported implementation functions - Change Fixture interface: StandardModel()/ReasoningModel() become NewStandardModel(t)/NewReasoningModel(t) factory methods that can call t.Skip() directly - Update all 8 provider/agent test files to use testsuite.Run() - Add t.Parallel() to all nested subtests - Wrap all fixture models with retry.WrapModel() consistently (was missing on Bedrock, OpenAICompat, DeepSeek) - Fix Google cache test: use soft assertion (assert vs require) since implicit caching is opportunistic, increase prompt to 3000 tokens - Increase Anthropic provider timeout to 5 minutes for reasoning models

weeco · 2026-03-27T15:21:37Z

@claude review

claude · 2026-03-27T15:21:56Z

Claude finished @weeco's task in 3m 23s —— View job

Code Review

Read key changed files
Analyze internal/testsuite package
Review conformance suite changes
Review fixture interface changes
Review provider-specific test files
Check cache test changes
Post review findings

This is a well-executed refactor. The speed improvements are impressive (Google 90%, OpenAI 62%, Anthropic 53%), the design is minimal, and the removal of testify/suite eliminates a long-standing blocker for proper parallelism. Below are my observations.

`internal/testsuite/testsuite.go`

Go 1.23 iterator dependency: for method := range rt.Methods() uses reflect.Type.Methods() as an iterator (added in Go 1.23). This is fine but worth making explicit in the module's go.mod minimum version if not already there.

No NumOut() check: discoverMethods accepts any method matching Test*(t *testing.T) regardless of return type. A method like TestFoo(t *testing.T) error would be discovered and its return value silently ignored. Low risk in practice since Go test conventions don't return values, but a stricter guard would be safer:

if method.Type.NumIn() != 2 || method.Type.NumIn() != 0 || method.Type.In(1) != testingTType {

Fix this →

Test coverage: TestRunExecutesAllTests verifies the count but not that each individual method ran exactly once. The atomic.Int32 proves correctness here — no issue, just noting it.

`providers/conformance/suite.go`

Inner subtests not parallel: The table-driven cases inside testGenerate, testGenerateEvents, etc. don't call t.Parallel(). This is likely intentional (avoiding a burst of API calls per top-level test) but creates an asymmetry — the top-level Test* methods run in parallel across each other, while their internal cases run serially. A brief comment on that decision would clarify intent.

testGenerateWithReasoning retry loop (pre-existing, not introduced here): The manual retry loop at lines 437–478 could use retry.WrapModel consistently instead of hand-rolling retries inside a conformance test. But this is pre-existing and out of scope.

`providers/google/google_conformance_test.go`

Unused ctx field on GoogleFixture: The ctx context.Context field (line 34) is stored on the fixture struct but never used in any test method — the test functions all use t.Context(). It appears to be leftover from before the refactor. Fix this →

Double model creation in NewStandardModel: When the model has reasoning capabilities, the method calls provider.NewModel twice — once to probe Capabilities(), then again with WithThinking. This is a minor inefficiency (two allocations) but functionally correct.

`providers/google/google_conformance_test.go` — `TestGoogleConformance_Integration`

The test now iterates over multiple model names including ModelGemini31ProPreview. The variable is modelsToTest — worth double-checking that ModelGemini31ProPreview is the correct constant name (the PR description elsewhere mentions ModelGemini3ProPreview). If those are different constants, one of them may be unintended.

`providers/openaicompat/deepseek_conformance_test.go`

NewReasoningModel never skips: Unlike all other fixtures where an unavailable reasoning model calls t.Skipf, DeepSeekFixture.NewReasoningModel always returns a model and would fail (not skip) if the API rejects it. This is intentional (DeepSeek reasoning model is always expected to be available), but it's asymmetric with the other fixtures. A comment would help.

Cache tests — `google/cache_test.go`, `openai/cache_test.go`

The move away from asserting cache hits (opportunistic, non-deterministic) toward asserting CachedTokens >= 0 plus a deterministic TestResponseMapper_CachedTokens unit test is the right call. The new TestResponseMapper_CachedTokens in google_test.go is clean and well-structured.

Minor nit

sort.Slice in testsuite.go is fine. slices.SortFunc (Go 1.21+) would be more idiomatic given the module likely already requires that version, but this is a trivial style point.

Overall: LGTM with the unused ctx field being the only thing worth fixing before merge. The NumOut() guard is a nice-to-have. Everything else is minor.

simon0191 · 2026-03-27T15:25:52Z

internal/testsuite/testsuite.go

+// A suite must be a pointer to a struct whose test methods have the signature:
+//
+//	func (s *Suite) TestXxx(t *testing.T)
+func Run(t *testing.T, suite any) {


Interesting! there's nothing standard out there that already does this?

testify suite does exactly this but it doesnt have support for parallel test runs, thats why we rebuilt it basically

…, use idiomatic slices

The validate functions were discarding the error with `_ error`, causing misleading "Expected value not to be nil" failures when the actual issue was an API timeout or other error.

weeco force-pushed the ms/remove-testify-suite branch from d56ca93 to 6147d0f Compare March 27, 2026 14:32

weeco requested review from birdayz and simon0191 March 27, 2026 15:21

simon0191 reviewed Mar 27, 2026

View reviewed changes

fix: address PR review feedback — add NumOut guard, remove unused ctx…

03eada1

…, use idiomatic slices

weeco requested a review from simon0191 March 30, 2026 12:54

This comment has been minimized.

Sign in to view

fix: check error before response in conformance test validators

32c97b1

The validate functions were discarding the error with `_ error`, causing misleading "Expected value not to be nil" failures when the actual issue was an API timeout or other error.

birdayz approved these changes Mar 31, 2026

View reviewed changes

weeco merged commit 4e1497f into main Mar 31, 2026
4 checks passed

weeco deleted the ms/remove-testify-suite branch March 31, 2026 13:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor: remove testify/suite from conformance tests for full parallelism#91

refactor: remove testify/suite from conformance tests for full parallelism#91
weeco merged 3 commits intomainfrom
ms/remove-testify-suite

weeco commented Mar 27, 2026

Uh oh!

weeco commented Mar 27, 2026

Uh oh!

claude bot commented Mar 27, 2026 •

edited

Loading

Uh oh!

simon0191 Mar 27, 2026

Uh oh!

weeco Mar 27, 2026

Uh oh!

This comment has been minimized.

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

weeco commented Mar 27, 2026

Summary

Changes

Results

Test plan

Uh oh!

weeco commented Mar 27, 2026

Uh oh!

claude bot commented Mar 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review

internal/testsuite/testsuite.go

providers/conformance/suite.go

providers/google/google_conformance_test.go

providers/google/google_conformance_test.go — TestGoogleConformance_Integration

providers/openaicompat/deepseek_conformance_test.go

Cache tests — google/cache_test.go, openai/cache_test.go

Minor nit

Uh oh!

simon0191 Mar 27, 2026

Choose a reason for hiding this comment

Uh oh!

weeco Mar 27, 2026

Choose a reason for hiding this comment

Uh oh!

This comment has been minimized.

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

claude bot commented Mar 27, 2026 •

edited

Loading

`internal/testsuite/testsuite.go`

`providers/conformance/suite.go`

`providers/google/google_conformance_test.go`

`providers/google/google_conformance_test.go` — `TestGoogleConformance_Integration`

`providers/openaicompat/deepseek_conformance_test.go`

Cache tests — `google/cache_test.go`, `openai/cache_test.go`