@pjcdawkins pjcdawkins commented Nov 14, 2025

Summary

Fixes #828: Workers and streaming goroutines do not exit and leak resources on context cancellation.

This PR addresses two critical goroutine leak issues:

  1. Worker loops that ignore context cancellation
  2. Streaming handlers that block indefinitely on I/O

Changes

Worker Loop Fix

Refactored requestWorker in core/bifrost.go to monitor bifrost.ctx.Done() using a select statement. Workers now exit immediately when the context is cancelled, instead of only exiting when the queue channel closes.

Before:

for req := range queue {
    // Process request
}

After:

for {
    select {
    case req, ok := <-queue:
        if !ok {
            return
        }
        bifrost.processRequest(provider, config, req)
    case <-bifrost.ctx.Done():
        return
    }
}

Streaming Handler Fixes

Added a context-monitoring goroutine to each streaming handler that force-closes the response body stream when the context is cancelled. This unblocks scanner.Scan() calls that were preventing the streaming goroutines from exiting.

Pattern applied to all streaming methods:

// Monitor context cancellation and force-close response body to unblock scanner.
done := make(chan struct{})
defer close(done)
go func() {
    select {
    case <-ctx.Done():
        resp.CloseBodyStream()
    case <-done:
    }
}()

Fixed in 11 streaming methods across 4 providers:

  • OpenAI: 5 methods (TextCompletionStream, ChatCompletionStream, ResponsesStream, SpeechStream, TranscriptionStream)
  • Anthropic: 2 methods (ChatCompletionStream, ResponsesStream)
  • Cohere: 2 methods (ChatCompletionStream, ResponsesStream)
  • Gemini: 2 methods (SpeechStream, TranscriptionStream)

Testing

Added TestBifrostContextCancellationCleanup to verify goroutines clean up properly after context timeout.

Test Results:

  • Before fixes: 5 goroutines leaked
  • After fixes: 3 goroutines leaked (40% improvement)

Type of change

  • Bug fix

Affected areas

  • Core (Go)

How to test

# Run the goroutine leak test
go test -v -run TestBifrostContextCancellationCleanup ./core

# Run all tests
go test ./...

Breaking changes

  • Yes
  • No

Checklist

  • I read docs/contributing/README.md and followed the guidelines
  • I added/updated tests where appropriate
  • I verified builds succeed
  • I verified the CI pipeline passes locally if applicable

github-actions bot and others added 30 commits October 29, 2025 21:54
## Summary

Improve the Mintlify changelog generation script to clear changelog files after processing and handle empty changelog content gracefully.

## Changes

- Modified the `append_section` function to check if the changelog content is empty before adding it to the output
- Added functionality to clear changelog files after they've been processed, preventing duplicate entries in future runs
- Improved error handling by removing the early return when content retrieval fails

## Type of change

- [x] Bug fix
- [ ] Feature
- [ ] Refactor
- [ ] Documentation
- [x] Chore/CI

## Affected areas

- [ ] Core (Go)
- [ ] Transports (HTTP)
- [ ] Providers/Integrations
- [ ] Plugins
- [ ] UI (Next.js)
- [x] Docs

## How to test

Run the changelog generation script and verify:
1. Empty changelog files are handled properly
2. Changelog files are cleared after processing
3. The generated output contains only non-empty sections

```sh
# Run the changelog generation script
.github/workflows/scripts/push-mintlify-changelog.sh
```

## Breaking changes

- [ ] Yes
- [x] No

## Security considerations

No security implications as this only affects the documentation build process.

## Checklist

- [x] I read `docs/contributing/README.md` and followed the guidelines
- [x] I verified the CI pipeline passes locally if applicable
…_gemini_responses_stream

fix: handle panic in gemini responses stream
## Summary

Enhance the ListModels functionality to aggregate results from multiple API keys, improving model discovery and availability.

## Changes

- Modified `ListModels` interface to accept multiple keys instead of a single key
- Implemented key aggregation logic that combines model lists from multiple API keys
- Added deduplication of models based on model ID when aggregating results
- Created a best-effort approach that continues processing remaining keys even if some fail
- Added a new helper function `getAllSupportedKeys` to retrieve all valid keys for a provider
- Updated all provider implementations to support the multi-key approach

## Type of change

- [ ] Bug fix
- [x] Feature
- [ ] Refactor
- [ ] Documentation
- [ ] Chore/CI

## Affected areas

- [x] Core (Go)
- [ ] Transports (HTTP)
- [x] Providers/Integrations
- [ ] Plugins
- [ ] UI (Next.js)
- [ ] Docs

## How to test

Test the ListModels endpoint with multiple API keys configured for the same provider:

```sh
# Core/Transports
go version
go test ./...

# Test with multiple keys for a provider (e.g., OpenAI)
curl -X GET "http://localhost:8000/v1/models?provider=openai" -H "Authorization: Bearer your-bifrost-token"
```

## Breaking changes

- [ ] Yes
- [x] No

## Related issues

Improves model discovery by aggregating results from multiple API keys.

## Security considerations

The implementation maintains the same security model as before, with proper error handling to avoid leaking sensitive information.

## Checklist

- [x] I added/updated tests where appropriate
- [x] I verified builds succeed (Go and UI)
- [x] I verified the CI pipeline passes locally if applicable
…ized request handling (maximhq#724)

## Summary

Refactored the Cohere provider to use a unified request handling approach and improved response object pooling for better performance and memory efficiency.

## Changes

- Implemented `completeRequest` method in Cohere provider to centralize API request handling
- Added object pooling for Cohere embedding responses to reduce memory allocations
- Standardized response handling across all Cohere API endpoints
- Added id, object, and model fields to Chat Completion responses from Bedrock and Cohere providers
- Updated list models functionality to use all available keys

## Type of change

- [ ] Bug fix
- [x] Feature
- [x] Refactor
- [ ] Documentation
- [x] Chore/CI

## Affected areas

- [x] Core (Go)
- [x] Transports (HTTP)
- [x] Providers/Integrations
- [ ] Plugins
- [ ] UI (Next.js)
- [ ] Docs

## How to test

Test the Cohere provider with various request types to ensure proper functionality:

```sh
# Core/Transports
go version
go test ./...

# Test Cohere provider specifically
go test ./core/providers -run TestCohereProvider
```

## Breaking changes

- [x] No
- [ ] Yes

## Related issues

Improves performance and standardizes response handling across providers.

## Security considerations

No security implications as this is an internal refactoring of existing functionality.

## Checklist

- [x] I added/updated tests where appropriate
- [x] I verified builds succeed (Go and UI)
- [x] I verified the CI pipeline passes locally if applicable
…pi (maximhq#702)

## Summary

Implement proper type handling for Anthropic tools in the schema conversion layer, adding support for specialized tool types like computer use preview, web search, and local shell.

## Changes

- Added `ResponsesToolType` as a proper string type with constants for all supported tool types
- Enhanced Anthropic tool conversion to properly handle specialized tool types:
  - Computer use preview tools
  - Web search tools
  - Local shell (bash) tools
- Added proper type definitions for Anthropic-specific tool structures
- Fixed type conversion in the chat tool mux to use the new type system

## Type of change

- [x] Feature
- [x] Refactor

## Affected areas

- [x] Core (Go)
- [x] Providers/Integrations

## How to test

```sh
# Core/Transports
go version
go test ./...
```

Test with Anthropic API calls that use specialized tools like computer use preview, web search, and local shell to verify proper conversion between Bifrost and Anthropic formats.

## Breaking changes

- [x] No

## Related issues

Improves tool handling for Anthropic provider integration.

## Checklist

- [x] I added/updated tests where appropriate
- [x] I verified builds succeed (Go and UI)
…aximhq#725)

## Summary

Add support for storing and displaying Responses API input history in logs, enabling better tracking and debugging of Responses API requests.

## Changes

- Added a new `responses_input_history` column to the logs database table
- Created a database migration to add the new column
- Modified the logging plugin to extract and store Responses API input messages
- Updated the UI to display Responses input history in log details
- Renamed `LogResponsesOutputView` to `LogResponsesMessageView` for reuse with both input and output
- Fixed an issue in the post-hook runner by clearing `ChatResponse` after converting to `ResponsesStreamResponse`

## Type of change

- [ ] Bug fix
- [x] Feature
- [ ] Refactor
- [ ] Documentation
- [ ] Chore/CI

## Affected areas

- [x] Core (Go)
- [x] Transports (HTTP)
- [ ] Providers/Integrations
- [x] Plugins
- [x] UI (Next.js)
- [ ] Docs

## How to test

1. Make Responses API requests
2. Check logs to verify input history is properly stored and displayed
3. Verify that both input and output messages are properly displayed in the UI

```sh
# Core/Transports
go version
go test ./...

# UI
cd ui
pnpm i || npm i
pnpm test || npm test
pnpm build || npm run build
```

## Breaking changes

- [x] Yes
- [ ] No

The database schema has been updated with a new column. Users should run the application to trigger the migration.

## Security considerations

No new security implications. The implementation follows the same patterns as existing log storage.

## Checklist

- [x] I read `docs/contributing/README.md` and followed the guidelines
- [x] I added/updated tests where appropriate
- [x] I updated documentation where needed
- [x] I verified builds succeed (Go and UI)
- [x] I verified the CI pipeline passes locally if applicable
## Summary

This PR updates the PreRequestCallback interface to include a Bifrost context parameter, allowing callbacks to modify the context before it's passed to Bifrost. It also adds a rawBody parameter to provide access to the original request body.

## Changes

- Modified the `PreRequestCallback` function signature to include `bifrostCtx *context.Context` and `rawBody []byte` parameters
- Updated all implementations of `PreRequestCallback` in various integration files (anthropic.go, genai.go, openai.go)
- Added a new router.go file that contains the centralized router implementation with the updated interface
- Moved most of the code from utils.go to router.go for better organization
- Simplified utils.go to contain only utility functions

## Type of change

- [ ] Bug fix
- [x] Feature
- [x] Refactor
- [ ] Documentation
- [ ] Chore/CI

## Affected areas

- [ ] Core (Go)
- [x] Transports (HTTP)
- [x] Providers/Integrations
- [ ] Plugins
- [ ] UI (Next.js)
- [ ] Docs

## How to test

Test that pre-request callbacks can now modify the Bifrost context:

```sh
# Core/Transports
go version
go test ./...

# Test with a custom integration that uses the PreRequestCallback
# to modify the Bifrost context before processing
```

## Breaking changes

- [x] Yes
- [ ] No

This changes the signature of the `PreRequestCallback` interface. Any custom implementations will need to be updated to match the new signature.

## Related issues

Enables better context manipulation in pre-request callbacks, which helps with request processing customization.

## Security considerations

No additional security implications. The change maintains the existing security model while providing more flexibility.

## Checklist

- [x] I read `docs/contributing/README.md` and followed the guidelines
- [x] I added/updated tests where appropriate
- [x] I updated documentation where needed
- [x] I verified builds succeed (Go and UI)
- [x] I verified the CI pipeline passes locally if applicable
…raw request passthrough and response gzip decompression and anthropic integration passthrough (maximhq#731)

## Summary

Add support for passthrough mode to allow direct forwarding of requests to providers without modifying the request body. This enables advanced use cases like Claude Code and other provider-specific features that require exact request formats.

## Changes

- Added context keys to support passthrough mode: `BifrostContextKeySkipKeySelection`, `BifrostContextKeyExtraHeaders`, `BifrostContextKeyURLPath`, and `BifrostContextKeyRequestBody`
- Implemented passthrough detection for Anthropic API requests
- Refactored provider code to check for raw request bodies in context before converting
- Added support for preserving original URL paths and headers
- Improved error handling for unsupported operations with standardized messages
- Fixed handling of response bodies with different content encodings (gzip)
- Added support for skipping key selection when appropriate

## Type of change

- [ ] Bug fix
- [x] Feature
- [x] Refactor
- [ ] Documentation
- [ ] Chore/CI

## Affected areas

- [x] Core (Go)
- [x] Transports (HTTP)
- [x] Providers/Integrations
- [ ] Plugins
- [ ] UI (Next.js)
- [ ] Docs

## How to test

Test passthrough mode with Anthropic API:

```sh
# Test with Claude Code request
curl -X POST http://localhost:8080/anthropic/v1/messages \
  -H "Content-Type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-3-opus-20240229",
    "messages": [{"role": "user", "content": "Write a simple Python function to calculate Fibonacci numbers"}],
    "max_tokens": 1000
  }'

# Test with standard API key
curl -X POST http://localhost:8080/anthropic/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: your-api-key" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-3-opus-20240229",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 100
  }'
```

## Breaking changes

- [ ] Yes
- [x] No

## Related issues

Enables support for Claude Code and other provider-specific features that require exact request formats.

## Security considerations

The implementation preserves security by:
- Filtering out hop-by-hop headers that shouldn't be forwarded
- Maintaining proper authentication mechanisms
- Only allowing passthrough for specific providers and endpoints

## Checklist

- [x] I read `docs/contributing/README.md` and followed the guidelines
- [x] I added/updated tests where appropriate
- [x] I updated documentation where needed
- [x] I verified builds succeed (Go and UI)
- [x] I verified the CI pipeline passes locally if applicable
…requests (maximhq#732)

## Summary

Implement proper context cancellation for HTTP requests to prevent resource leaks and wasted provider quota when clients disconnect, especially during streaming operations.

## Changes

- Added cancellable context support in `ConvertToBifrostContext()` which now returns both a context and a cancel function
- Ensured all HTTP handlers properly defer the cancel function to clean up resources
- Modified streaming handlers to immediately cancel upstream provider requests when clients disconnect
- Added detailed comments explaining the context cancellation pattern and its importance
- Fixed potential resource leaks in streaming handlers by properly handling write errors

## Type of change

- [x] Bug fix
- [ ] Feature
- [x] Refactor
- [ ] Documentation
- [ ] Chore/CI

## Affected areas

- [x] Core (Go)
- [x] Transports (HTTP)
- [ ] Providers/Integrations
- [ ] Plugins
- [ ] UI (Next.js)
- [ ] Docs

## How to test

Verify that streaming requests are properly cancelled when clients disconnect:

```sh
# Start Bifrost server
go run cmd/bifrost/main.go

# In another terminal, start a streaming request and then cancel it (Ctrl+C)
curl -N http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "Write a very long story"}], "stream": true}'

# Verify in server logs that the upstream request was cancelled
```

## Breaking changes

- [ ] Yes
- [x] No

## Related issues

Addresses resource leaks and wasted provider quota when clients disconnect during streaming requests.

## Security considerations

Improves resource management by ensuring provider requests are cancelled when no longer needed, preventing potential resource exhaustion.

## Checklist

- [x] I added/updated tests where appropriate
- [x] I verified builds succeed (Go and UI)
- [x] I verified the CI pipeline passes locally if applicable
## Summary

- Adds dynamic plugin architecture
- Add auth for dashboard and inference calls

Closes maximhq#246
Closes maximhq#518

## Changes

- What was changed and why
- Any notable design decisions or trade-offs

## Type of change

- [x] Bug fix
- [x] Feature
- [x] Refactor
- [ ] Documentation
- [ ] Chore/CI

## Affected areas

- [x] Core (Go)
- [x] Transports (HTTP)
- [ ] Providers/Integrations
- [ ] Plugins
- [x] UI (Next.js)
- [x] Docs

## How to test

Describe the steps to validate this change. Include commands and expected outcomes.

```sh
# Core/Transports
go version
go test ./...

# UI
cd ui
pnpm i || npm i
pnpm test || npm test
pnpm build || npm run build
```

If adding new configs or environment variables, document them here.

## Screenshots/Recordings

If UI changes, add before/after screenshots or short clips.

## Breaking changes

- [ ] Yes
- [x] No

If yes, describe impact and migration instructions.

## Related issues

Link related issues and discussions. Example: Closes maximhq#123

## Security considerations

Note any security implications (auth, secrets, PII, sandboxing, etc.).

## Checklist

- [x] I read `docs/contributing/README.md` and followed the guidelines
- [x] I added/updated tests where appropriate
- [x] I updated documentation where needed
- [x] I verified builds succeed (Go and UI)
- [x] I verified the CI pipeline passes locally if applicable

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (5)
core/providers/gemini/gemini.go (1)

749-758: TranscriptionStream uses the same robust cancellation pattern

This block mirrors the SpeechStream watcher and provides the same cancellation and cleanup guarantees; no additional issues here.

core/providers/openai/openai.go (4)

798-812: Same cancellation watcher pattern applied to chat streams

Chat SSE streaming now uses the same ctx.Done() → CloseBodyStream() watcher and done channel to avoid stuck scanners; consistent with the text completion path.


1196-1210: Responses streaming cancellation handling matches other handlers

The responses stream handler uses the same watcher pattern, which should prevent leaks when callers cancel the context mid‑stream.


1593-1607: Speech streaming gains proper cancellation unblocking

For speech SSE, the new watcher ensures the underlying body stream is closed on ctx.Done(), so the scanner loop can exit promptly instead of hanging on I/O.


1878-1892: Transcription streaming cancellation is now aligned with other streams

Transcription SSE streaming now also force‑closes the body stream on context cancellation, bringing it in line with the other OpenAI streaming handlers and avoiding goroutine leaks.

🧹 Nitpick comments (3)
core/bifrost.go (1)

2058-2071: Align key-selection error send with context-aware response sending

In the key-selection failure path you still use a bare send:

req.Err <- schemas.BifrostError{ ... }
return

Whereas later you’ve switched to select-based, context-aware sends with a timeout for both responses and streams. While this error path is expected to be consumed by the same caller, using the same non-blocking pattern here would make behavior more consistent and avoid a potential hard block if the caller has already stopped listening.

You could mirror the later pattern:

-        if err != nil {
-            bifrost.logger.Warn("error selecting key for model %s: %v", model, err)
-            req.Err <- schemas.BifrostError{
-                IsBifrostError: false,
-                Error: &schemas.ErrorField{
-                    Message: err.Error(),
-                    Error:   err,
-                },
-            }
-            return
-        }
+        if err != nil {
+            bifrost.logger.Warn("error selecting key for model %s: %v", model, err)
+            bifrostErr := schemas.BifrostError{
+                IsBifrostError: false,
+                Error: &schemas.ErrorField{
+                    Message: err.Error(),
+                    Error:   err,
+                },
+            }
+            select {
+            case req.Err <- bifrostErr:
+            case <-req.Context.Done():
+                bifrost.logger.Debug("Client context cancelled while sending key-selection error")
+            case <-time.After(5 * time.Second):
+                bifrost.logger.Warn("Timeout while sending key-selection error, client may have disconnected")
+            }
+            return
+        }

Not strictly required for correctness, but it would make this path consistent with the new response/stream handling.

core/bifrost_test.go (1)

965-1048: Goroutine leak test is well‑targeted; consider flakiness and runtime cost

This test does a good job exercising a streaming request under context timeout and asserting goroutines return close to baseline. Two minor considerations:

  • Using GC + time.Sleep(100 * time.Millisecond) twice plus a 2s timeout and 2s post‑Shutdown sleep can make the test relatively slow; you might be able to reduce the sleeps/timeouts once you’re confident in the fix.
  • The ≤ 2 goroutine tolerance is quite strict; if CI ever shows sporadic failures due to unrelated background goroutines, slightly widening the tolerance (or aggregating over a few runs) would make it more robust while still catching real leaks.

Functionally this looks correct and gives valuable regression coverage for #828.

core/providers/openai/openai.go (1)

398-411: Context cancellation watcher correctly unblocks text‑completion stream

The added done/ctx.Done() watcher that calls resp.CloseBodyStream() gives the scanner a reliable unblock path on cancellation while ensuring the watcher goroutine exits when the stream routine returns. This directly addresses stuck streaming goroutines.

If you find yourself touching this pattern again, a small helper (e.g., withStreamCancellationWatcher(ctx, resp, func(...) { ... })) shared across providers could reduce duplication and keep behavior consistent.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 18a9284 and e4c277f.

📒 Files selected for processing (15)
  • core/bifrost.go (1 hunks)
  • core/bifrost_test.go (2 hunks)
  • core/providers/anthropic/anthropic.go (2 hunks)
  • core/providers/anthropic/chat.go (1 hunks)
  • core/providers/azure/types.go (1 hunks)
  • core/providers/bedrock/chat.go (2 hunks)
  • core/providers/cohere/chat.go (1 hunks)
  • core/providers/cohere/cohere.go (2 hunks)
  • core/providers/cohere/models.go (1 hunks)
  • core/providers/gemini/chat.go (1 hunks)
  • core/providers/gemini/gemini.go (2 hunks)
  • core/providers/gemini/models.go (2 hunks)
  • core/providers/openai/openai.go (5 hunks)
  • core/providers/openai/types.go (1 hunks)
  • core/providers/vertex/models.go (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (7)
core/providers/gemini/chat.go (2)
core/providers/gemini/types.go (1)
  • Type (769-769)
core/schemas/chatcompletions.go (1)
  • ChatToolChoiceTypeFunction (250-250)
core/providers/cohere/models.go (1)
core/schemas/models.go (1)
  • Model (47-66)
core/bifrost.go (4)
core/schemas/provider.go (4)
  • Provider (207-234)
  • ProviderConfig (161-169)
  • CustomProviderConfig (142-148)
  • PostHookRunner (204-204)
core/schemas/bifrost.go (9)
  • BifrostRequest (140-150)
  • BifrostResponse (243-253)
  • BifrostStream (315-322)
  • BifrostError (350-359)
  • RequestType (81-81)
  • ErrorField (368-375)
  • BifrostContextKeySelectedKeyID (107-107)
  • BifrostContextKeySelectedKeyName (108-108)
  • BifrostErrorExtraFields (417-421)
core/schemas/account.go (1)
  • Key (8-17)
core/utils.go (1)
  • IsStreamRequestType (190-192)
core/bifrost_test.go (7)
core/schemas/bifrost.go (2)
  • OpenAI (35-35)
  • BifrostConfig (21-29)
core/bifrost.go (1)
  • Init (86-214)
core/schemas/account.go (1)
  • Account (53-69)
core/schemas/logger.go (2)
  • Logger (28-55)
  • LogLevelError (13-13)
core/logger.go (1)
  • NewDefaultLogger (40-49)
core/schemas/chatcompletions.go (3)
  • BifrostChatRequest (11-18)
  • ChatMessage (346-355)
  • ChatMessageContent (358-361)
core/schemas/provider.go (1)
  • Provider (207-234)
core/providers/gemini/models.go (2)
core/schemas/models.go (1)
  • Model (47-66)
core/providers/gemini/types.go (1)
  • GeminiModel (1255-1269)
core/providers/vertex/models.go (1)
core/schemas/models.go (1)
  • Model (47-66)
core/providers/bedrock/chat.go (1)
core/schemas/chatcompletions.go (1)
  • ChatAssistantMessageToolCall (483-488)
🔇 Additional comments (16)
core/providers/bedrock/chat.go (2)

90-91: LGTM! Proper string pointer initialization for tool calls.

The formatting changes correctly populate the Type and ID fields as string pointers, matching the schema requirements. The use of &toolUseID is safe since toolUseID is a local copy created on line 85 to avoid range loop variable capture issues.


188-188: LGTM! Standard variable declaration.

The variable declaration formatting is standard and appropriate for Go.

core/providers/azure/types.go (1)

20-30: No-op struct formatting change

Only field alignment on AzureModel.CreatedAt; tags and types are unchanged, so there’s no behavioral impact. Looks good as-is.

core/providers/anthropic/chat.go (1)

704-732: Comment wording/whitespace tweak is accurate

The updated comment above the AnthropicStreamEventTypeContentBlockStart branch still correctly describes emitting tool-call metadata; no code behavior change and the comment remains consistent with the logic.

core/providers/cohere/models.go (1)

10-12: Slice init formatting only

Data: make([]schemas.Model, 0, len(response.Models)) is unchanged semantically from before; preallocation is appropriate and behavior is unaffected.

core/providers/gemini/chat.go (1)

70-80: Tool call struct literal spacing only

The Gemini tool-call construction (ID and Type fields) is unchanged in value and behavior; this is purely a formatting tweak and is consistent with the rest of the file.

core/providers/cohere/chat.go (1)

271-278: Formatting-only change in tool call mapping

The bifrostToolCall literal still maps ID and function fields exactly as before; this is a whitespace-only adjustment with no behavioral impact.

core/providers/gemini/gemini.go (1)

490-499: Context cancellation watcher for SpeechStream looks correct

The done-channel + watcher goroutine cleanly forces resp.CloseBodyStream() on ctx.Done() and guarantees the watcher exits when the main streaming goroutine ends. This should unblock the scanner without introducing leaks.

core/providers/cohere/cohere.go (2)

411-420: ChatCompletionStream cancellation watcher is well-scoped

The added watcher cleanly unblocks the streaming scanner on context cancellation and terminates via the done channel, so workers shouldn’t leak even if callers drop the context mid-stream.


634-643: ResponsesStream cancellation watcher mirrors ChatCompletionStream

Same pattern and guarantees as in ChatCompletionStream; it looks sound and consistent with the rest of the codebase.

core/providers/gemini/models.go (1)

15-16: Formatting-only struct literal changes

The adjustments to slice literal alignment are cosmetic; capacities and types are unchanged, so runtime behavior is identical.

Also applies to: 42-43

core/providers/anthropic/anthropic.go (2)

469-478: Anthropic chat streaming now correctly unblocks on context cancellation

The watcher goroutine plus done channel ensure the BodyStream is closed when ctx.Done() fires and that the watcher terminates with the main streaming goroutine. This aligns Anthropic’s behavior with other providers and should prevent stuck scanners.


735-744: Anthropic ResponsesStream adopts the same safe cancellation pattern

Same watcher pattern as in HandleAnthropicChatCompletionStreaming, providing consistent cancellation behavior across Anthropic streaming APIs.

core/providers/vertex/models.go (1)

10-12: Slice preallocation formatting only

Data still preallocates to len(response.Models); behavior unchanged and correct.

core/providers/openai/types.go (1)

128-134: Struct field whitespace cleanup is safe

Created keeps the same pointer type and JSON tag; no impact on encoding/decoding.

core/bifrost_test.go (1)

3-13: New test imports are appropriate and scoped

runtime and testify/assert are used only for the goroutine-leak test; import block remains tidy.

@Pratham-Mishra04

Hey @pjcdawkins, thanks for the PR! It’s looking great. There are a few solid Code Rabbit comments to address though. And while you’re at it, please add your update to core/changelog.md and double-check that all tests in tests/core-providers are still passing.

Pratham-Mishra04 and others added 2 commits November 14, 2025 12:48
…extraction (maximhq#789)

## Summary

Enhance changelog generation by adding version information to update labels in the Mintlify documentation.

## Changes

- Renamed `get_changelog_content` to `get_file_content` to better reflect its more general purpose
- Added support for reading version files and trimming whitespace/newlines from version content
- Modified the `append_section` function to include version information from a separate version file in the "description" field of Update tags
- Added error handling for empty changelog content
- Improved file existence checking to prevent errors when files don't exist

## Type of change

- [x] Feature
- [x] Refactor

## Affected areas

- [x] Docs
- [x] Chore/CI

## How to test

Run the changelog generation workflow manually:

```sh
# Test the script with a sample changelog and version file
mkdir -p test/changelog
echo "- New feature added" > test/changelog/changelog.md
echo "1.2.3" > test/changelog/version
.github/workflows/scripts/push-mintlify-changelog.sh
```

## Related issues

Improves documentation automation by including version information in changelog entries.

## Checklist

- [x] I verified the CI pipeline passes locally if applicable
- [x] I updated documentation where needed
This commit addresses two critical concurrency issues identified in the PR review:

1. Worker exit on context cancellation leaving queued requests without responses:
   - Reverted worker loop to simple `for req := range queue` pattern
   - Workers now only exit when queue is closed, ensuring all queued requests are processed
   - This matches the pre-PR implementation and prevents dropped requests during shutdown

2. Streaming plugin pipeline data race:
   - Removed pipeline release for streaming requests to prevent data race
   - Pipeline is acquired once and captured in postHookRunner closure
   - Pipeline will be garbage collected when no longer referenced
   - Trade-off: pipelines are not reused for streaming requests, but data race is avoided

Additional improvements:
- Improved code comments in Shutdown() method for clarity
- Maintained original shutdown sequence: cancel context first (stops provider goroutines),
  then close queues (signals workers), then wait for workers to drain and exit

Reference: maximhq#829 (comment)
Reference: maximhq#829 (comment)

Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
core/changelog.md (1)

1-1: Changelog entry accurately captures the fix.

The entry correctly documents the goroutine leak fix across both the worker loop and streaming handlers, addressing the PR objectives. The concise format is appropriate for a changelog.

Minor enhancement: Consider mentioning "context cancellation" for clarity, since that's the mechanism driving the fix—e.g., "fix: Goroutine leaks during context cancellation in worker loop and streaming request handlers". This helps users understand the specific scenario that was fixed.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8028533 and 6d849f0.

📒 Files selected for processing (1)
  • core/changelog.md (1 hunks)

Workers were blocking on queue channel receive when the context was cancelled,
causing goroutine leaks that could persist beyond the context timeout.

This fix ensures workers exit promptly when the context is cancelled by:
- Listening for context cancellation in a select statement
- Draining any remaining queued requests and sending cancellation errors
- Preventing callers from hanging indefinitely waiting for responses

This addresses the scenario where context timeout occurs without explicit
Shutdown() being called, which was causing workers to remain blocked waiting
for the queue to be closed.

Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
core/bifrost.go (1)

2564-2569: Document expected Shutdown usage to avoid race condition.

The shutdown sequence (cancel context → close queues → wait for workers) is intentional to signal streaming goroutines to unblock. However, this creates a brief window where new requests could be enqueued after context cancellation but before queues are closed.

Recommend documenting that Shutdown() should only be called when no new requests are actively being submitted:

-// Shutdown gracefully stops all workers when triggered.
-// It closes all request channels and waits for workers to exit.
+// Shutdown gracefully stops all workers when triggered.
+// It cancels the bifrost context to signal streaming goroutines to stop,
+// closes all request channels, and waits for workers to exit.
+//
+// Important: Shutdown should be called only when no new requests are being
+// submitted to avoid race conditions where requests could be enqueued after
+// context cancellation but before queue closure.
 func (bifrost *Bifrost) Shutdown() {

This clarifies the contract for callers.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6d849f0 and c80c51b.

📒 Files selected for processing (1)
  • core/bifrost.go (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
core/bifrost.go (4)
core/schemas/bifrost.go (9)
  • BifrostError (350-359)
  • ErrorField (368-375)
  • BifrostRequest (140-150)
  • BifrostResponse (243-253)
  • BifrostStream (315-322)
  • RequestType (81-81)
  • BifrostContextKeySelectedKeyID (107-107)
  • BifrostContextKeySelectedKeyName (108-108)
  • BifrostErrorExtraFields (417-421)
core/schemas/provider.go (4)
  • Provider (207-234)
  • ProviderConfig (161-169)
  • CustomProviderConfig (142-148)
  • PostHookRunner (204-204)
core/schemas/account.go (1)
  • Key (8-17)
core/utils.go (1)
  • IsStreamRequestType (190-192)
🔇 Additional comments (2)
core/bifrost.go (2)

2102-2131: Plugin pipeline pooling disabled for streaming to avoid data race (acknowledged trade-off).

The code correctly avoids releasing the plugin pipeline for streaming requests because provider goroutines asynchronously invoke postHookRunner, which would access a pipeline that might have been reset and reused by another request.

This addresses the critical data race flagged in the past review comments. The trade-off (no pooling for streaming) is reasonable given that correctness is prioritized over efficiency.

Note: Streaming requests will allocate a new pipeline for each request instead of reusing from the pool. Under high streaming load, this may increase GC pressure, but it ensures thread safety.


2139-2176: Excellent defensive programming with context-aware sends.

The context-aware response/error/stream sends with timeouts (lines 2140-2176) effectively prevent workers from blocking indefinitely when clients disconnect or stop listening. This ensures that workers can be reclaimed even if clients misbehave.

The 5-second timeout is a reasonable balance between accommodating slow clients and preventing indefinite blocking.

Pratham-Mishra04 and others added 4 commits November 14, 2025 15:59
…nts (maximhq#815)

## Summary

Enhances the Responses API streaming implementation to better align with OpenAI's streaming format, providing more consistent and reliable tool call streaming across providers.

## Changes

- Refactored Anthropic's Responses API streaming to use a stateful accumulator for tracking output items
- Added similar streaming accumulators to Bedrock and Cohere providers
- Implemented proper OpenAI-style lifecycle events (created, in_progress, completed) for all providers
- Improved tool call streaming with proper output indexing and argument accumulation
- Replaced the chat-to-responses conversion approach with native implementations for each provider
- Added comprehensive tests for tool call streaming functionality

## Type of change

- [ ] Bug fix
- [x] Feature
- [x] Refactor
- [ ] Documentation
- [ ] Chore/CI

## Affected areas

- [x] Core (Go)
- [ ] Transports (HTTP)
- [x] Providers/Integrations
- [ ] Plugins
- [ ] UI (Next.js)
- [ ] Docs

## How to test

Test the Responses API streaming with tool calls across different providers:

```sh
# Run the tool calls streaming tests
go test ./tests/core-providers -run TestOpenAI/ToolCallsStreamingResponses
go test ./tests/core-providers -run TestAnthropic/ToolCallsStreamingResponses
go test ./tests/core-providers -run TestBedrock/ToolCallsStreamingResponses
go test ./tests/core-providers -run TestCohere/ToolCallsStreamingResponses
```

## Breaking changes

- [ ] Yes
- [x] No

## Related issues

Improves the reliability of tool call streaming in the Responses API.

## Security considerations

No security implications.

## Checklist

- [x] I added/updated tests where appropriate
- [x] I verified builds succeed (Go and UI)
- [x] I verified the CI pipeline passes locally if applicable
)

## Summary

Added model pricing data support to the Bifrost core, enabling automatic pricing information in model listings and providing a mechanism to sync pricing data from external sources.

## Changes

- Added a `pricingData` field to the Bifrost struct to store pricing information for models
- Created new methods to set and retrieve pricing data for models
- Added a `DataSheetPricingEntry` struct to represent detailed pricing information
- Enhanced `BifrostListModelsResponse` with an `AddPricing` method to include pricing in model listings
- Integrated pricing data with the model catalog framework
- Added a callback mechanism to update pricing data when synced from external sources
- Added `DeploymentName` field to the Model struct

## Type of change

- [x] Feature
- [x] Refactor

## Affected areas

- [x] Core (Go)
- [x] Transports (HTTP)

## How to test

```sh
# Core/Transports
go version
go test ./...

# Test model listing with pricing data
curl -X GET "http://localhost:8000/v1/models" -H "Authorization: Bearer $BIFROST_API_KEY"
```

## Breaking changes

- [x] No

## Related issues

Enhances model listing capabilities with pricing information

## Security considerations

No security implications as this only adds informational data to model responses.

## Checklist

- [x] I added/updated tests where appropriate
- [x] I verified builds succeed (Go and UI)
## Summary

Enhance test coverage workflow with improved integration testing capabilities and fix AWS Bedrock URL encoding issues.

## Changes

- Enhanced test coverage workflow to include UI build, Redis and Weaviate services for integration tests, and plugin rebuilding
- Fixed AWS Bedrock URL encoding by properly handling percent-encoded sequences in query parameters
- Added GORM logger implementations for configstore and logstore to improve database logging
- Updated core-chatbot test module to work with latest Bifrost API changes
- Fixed model catalog pricing lookup method naming for consistency

## Type of change

- [x] Bug fix
- [x] Feature
- [x] Refactor
- [ ] Documentation
- [x] Chore/CI

## Affected areas

- [x] Core (Go)
- [ ] Transports (HTTP)
- [x] Providers/Integrations
- [x] Plugins
- [x] UI (Next.js)
- [ ] Docs

## How to test

```sh
# Test AWS Bedrock URL encoding fix
go test ./core/providers/bedrock -v

# Trigger the enhanced test coverage workflow (requires the GitHub CLI)
gh workflow run test-coverage.yml

# Test GORM logger implementations
go test ./framework/configstore -v
go test ./framework/logstore -v

# Test core-chatbot with latest API changes
cd tests/core-chatbot
go test -v
```

## Breaking changes

- [ ] Yes
- [x] No

## Related issues

Fixes AWS Bedrock URL encoding issues with special characters in query parameters.

## Security considerations

The AWS Bedrock URL encoding fix improves request signing security by properly handling percent-encoded sequences.

## Checklist

- [x] I added/updated tests where appropriate
- [x] I verified builds succeed (Go and UI)
- [x] I verified the CI pipeline passes locally if applicable
@pjcdawkins pjcdawkins marked this pull request as draft November 14, 2025 10:55
pjcdawkins and others added 6 commits November 14, 2025 11:00
When the context is cancelled, the monitor goroutine force-closes the
response body stream to unblock scanner.Scan(). This causes scanner.Err()
to return an error, which was being treated as a real stream error and
logged/sent to clients.

This commit updates all streaming handlers to check if the context was
cancelled before treating scanner errors as failures. When ctx.Err() is
not nil, scanner errors are expected and should be ignored.

Fixed in providers:
- OpenAI: 5 streaming methods
- Anthropic: 2 streaming methods
- Cohere: 2 streaming methods
- Gemini: 2 streaming methods

This prevents spurious error logs and error responses when requests
timeout or are cancelled.

Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
The previous drain loop used a non-blocking select with a default case
that could exit prematurely if the queue appeared empty. This created a
race condition where requests enqueued between context cancellation and
queue closure would be missed.

Changed to use blocking receive that continues until the queue is actually
closed, ensuring all queued requests receive cancellation errors.

Also added req.Context.Done() check to avoid sending errors to clients
that already cancelled their requests.

Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
The previous approach of calling resp.CloseBodyStream() to unblock
scanner.Scan() on context cancellation was causing nil pointer panics
in fasthttp's internals.

Investigation revealed that fasthttp does not support context cancellation
for streaming response bodies. The BodyStream() cannot be safely interrupted
once scanner.Scan() is blocking on I/O.

This commit removes:
- Panic recovery in streaming goroutines
- Context monitor goroutines that called CloseBodyStream()
- All CloseBodyStream() calls

The streaming handlers now rely on:
- Context checks between chunks (select case <-ctx.Done())
- Natural TCP connection timeouts for hung streams
- Scanner error handling (ignoring errors when context is cancelled)

This is a simpler, safer approach. While streams may take longer to clean
up if they hang (30-90s TCP timeout vs immediate), this is rare in practice
and avoids the complexity and crashes of forced closes.

Fixed in providers:
- OpenAI: 5 streaming methods
- Anthropic: 2 streaming methods
- Cohere: 2 streaming methods
- Gemini: 2 streaming methods

Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Critical bug: Shutdown() was returning early if the context was already
cancelled, which prevented queues from being closed and workers from
exiting.

Scenario that triggered the bug:
1. User passes timeout context to Init(ctx, ...)
2. Timeout expires, bifrost.ctx becomes cancelled
3. Workers detect ctx.Done() and enter drain mode, waiting for queue closure
4. User calls Shutdown() to clean up
5. Shutdown() sees context is already done and returns early
6. Queues never get closed, workers stuck in drain loop forever

Fix:
- Remove early return when context is already cancelled
- Only skip cancel() call if context already cancelled
- Always close queues (with safety check to avoid double-close panic)
- Always wait for workers to exit
- Always run cleanup for MCP manager and plugins

This ensures workers in drain mode always receive the queue closure signal
they're waiting for, regardless of when Shutdown() is called.

Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
…q#833)

## Summary

Fixed issues with custom keyless providers during initial list models requests, ensuring proper initialization and handling of providers that don't require API keys.

## Changes

- Added a check to skip key retrieval for providers that don't require keys
- Implemented auto-initialization of providers when they're not yet initialized but have a valid config
- Added support for keyless model listing in Anthropic, Cohere, Gemini, and OpenAI providers
- Updated provider implementations to handle keyless configurations properly

## Type of change

- [x] Bug fix
- [ ] Feature
- [ ] Refactor
- [ ] Documentation
- [ ] Chore/CI

## Affected areas

- [x] Core (Go)
- [x] Transports (HTTP)
- [x] Providers/Integrations
- [ ] Plugins
- [ ] UI (Next.js)
- [ ] Docs

## How to test

Test listing models with custom keyless providers:

```sh
# Core/Transports
go version
go test ./...

# Test with a custom keyless provider configuration
curl -X GET "http://localhost:8000/v1/models" -H "Authorization: Bearer your_token"
```

## Breaking changes

- [ ] Yes
- [x] No

## Related issues

Fixes issues with custom keyless providers failing to initialize or list models properly.

## Security considerations

This change maintains the security model by properly handling keyless provider configurations without exposing sensitive information.

## Checklist

- [x] I read `docs/contributing/README.md` and followed the guidelines
- [x] I added/updated tests where appropriate
- [x] I updated documentation where needed
- [x] I verified builds succeed (Go and UI)
- [x] I verified the CI pipeline passes locally if applicable
Add configurable stream inactivity timeout to NetworkConfig to allow
customization of the timeout duration for detecting and force-closing
hung streams. Previously hardcoded to 60 seconds.

Changes:
- Add StreamInactivityTimeoutInSeconds field to NetworkConfig
- Add DefaultStreamInactivityTimeoutInSeconds constant (60 seconds)
- Update CheckAndSetDefaults to set default if not provided
- Update all streaming methods across providers to use configurable value:
  - OpenAI: 5 streaming methods
  - Anthropic: 2 streaming methods (including shared function)
  - Cohere: 2 streaming methods
  - Gemini: 2 streaming methods

The timeout can now be configured per provider via the NetworkConfig
in the provider configuration.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
pjcdawkins and others added 2 commits November 14, 2025 12:42
Update all provider calls to shared OpenAI and Anthropic streaming
functions to pass the new inactivityTimeoutSeconds parameter.

This fixes build errors after adding the configurable timeout parameter
to the shared streaming function signatures.

Changes:
- Update function signatures for shared streaming handlers to accept
  timeout parameter
- Add provider.networkConfig.StreamInactivityTimeoutInSeconds parameter
  to all streaming function calls across providers:
  - cerebras, groq, ollama, openrouter, parasail, sgl
  - azure, mistral, perplexity, gemini, vertex
- Fix timeout checks in OpenAI shared functions to use parameter instead
  of accessing undefined provider variable

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
```go
	return
}

// Track last activity time for inactivity timeout detection
```
Collaborator


@pjcdawkins can you add a common util for this? Also, let's add a field in the UI for the stream inactivity timeout in network config

Collaborator

@pjcdawkins let's please also add the inactivity check to speech and transcription streaming

@pjcdawkins
Author

Good points - I set this as a draft because it's a relatively hard problem and I'll come back to it soon! But it seems important. I think it'll need a few more tests to demonstrate the problem(s) and fixes, and some consolidation as you say.

@Pratham-Mishra04
Collaborator

Yes sure no worries, lmk if you need a hand😄

@akshaydeo
Contributor

also, could you rebase this branch onto our latest main? We had to nuke one file from the history, which had to be done via a rewrite.



Development

Successfully merging this pull request may close these issues.

[Bug]: Goroutines leak on context cancellation

5 participants