
Update max_tokens default from 4096 to 65535 #108


Merged
2 commits merged into main on Jul 18, 2025

Conversation

devin-ai-integration[bot] (Contributor) commented on Jul 18, 2025

Update token defaults from 4096/1024 to 65535

Summary

Updated the default token limits in both the Python and Node.js SDKs, raising the maximums from 4096/1024 to 65535. This change affects:

  • Python SDK: GenerationConfig.max_tokens default (4096 → 65535)
  • Node.js SDK: Fine-tuning max_tokens (4096 → 65535) and max_new_tokens (1024 → 65535)

The changes allow users to process longer content by default without requiring explicit token limit configuration.
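
For context, the Python-side edit amounts to changing a single default value. The snippet below is a minimal sketch, assuming GenerationConfig is a Pydantic model; the import path and field layout are illustrative, not the SDK's confirmed definition.

```python
# Minimal sketch of the change to vlmrun/client/types.py (field layout assumed).
from pydantic import BaseModel


class GenerationConfig(BaseModel):
    # Default raised from 4096 to 65535 so longer outputs work without
    # explicit configuration by the caller.
    max_tokens: int = 65535
```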

Review & Testing Checklist for Human

  • API Compatibility: Verify that the VLM API backend actually supports 65535 tokens and doesn't reject requests with this limit
  • Integration Testing: Run integration tests (npm run test:integration for Node.js, full test suite for Python) to confirm end-to-end functionality works with new defaults
  • Cost Impact Assessment: Consider whether the 16x increase in default token limits could lead to unexpected cost increases for users, and whether documentation or warnings are needed
  • Backward Compatibility: Evaluate if this constitutes a breaking change that might affect existing user code or requires deprecation notices

Recommended Test Plan: Create a test request with content that would exceed the old limits (>4096 tokens) but stay within the new limit, and verify it processes successfully through the full API pipeline.
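
A sketch of such a test is below, assuming the Python SDK exposes a VLMRun client with a generate-style call; the import path, method name, and response fields are assumptions and would need to be aligned with the SDK's actual API before use.

```python
# Hypothetical integration test: rely on the new 65535-token default and verify
# that a response longer than the old 4096-token ceiling comes back intact.
# Client, method, and field names here are assumptions, not confirmed SDK API.
import os

from vlmrun.client import VLMRun  # assumed import path

client = VLMRun(api_key=os.environ["VLMRUN_API_KEY"])

# No explicit max_tokens: the request should pick up the new 65535 default.
response = client.generate(
    prompt="Transcribe and summarize every page of the attached document.",
)

# ~4 characters per token is a rough rule of thumb; anything well past the old
# 4096-token ceiling indicates the higher default was honored end to end.
assert len(response.text) > 4096 * 4
```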


Diagram

%%{ init : { "theme" : "default" }}%%
graph TD
    subgraph Legend
        L1["Major Edit"]:::major-edit
        L2["Minor Edit"]:::minor-edit  
        L3["Context/No Edit"]:::context
    end

    PythonTypes["vlmrun/client/types.py<br/>GenerationConfig.max_tokens"]:::major-edit
    NodeFinetuning["src/client/fine_tuning.ts<br/>max_tokens & max_new_tokens"]:::major-edit
    
    PythonClient["Python SDK Client"]:::context
    NodeClient["Node.js SDK Client"]:::context
    VlmAPI["VLM API Backend"]:::context
    
    PythonClient --> PythonTypes
    NodeClient --> NodeFinetuning
    PythonTypes --> VlmAPI
    NodeFinetuning --> VlmAPI
    
    classDef major-edit fill:#90EE90
    classDef minor-edit fill:#87CEEB  
    classDef context fill:#FFFFFF

Notes

- Updated GenerationConfig.max_tokens default value to 65535
- This increases the maximum token limit for generation requests

Co-Authored-By: [email protected] <[email protected]>
devin-ai-integration[bot] (Contributor, Author) commented:

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

- Version increment to trigger automatic deployment on merge
- Includes max_tokens default update from 4096 to 65535

Co-Authored-By: [email protected] <[email protected]>
@dineshreddy91 merged commit b5a11de into main on Jul 18, 2025
4 checks passed