@Bytechoreographer Bytechoreographer commented Nov 4, 2025

Feature: Configurable Ping Max Tokens

Overview

Changed the max_tokens parameter in OpenAI-compatible model validation (ping) requests from a hard-coded value to an environment variable-based configuration, improving flexibility and meeting different provider requirements.

Background

Some OpenAI-compatible providers, such as OpenRouter when proxying other vendors or serving models like gpt-5-codex, require the max_tokens parameter in validation requests to be at least 16 (otherwise validation fails), or enforce a minimum output token limit and return a 400 error for very small values such as 5. The previous implementation hard-coded max_tokens to 5, so validation against these providers failed and there was no way to adjust the value.
[Screenshot: validation error returned by the provider when max_tokens is too low]
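For context, the validation ping is essentially a minimal chat completion request sent to the provider. A sketch of the kind of payload involved (field values are illustrative, not copied verbatim from the file):

# Hypothetical illustration of the ping payload; values are examples only
data = {
    "model": "gpt-5-codex",  # or whatever credentials.get("endpoint_model_name", model) resolves to
    "messages": [{"role": "user", "content": "ping"}],
    "max_tokens": 5,  # the previously hard-coded value that some providers reject
}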

Changes

1. Configuration File Changes

File: python/dify_plugin/config/config.py

Added a new environment variable configuration in the DifyPluginEnv class:

PING_MAX_TOKENS: int = Field(
    default=16,
    description="Maximum tokens for model validation ping request. Some providers require at least 16 tokens.",
)

Notes:

  • Default value set to 16, meeting the minimum requirement for most providers
  • Can be customized via the .env file
  • Clear naming with detailed description
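For reference, a minimal sketch of how the new field sits in DifyPluginEnv and how the value resolves at runtime, assuming the class is a pydantic-settings BaseSettings (which the Field usage above suggests):

from pydantic import Field
from pydantic_settings import BaseSettings


class DifyPluginEnv(BaseSettings):
    # ... other existing plugin settings ...

    PING_MAX_TOKENS: int = Field(
        default=16,
        description="Maximum tokens for model validation ping request. Some providers require at least 16 tokens.",
    )


# Resolution order: an explicit environment variable wins, then the .env file
# (if configured for the settings class), then the default of 16.
ping_max_tokens = DifyPluginEnv().PING_MAX_TOKENS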

2. Implementation Code Changes

File: python/dify_plugin/interfaces/model/openai_compatible/llm.py

2.1 Import Configuration Class

Added at the beginning of the file:

from dify_plugin.config.config import DifyPluginEnv

2.2 Modified validate_credentials Method

In the validate_credentials method:

  1. Added configuration reading logic:
# Load ping max tokens from environment variable
config = DifyPluginEnv()
ping_max_tokens = config.PING_MAX_TOKENS

  2. Replaced the hard-coded max_tokens value:
# Before
data = {"model": credentials.get("endpoint_model_name", model), "max_tokens": 5}

# After
data = {"model": credentials.get("endpoint_model_name", model), "max_tokens": ping_max_tokens}

  3. Also fixed the hard-coded value in stream mode:
# Before
if stream_mode_auth == "use":
    data["stream"] = True
    data["max_tokens"] = 10

# After
if stream_mode_auth == "use":
    data["stream"] = True
    data["max_tokens"] = ping_max_tokens

Usage

1. Using Default Value (Recommended)

No configuration needed. The system will use the default value of 16.

2. Custom Configuration

Add to the .env file in the project root directory:

PING_MAX_TOKENS=32

Or set other values based on actual needs, for example:

# Some special providers may require larger values
PING_MAX_TOKENS=64
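
Assuming DifyPluginEnv is backed by pydantic-settings, the value can also be supplied as a regular process environment variable instead of the .env file, for example in the shell before starting the plugin:

export PING_MAX_TOKENS=64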

Compatibility Notes

  • Backward Compatible: Changes are fully backward compatible and do not affect existing functionality
  • Default Behavior Change: Default value changed from 5 to 16, meeting minimum requirements for most providers
  • Flexible Configuration: Supports customization via environment variables for different scenarios

Related Files

  • python/dify_plugin/config/config.py - Configuration definition
  • python/dify_plugin/interfaces/model/openai_compatible/llm.py - Implementation code

Modification Date

2025-11-06

Pull Request Checklist

Thank you for your contribution! Before submitting your PR, please make sure you have completed the following checks:

Compatibility Check

  • [✔] I have checked whether this change affects the backward compatibility of the plugin declared in README.md
  • [✔] I have checked whether this change affects the forward compatibility of the plugin declared in README.md
  • [✔] If this change introduces a breaking change, I have discussed it with the project maintainer and specified the release version in the README.md
  • [✔] I have described the compatibility impact and the corresponding version number in the PR description
  • [✔] I have checked whether the plugin version is updated in the README.md

Available Checks

  • [✔] Code has passed local tests
  • [✔] Relevant documentation has been updated (if necessary)

@gemini-code-assist

Summary of Changes

Hello @Bytechoreographer, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request increases the max_tokens value used during credential validation for certain language models. This resolves an issue with providers that require a higher minimum token limit, preventing validation failures.

Highlights

  • Increased max_tokens: The max_tokens parameter in the validate_credentials method has been increased from 5 and 10 to 32 for certain models like gpt-5-codex.
  • Purpose: This change addresses an issue where some providers (e.g., OpenRouter) enforce a minimum output token limit, returning a 400 error for very small values.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request correctly increases the max_tokens for credential validation to support providers with minimum token limits. My review includes suggestions to replace the hardcoded value 32 with a named constant. This will improve the code's readability and make it easier to maintain in the future.

@Bytechoreographer Bytechoreographer force-pushed the feat/increase_ping_max_tokens branch from 63f55f0 to 4dc76a8 Compare November 6, 2025 04:49
…pt-5-codex in ping test, add PING_MAX_TOKENS configuration for model validation ping requests
@Bytechoreographer Bytechoreographer force-pushed the feat/increase_ping_max_tokens branch from 4dc76a8 to a5c44ce Compare November 6, 2025 04:55
@Bytechoreographer Bytechoreographer changed the title fix: increase max_tokens from 5 to 32 for some specific models like gpt-5-codex feat: increase max_tokens from 5 to 32 for some specific models like gpt-5-codex Nov 6, 2025
@Bytechoreographer Bytechoreographer changed the title feat: increase max_tokens from 5 to 32 for some specific models like gpt-5-codex feat: Changed the max_tokens parameter in OpenAI-compatible model validation (ping) requests from a hard-coded value to an environment variable-based configuration, improving flexibility and meeting different provider requirements. Nov 6, 2025
@Bytechoreographer Bytechoreographer changed the title feat: Changed the max_tokens parameter in OpenAI-compatible model validation (ping) requests from a hard-coded value to an environment variable-based configuration, improving flexibility and meeting different provider requirements. feat: Configurable Ping Max Tokens Nov 6, 2025
@Bytechoreographer
Contributor Author

GPT-5.1 ran into the same problem. It seems necessary to increase max_tokens in the ping request for upcoming LLMs.
[Screenshot: the same validation error with GPT-5.1]
