
Update OpenAI model list #8589


Open: wants to merge 1 commit into main


ZachParent

Problem

  • I noticed that the list of OpenAI models is out of date. For example, the gpt-4.1 family of models is missing.
  • This causes failures when attempting to finetune those models, for example with BootstrapFinetune.

Solution

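  • I pulled the currently available models using:
```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Collect every model entry returned by the models endpoint
models_ls = list(client.models.list())

# Skip fine-tuned models, whose IDs contain ":" (e.g. "ft:<base-model>:<org>:...")
open_models = [model for model in models_ls if ":" not in model.id]
open_model_ids = sorted(model.id for model in open_models)
print(open_model_ids)
```
  • I replaced the hard-coded model list in clients/openai.py with this output.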

Testing

  • I would like to add a test but I'm not sure how. One option is a test that queries the models.list() endpoint and checks the result against this list, but that would be brittle: it needs an API key and network access, and it breaks whenever OpenAI adds or retires a model.
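One possibly less brittle variant is to report drift instead of asserting an exact match. The sketch below is hypothetical: `diff_model_lists` and the sample IDs are illustrative, not part of the codebase, and in practice the live IDs would come from the `models.list()` snippet in the Solution section. It returns which live models the hard-coded list lacks and which hard-coded entries OpenAI no longer lists.

```python
from typing import Iterable


def diff_model_lists(
    live_ids: Iterable[str], hardcoded_ids: Iterable[str]
) -> tuple[list[str], list[str]]:
    """Hypothetical helper: compare live OpenAI model IDs with a hard-coded list.

    Returns (missing, stale):
      missing: IDs OpenAI lists that the hard-coded list lacks
      stale:   hard-coded IDs that OpenAI no longer lists
    Fine-tuned IDs (containing ":") are ignored, matching the filter above.
    """
    live = {model_id for model_id in live_ids if ":" not in model_id}
    hardcoded = set(hardcoded_ids)
    missing = sorted(live - hardcoded)
    stale = sorted(hardcoded - live)
    return missing, stale


# Illustrative data only; real IDs would come from client.models.list().
live = ["gpt-4.1", "gpt-4.1-mini", "gpt-4o", "ft:gpt-4o-mini-2024-07-18:my-org::abc123"]
hardcoded = ["gpt-4o", "gpt-3.5-turbo-0301"]
missing, stale = diff_model_lists(live, hardcoded)
print("missing:", missing)  # missing: ['gpt-4.1', 'gpt-4.1-mini']
print("stale:", stale)      # stale: ['gpt-3.5-turbo-0301']
```

Because the helper never fails on its own, the caller (for example a scheduled job) decides whether drift should only warn or actually fail CI.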

Alternatives

@okhat
Collaborator

okhat commented Jul 31, 2025

Thank you! But this adds many models that can't be finetuned, like dall-e. We should probably switch to something more sustainable here.

@ZachParent
Author

Understood, and agreed! Let me know if you have a direction to look into. I don't see a way in the OpenAI docs to filter models by capability (like finetuning). Maybe a third party provides an API with this list? Otherwise, I'm not sure what's possible beyond hard-coding.

Collaborator

chenmoneygithub left a comment


Thanks for the PR!

The reason for maintaining this hard-coded list is to make it possible to infer the model provider when no prefix is provided, in:

def infer_provider(self) -> Provider:

I think we could instead require users to specify the model name in the format {provider_name}/{model_name}. IMO, the convenience of writing gpt-5 instead of openai/gpt-5 isn't worth the effort of maintaining this hard-coded list.
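A minimal sketch of the prefix-only approach, assuming the {provider_name}/{model_name} format proposed above. `split_model_name` is a hypothetical helper, not DSPy's actual `infer_provider`; it splits on the first "/" so a model name that itself contains "/" stays intact, and it raises an error instead of consulting a hard-coded list when the prefix is missing.

```python
def split_model_name(model: str) -> tuple[str, str]:
    """Hypothetical helper: split "provider/model" into its two parts.

    Splits on the first "/" only, so a model name containing "/" is kept
    whole. Raises ValueError when the provider prefix (or the model part)
    is missing, rather than guessing the provider from a hard-coded list.
    """
    provider, sep, name = model.partition("/")
    if not sep or not provider or not name:
        raise ValueError(
            f"Model {model!r} must be given as '{{provider_name}}/{{model_name}}', "
            "e.g. 'openai/gpt-5'."
        )
    return provider, name


print(split_model_name("openai/gpt-5"))  # ('openai', 'gpt-5')
try:
    split_model_name("gpt-5")
except ValueError as err:
    print(err)
```

The trade-off is the one described in the comment: users type a few extra characters, and the library no longer has to track OpenAI's model catalogue.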

@okhat @dilarasoylu Let me know what you think!
