@mikek-mlcommons
Collaborator

…bset.csv to airr_official_1.0_en_demo_prompt_set_release.csv

Changed name to add "en" for "English" and "demo" in lower case.

@mikek-mlcommons mikek-mlcommons requested a review from a team as a code owner March 17, 2025 18:34
@github-actions

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@rogthefrog

rogthefrog commented Mar 17, 2025

@mikek-mlcommons I hate to do this again, but the file shouldn't have the locale there.

The prompt_sets.py module will show you the scheme.

    "demo": {
        "en_us": "airr_official_1.0_demo_prompt_set_release",
        "fr_fr": "airr_official_1.0_demo_fr_fr_prompt_set_release",
    },

We're not specifying the locale for English, though we could. If we did, we'd use "en_us" (in lowercase) right after the prompt set type ("demo"), so it'd be airr_official_1.0_demo_en_us_prompt_set_release.csv
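As a rough illustration of the scheme described above (not the actual contents of prompt_sets.py), the lookup and the "explicit locale" variant could be sketched as follows. The names `PROMPT_SETS`, `prompt_set_filename`, and `explicit_locale_filename` are hypothetical; only the two "demo" entries and the resulting filenames come from this thread.

```python
# Hypothetical mapping, limited to the two "demo" entries quoted in this thread.
PROMPT_SETS = {
    "demo": {
        "en_us": "airr_official_1.0_demo_prompt_set_release",
        "fr_fr": "airr_official_1.0_demo_fr_fr_prompt_set_release",
    },
}


def prompt_set_filename(prompt_set_type: str, locale: str) -> str:
    """Return the CSV filename for a prompt set type and locale (current scheme)."""
    base = PROMPT_SETS[prompt_set_type][locale.lower()]
    return f"{base}.csv"


def explicit_locale_filename(prompt_set_type: str, locale: str) -> str:
    """Return the filename if every locale, including English, were spelled out.

    The lowercased locale goes right after the prompt set type.
    """
    return f"airr_official_1.0_{prompt_set_type}_{locale.lower()}_prompt_set_release.csv"


if __name__ == "__main__":
    print(prompt_set_filename("demo", "en_us"))
    print(explicit_locale_filename("demo", "en_US"))
```

With the current scheme the English demo file carries no locale, while the French one does; the second helper shows what the name would look like if English were treated the same way.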

@wpietri

wpietri commented Mar 18, 2025

> We're not specifying the locale for English, though we could. If we did, we'd use "en_us" (in lowercase) right after the prompt set type ("demo"), so it'd be airr_official_1.0_demo_en_us_prompt_set_release.csv

Good catch. Yes, it would probably be better to move toward using locale for all prompt set filenames.

@mikek-mlcommons
Collaborator Author

I'm OK with adding locale, but I don't know if we're going to go to that level of granularity for any of the prompt sets any time soon and "fr_fr" looks odd, though I know why it's like that. Is there a deeper problem (other than potential future compatibility) with only having the top level name in the filename?

@rogthefrog

> I'm OK with adding locale, but I don't know if we're going to go to that level of granularity for any of the prompt sets any time soon and "fr_fr" looks odd, though I know why it's like that. Is there a deeper problem (other than potential future compatibility) with only having the top level name in the filename?

We're just using the standard ISO 639 and ISO 3166 locale codes (e.g. en_US), lowercased, everywhere. Using just the language here would only introduce a variation from the standard that needs to be justified, and I don't see a strong reason to deviate.

If we ever have e.g. Arabic prompt sets, the region will matter more.
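To make the convention concrete, here is a minimal sketch of normalizing a locale tag to the lowercased language_region form used in this thread. The `normalize_locale` helper and its regex are assumptions for illustration, not code from the repository, and the check is only on shape (a two- or three-letter language code plus a two-letter region code), not against the actual ISO code lists.

```python
import re

# Shape check only: 2-3 letter language code, underscore, 2-letter region code.
_LOCALE_RE = re.compile(r"^[a-z]{2,3}_[a-z]{2}$")


def normalize_locale(tag: str) -> str:
    """Lowercase a locale tag such as "en_US" or "fr-FR" into "en_us" / "fr_fr".

    Raises ValueError when the region is missing, because a bare language
    like "en" is the deviation from the convention being discussed.
    """
    candidate = tag.strip().replace("-", "_").lower()
    if not _LOCALE_RE.match(candidate):
        raise ValueError(f"expected language_region, got {tag!r}")
    return candidate


if __name__ == "__main__":
    for tag in ("en_US", "fr-FR", "ar_EG", "ar_SA"):
        print(tag, "->", normalize_locale(tag))
```

Keeping the region mandatory means two Arabic variants would map to distinct keys (ar_eg and ar_sa in this sketch) rather than colliding on a bare language code.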

@wpietri

wpietri commented Mar 18, 2025

My understanding is also that our current prompt sets really do honor the locale. E.g., our English prompts are American English, our French prompts are France French. Is that right, @bollacker?

@bollacker
Collaborator

> My understanding is also that our current prompt sets really do honor the locale. E.g., our English prompts are American English, our French prompts are France French. Is that right, @bollacker?

That is correct.

@github-actions github-actions bot locked and limited conversation to collaborators Mar 18, 2025