Skip to content

New schema: Add chat schema#679

Merged
sabilmakbar merged 4 commits intoSEACrowd:masterfrom
patrickamadeus:feat/chat-schema
May 31, 2024
Merged

New schema: Add chat schema#679
sabilmakbar merged 4 commits intoSEACrowd:masterfrom
patrickamadeus:feat/chat-schema

Conversation

@patrickamadeus
Copy link
Copy Markdown
Collaborator

Adding new chat schema to support #635

Comment thread seacrowd/utils/schemas/chat.py Outdated
{
"id": datasets.Value("string"),
"input": datasets.Sequence({
"role": datasets.ClassLabel(names=["system", "user", "assistant"]),
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the role should be implemented as string instead of ClassLabel due to 2 reasons:

  1. There might be some cases where the role can be multiple for a single-sequence dialogues
  2. The actual data (when the examples are generated) will result in an integer instead of string

Comment thread seacrowd/utils/schemas/chat.py Outdated
# 2. defining meta as dict of key with intended colname meta and its val with dataset.Features class
# in `_info` Dataloader method then populate it with the values in `_general_examples` Dataloader method
}
) No newline at end of file
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: add whitespace at EoF

Comment thread seacrowd/utils/schemas/__init__.py Outdated
"video_features",
"tod_features",
]
] No newline at end of file
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: add whitespace at EoF

@sabilmakbar
Copy link
Copy Markdown
Collaborator

note to Holy & Sam:

btw we might want to revisit the ToD (Task-Oriented Dialogue) & DS (Dialogue System) tasks later on whether we should use this schema too (since HF tokenizers already support the format of chat -- specifically on the input data, which has the same schema as HF Chat).

@holylovenia
Copy link
Copy Markdown
Contributor

note to Holy & Sam:

btw we might want to revisit the ToD (Task-Oriented Dialogue) & DS (Dialogue System) tasks later on whether we should use this schema too (since HF tokenizers already support the format of chat -- specifically on the input data, which has the same schema as HF Chat).

Hi @sabilmakbar, sorry for the late response. Replied on #635.

@holylovenia
Copy link
Copy Markdown
Contributor

Hi @patrickamadeus and @sabilmakbar, I would like to let you know that we plan to finalize the calculation of the open contributions (e.g., dataloader implementations) in 31 hours, so it'd be great if we could wrap up the reviewing and merge this PR before then.

add whitespace at EoF
change the `role` field schema from ClassLabel to string
@sabilmakbar
Copy link
Copy Markdown
Collaborator

did the changes bcs the initial assignee had no response

@sabilmakbar sabilmakbar merged commit 0665774 into SEACrowd:master May 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants