New schema: Add chat schema #679
{
"id": datasets.Value("string"),
"input": datasets.Sequence({
"role": datasets.ClassLabel(names=["system", "user", "assistant"]),
The role should be implemented as a string instead of a ClassLabel, for two reasons:
- There might be cases where a single-sequence dialogue has more than one role
- With ClassLabel, the generated examples store the role as an integer instead of a string
# 2. defining meta as dict of key with intended colname meta and its val with dataset.Features class
# in `_info` Dataloader method then populate it with the values in `_general_examples` Dataloader method
}
)
\ No newline at end of file
nit: add a newline at EoF
"video_features",
"tod_features",
]
]
\ No newline at end of file
nit: add a newline at EoF
Note to Holy & Sam: we might want to revisit later whether the ToD (Task-Oriented Dialogue) and DS (Dialogue System) tasks should use this schema too (since HF tokenizers already support the format of
Hi @sabilmakbar, sorry for the late response. Replied on #635.
Hi @patrickamadeus and @sabilmakbar, I would like to let you know that we plan to finalize the calculation of the open contributions (e.g., dataloader implementations) in 31 hours, so it'd be great if we could wrap up the reviewing and merge this PR before then.
add a newline at EoF
change the `role` field schema from ClassLabel to string
Made the changes because the initial assignee did not respond.
Adding new `chat` schema to support #635