[WIP] Remove Instruct/Chat versions of models & introduce a new ChatTemplate API, fix Anthropic API (#820)
This PR should significantly reduce the number of user-facing classes we
have in Guidance, and reduce subtle bugs introduced by using a
mis-specified Chat Template (models currently silently default to the
ChatML syntax, which many of the latest models don't adhere to). It
should also make it easier for users to add new models to guidance,
either via PR or in their own codebases.
Before:
```python
from guidance.models.transformers import Transformers, TransformersChat

class Llama(Transformers):
    pass

# Users have to do this for most chat models in guidance
class LlamaChat(TransformersChat, Llama):
    def get_role_start(self, role_name, **kwargs):
        if role_name == "system":
            return self._system_prefex + "<<SYS>>\n"
        elif role_name == "user":
            if str(self).endswith("\n<</SYS>>\n\n"):
                return ""
            else:
                return "[INST] "
        else:
            return " "

    def get_role_end(self, role_name=None):
        if role_name == "system":
            return "\n<</SYS>>\n\n"
        elif role_name == "user":
            return " [/INST]"
        else:
            return " "

lm = LlamaChat(path_to_llama)
```
After:
```python
from guidance.models import Transformers
lm = Transformers(path_to_llama) # automagically works
```
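For context, here's a minimal sketch of how the simplified API composes with guidance's existing role blocks (`path_to_llama` is a placeholder for a local model path):

```python
from guidance import gen, system, user, assistant
from guidance.models import Transformers

lm = Transformers(path_to_llama)  # chat template inferred from the tokenizer

with system():
    lm += "You are a concise assistant."
with user():
    lm += "What is the capital of France?"
with assistant():
    lm += gen("answer", max_tokens=16)
```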
If you're using a rare model and the automatic template detection doesn't work, there are two fallbacks.
After pt2:
```python
# users can copy paste from https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/blob/main/tokenizer_config.json#L12
llama2_template = "{% if messages[0]['role'] == 'system' %}{% set loop_messages = messages[1:] %}{% set system_message = messages[0]['content'] %}{% else %}{% set loop_messages = messages %}{% set system_message = false %}{% endif %}{% for message in loop_messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if loop.index0 == 0 and system_message != false %}{% set content = '<<SYS>>\\n' + system_message + '\\n<</SYS>>\\n\\n' + message['content'] %}{% else %}{% set content = message['content'] %}{% endif %}{% if message['role'] == 'user' %}{{ bos_token + '[INST] ' + content.strip() + ' [/INST]' }}{% elif message['role'] == 'assistant' %}{{ ' ' + content.strip() + ' ' + eos_token }}{% endif %}{% endfor %}"
lm = Transformers(path_to_llama, chat_template=llama2_template)
```
or, in the worst case, for maximal robustness and customizability:
```python
from guidance._chat import ChatTemplate, UnsupportedRoleException

class Llama2ChatTemplate(ChatTemplate):
    template_str = llama2_template

    def get_role_start(self, role_name):
        if role_name == "system":
            return "[INST] <<SYS>>\n"
        elif role_name == "user":
            return "<s>[INST] "
        elif role_name == "assistant":
            return " "
        else:
            raise UnsupportedRoleException(role_name, self)

    def get_role_end(self, role_name=None):
        if role_name == "system":
            return "\n<</SYS>>\n\n"
        elif role_name == "user":
            return " [/INST]"
        elif role_name == "assistant":
            return "</s>"
        else:
            raise UnsupportedRoleException(role_name, self)

lm = Transformers(path_to_llama, chat_template=Llama2ChatTemplate)
```
The first big change is the removal of the `Chat` and `Instruct` mixins,
and the introduction of a new `guidance._chat.ChatTemplate` class, which
takes over the responsibilities those mixins used to handle.
Users can construct a subclass of `ChatTemplate` and pass it to models
via the new `chat_template` argument (which defaults to `None`).
For local models, this works by leveraging the `chat_template` property
in `huggingface transformers` and in `llamacpp`'s GGUF files. When a
user loads a model, guidance now follows this order of operations
(sketched in code after the list):
1. See if the user passed in a `ChatTemplate` -- if so, use that
directly.
2. If the user passed a string as `chat_template`, set it as the
`template_str`. If the user did not pass anything, set the
`template_str` from the `huggingface` `AutoTokenizer` or GGUF metadata
fields.
3. Check the `template_str` against a local cache in guidance which
maintains template converters for the most popular models on
huggingface/in llama.cpp. We index this cache based on the actual
chat_template string, so any model that uses one of these chat templates
-- even if it isn't explicitly listed in the documentation -- will
automatically load the right guidance class.
4. If we don't have anything in the cache, try to automatically convert
the jinja2 template into the new guidance._chat.ChatTemplate syntax.
Warn the user if we attempt this. [NOTE: This is not yet implemented and
may come in a future PR.]
5. Default to the `ChatML` syntax, with a warning to the user.
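To make that flow concrete, here is a minimal sketch of the resolution logic. The function name, the `metadata` dict, the `CHAT_TEMPLATE_CACHE` mapping, and the `ChatMLTemplate` placeholder are hypothetical stand-ins for guidance's internals, not the actual implementation:

```python
import warnings

from guidance._chat import ChatTemplate

# Hypothetical: maps known template strings -> ChatTemplate subclasses.
CHAT_TEMPLATE_CACHE: dict = {}

class ChatMLTemplate(ChatTemplate):
    """Stand-in for guidance's ChatML fallback template (assumed name)."""
    template_str = "{chatml jinja template omitted}"

def resolve_chat_template(chat_template, metadata):
    # 1. A ChatTemplate subclass passed by the user wins outright.
    if isinstance(chat_template, type) and issubclass(chat_template, ChatTemplate):
        return chat_template
    # 2. A raw string becomes the template_str; otherwise fall back to the
    #    template recorded in the AutoTokenizer / GGUF metadata.
    if isinstance(chat_template, str):
        template_str = chat_template
    else:
        template_str = metadata.get("chat_template")
    # 3. Known template strings map to hand-written templates in the cache.
    if template_str in CHAT_TEMPLATE_CACHE:
        return CHAT_TEMPLATE_CACHE[template_str]
    # 4. (Not yet implemented) try to auto-convert the jinja2 template.
    # 5. Default to ChatML syntax, with a warning.
    warnings.warn("Could not infer a chat template; defaulting to ChatML.")
    return ChatMLTemplate
```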
Currently this PR updates the following user-facing `guidance.models`
classes:
- Transformers (removing TransformersChat and the Llama/Mistral
subclasses)
- LlamaCpp (removing LlamaCppChat and Mistral subclasses)
- Anthropic
For now, `Anthropic` should be representative of how grammarless classes
will work. I wanted to start with OpenAI, but many other guidance.models
classes inherit from OpenAI, so I'm saving that for later. While I was
at it, I also upgraded the `Anthropic` class to use the latest Anthropic
SDK, so `guidance.models.Anthropic` should now work with the latest
Claude 3 models.
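Usage would then look something like this (a sketch; the model name is just an example, and the API key is assumed to come from the `ANTHROPIC_API_KEY` environment variable, as in Anthropic's SDK):

```python
from guidance.models import Anthropic

# Example model id; any current Claude 3 model should work.
lm = Anthropic("claude-3-opus-20240229")
```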
# TODO
There's a decent amount left to do here. In no particular order...
1. Unrelated to this change, but guidance cannot properly handle
`llama3` or `phi-3`'s true tokenizers/chat templates. We need to fix
that independently.
2. Add better warnings to users when we fall back to using the ChatML
template.
3. Extend support through the rest of the guidance.models classes
(mostly just remote ones left, starting with OpenAI).
4. Write out more templates for popular models in the `ChatTemplateCache`,
and add an alias system so that we can look up models in the cache
by common names (e.g. "llama3") -- a rough sketch of the alias idea
follows this list.
5. Add a deprecation warning to people trying to use very old models on
`Anthropic`.
6. Much more testing and documentation. We should, for example, add
documentation on how to import/initialize a new ChatTemplate and use it
for your own models.
7. Write the auto-converter from huggingface `jinja2` templates to the
guidance ChatTemplate syntax. A battery of unit tests comparing its
output against the original `transformers.apply_chat_template` method
would make this more robust. This can land in a future PR, as it
involves complex logic. A start was attempted in #791 by @ilmarinen;
we could eventually pull that in and expand its coverage.
8. Probably get rid of the folders in `guidance.models.llama_cpp` and
`guidance.models.transformers` because we don't need to maintain a bunch
of subclasses for them anymore.
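On item 4, a minimal sketch of the alias idea, assuming a dict-backed cache (all names here are illustrative, not the actual implementation):

```python
class ChatTemplateCache:
    """Cache keyed by the full template string, with an alias table so
    common names like "llama3" resolve to the same entry."""

    def __init__(self):
        self._cache = {}    # template_str -> ChatTemplate subclass
        self._aliases = {}  # short name -> template_str

    def register(self, template_str, template_cls, aliases=()):
        self._cache[template_str] = template_cls
        for name in aliases:
            self._aliases[name] = template_str

    def lookup(self, key):
        # Accept either a raw template string or a registered alias.
        return self._cache.get(self._aliases.get(key, key))
```

Registration would then look like, e.g., `cache.register(llama3_template_str, Llama3ChatTemplate, aliases=("llama3",))`.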
Would appreciate any and all feedback, particularly on the logical flow
and the new, simpler user-facing API. @marcotcr @paulbkoch @slundberg @riedgar-ms @hudson-ai
---------
Co-authored-by: Richard Edgar (Microsoft) <[email protected]>
Co-authored-by: Paul Koch <[email protected]>