Commit 918ec00

Merge pull request #569 from huggingface/inference-client-update
Updated InferenceClient calls
2 parents f3b43f0 + cd531d3 commit 918ec00

File tree

6 files changed: +55 −78 lines changed

units/en/unit1/dummy-agent-library.mdx

Lines changed: 50 additions & 73 deletions
@@ -12,7 +12,7 @@ You probably wouldn't use these in production, but they will serve as a good **s

After this section, you'll be ready to **create a simple Agent** using `smolagents`

-And in the following Units we will also use other AI Agent libraries like `LangGraph`, `LangChain`, and `LlamaIndex`.
+And in the following Units we will also use other AI Agent libraries like `LangGraph` and `LlamaIndex`.

To keep things simple we will use a simple Python function as a Tool and Agent.

@@ -29,45 +29,13 @@ import os
from huggingface_hub import InferenceClient

## You need a token from https://hf.co/settings/tokens, ensure that you select 'read' as the token type. If you run this on Google Colab, you can set it up in the "settings" tab under "secrets". Make sure to call it "HF_TOKEN"
-os.environ["HF_TOKEN"]="hf_xxxxxxxxxxxxxx"
+# HF_TOKEN = os.environ.get("HF_TOKEN")

-client = InferenceClient(provider="hf-inference", model="meta-llama/Llama-3.3-70B-Instruct")
-# if the outputs for next cells are wrong, the free model may be overloaded. You can also use this public endpoint that contains Llama-3.2-3B-Instruct
-# client = InferenceClient("https://jc26mwg228mkj8dw.us-east-1.aws.endpoints.huggingface.cloud")
+client = InferenceClient(model="meta-llama/Llama-4-Scout-17B-16E-Instruct")
```

-```python
-output = client.text_generation(
-    "The capital of France is",
-    max_new_tokens=100,
-)
-
-print(output)
-```
-output:
-```
-Paris. The capital of France is Paris. Paris, the City of Light, is known for its stunning architecture, art museums, fashion, and romantic atmosphere. It's a must-visit destination for anyone interested in history, culture, and beauty. The Eiffel Tower, the Louvre Museum, and Notre-Dame Cathedral are just a few of the many iconic landmarks that make Paris a unique and unforgettable experience. Whether you're interested in exploring the city's charming neighborhoods, enjoying the local cuisine.
-```
-As seen in the LLM section, if we just do decoding, **the model will only stop when it predicts an EOS token**, and this does not happen here because this is a conversational (chat) model and **we didn't apply the chat template it expects**.
-
-If we now add the special tokens related to the <a href="https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct">Llama-3.3-70B-Instruct model</a> that we're using, the behavior changes and it now produces the expected EOS.
+We use the `chat` method since it is a convenient and reliable way to apply chat templates:

-```python
-prompt="""<|begin_of_text|><|start_header_id|>user<|end_header_id|>
-The capital of France is<|eot_id|><|start_header_id|>assistant<|end_header_id|>"""
-output = client.text_generation(
-    prompt,
-    max_new_tokens=100,
-)
-
-print(output)
-```
-output:
-```
-The capital of France is Paris.
-```
-
-Using the "chat" method is a much more convenient and reliable way to apply chat templates:
```python
output = client.chat.completions.create(
    messages=[

@@ -78,11 +46,14 @@ output = client.chat.completions.create(
)
print(output.choices[0].message.content)
```
+
output:
+
```
-The capital of France is Paris.
+Paris.
```
-The chat method is the RECOMMENDED method to use in order to ensure a smooth transition between models, but since this notebook is only educational, we will keep using the "text_generation" method to understand the details.
+
+The chat method is the RECOMMENDED method to use in order to ensure a smooth transition between models.

## Dummy Agent

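Editor's note: taken together, the call pattern this PR standardizes on looks like the minimal, self-contained sketch below. It assumes `huggingface_hub` is installed and a valid `HF_TOKEN` is available in the environment; the model ID is the one introduced by the diff.

```python
import os

from huggingface_hub import InferenceClient

# InferenceClient reads HF_TOKEN from the environment by default;
# passing it explicitly here just makes that assumption visible.
client = InferenceClient(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    token=os.environ.get("HF_TOKEN"),
)

output = client.chat.completions.create(
    messages=[{"role": "user", "content": "The capital of France is"}],
    max_tokens=100,
)
print(output.choices[0].message.content)  # e.g. "Paris."
```

Because `chat.completions.create` applies the model's chat template itself, no hand-written `<|begin_of_text|>`-style special tokens are needed, which is exactly what the removed `text_generation` examples were doing manually.
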
@@ -133,29 +104,19 @@ Final Answer: the final answer to the original input question
Now begin! Reminder to ALWAYS use the exact characters `Final Answer:` when you provide a definitive answer. """
```

-Since we are running the "text_generation" method, we need to apply the prompt manually:
-```python
-prompt=f"""<|begin_of_text|><|start_header_id|>system<|end_header_id|>
-{SYSTEM_PROMPT}
-<|eot_id|><|start_header_id|>user<|end_header_id|>
-What's the weather in London ?
-<|eot_id|><|start_header_id|>assistant<|end_header_id|>
-"""
-```
+We need to append the user instruction after the system prompt. This happens inside the `chat` method. We can see this process below:

-We can also do it like this, which is what happens inside the `chat` method :
```python
-messages=[
+messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
-    {"role": "user", "content": "What's the weather in London ?"},
-]
-from transformers import AutoTokenizer
-tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.3-70B-Instruct")
+    {"role": "user", "content": "What's the weather in London?"},
+]

-tokenizer.apply_chat_template(messages, tokenize=False,add_generation_prompt=True)
+print(messages)
```

-The prompt now is :
+The prompt now is:
+
```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
Answer the following questions as best you can. You have access to the following tools:
@@ -196,15 +157,17 @@ What's the weather in London ?
<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```

-Let's decode!
+Let's call the `chat` method!
+
```python
-output = client.text_generation(
-    prompt,
-    max_new_tokens=200,
+output = client.chat.completions.create(
+    messages=messages,
+    stream=False,
+    max_tokens=200,
)
-
-print(output)
+print(output.choices[0].message.content)
```
+
output:

````
@@ -222,19 +185,22 @@ Final Answer: The current weather in London is partly cloudy with a temperature
````

Do you see the issue?
+
> At this point, the model is hallucinating, because it's producing a fabricated "Observation" -- a response that it generates on its own rather than being the result of an actual function or tool call.
> To prevent this, we stop generating right before "Observation:".
> This allows us to manually run the function (e.g., `get_weather`) and then insert the real output as the Observation.

```python
-output = client.text_generation(
-    prompt,
-    max_new_tokens=200,
+# The answer was hallucinated by the model. We need to stop to actually execute the function!
+output = client.chat.completions.create(
+    messages=messages,
+    max_tokens=150,
    stop=["Observation:"] # Let's stop before any actual function is called
)

-print(output)
+print(output.choices[0].message.content)
```
+
output:

````
@@ -249,8 +215,9 @@ Action:
Observation:
````

-Much Better!
-Let's now create a dummy get weather function. In a real situation, you would likely call an API.
+Much Better!
+
+Let's now create a **dummy get weather function**. In a real situation you could call an API.

```python
# Dummy function
@@ -259,23 +226,33 @@ def get_weather(location):

get_weather('London')
```
+
output:
+
```
'the weather in London is sunny with low temperatures. \n'
```

-Let's concatenate the base prompt, the completion until function execution and the result of the function as an Observation and resume generation.
+Let's concatenate the system prompt, the base prompt, the completion until function execution and the result of the function as an Observation and resume generation.

```python
-new_prompt = prompt + output + get_weather('London')
-final_output = client.text_generation(
-    new_prompt,
-    max_new_tokens=200,
+messages=[
+    {"role": "system", "content": SYSTEM_PROMPT},
+    {"role": "user", "content": "What's the weather in London ?"},
+    {"role": "assistant", "content": output.choices[0].message.content + get_weather('London')},
+]
+
+output = client.chat.completions.create(
+    messages=messages,
+    stream=False,
+    max_tokens=200,
)

-print(final_output)
+print(output.choices[0].message.content)
```
+
Here is the new prompt:
+
```text
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
Answer the following questions as best you can. You have access to the following tools:
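
Editor's note: for reviewers who want the end-to-end shape of the flow this file now teaches (generate until `Observation:`, run the real tool, splice its output back in, resume), here is an illustrative sketch. It reuses `client`, `SYSTEM_PROMPT`, and `get_weather` as defined in the updated file; it is not itself part of the diff.

```python
# Illustrative reconstruction of the stop-and-resume tool cycle taught by
# the updated page. Assumes `client`, `SYSTEM_PROMPT`, and `get_weather`
# are defined as in units/en/unit1/dummy-agent-library.mdx.
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "What's the weather in London ?"},
]

# 1. Generate, stopping before the model can fabricate an Observation.
first = client.chat.completions.create(
    messages=messages,
    max_tokens=150,
    stop=["Observation:"],
)
thought_and_action = first.choices[0].message.content

# 2. Run the tool for real and splice its output in as the Observation.
messages.append(
    {"role": "assistant", "content": thought_and_action + get_weather("London")}
)

# 3. Resume generation so the model can produce the Final Answer.
final = client.chat.completions.create(messages=messages, max_tokens=200)
print(final.choices[0].message.content)
```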

units/es/unit1/dummy-agent-library.mdx

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ Probablemente no usarías estos en producción, pero servirán como un buen **pu

Después de esta sección, estarás listo para **crear un Agente simple** usando `smolagents`

-Y en las siguientes Unidades también utilizaremos otras bibliotecas de Agentes de IA como `LangGraph`, `LangChain` y `LlamaIndex`.
+Y en las siguientes Unidades también utilizaremos otras bibliotecas de Agentes de IA como `LangGraph` y `LlamaIndex`.

Para mantener las cosas simples, utilizaremos una función simple de Python como Herramienta y Agente.

units/ko/unit1/dummy-agent-library.mdx

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@

이 섹션을 마치면 `smolagents`를 사용하여 **간단한 에이전트를 만들** 준비가 될 것입니다.

-이어지는 Unit에서는 `LangGraph`, `LangChain`, `LlamaIndex`와 같은 다른 AI 에이전트 라이브러리도 사용해 볼 예정입니다.
+이어지는 Unit에서는 `LangGraph`, `LlamaIndex`와 같은 다른 AI 에이전트 라이브러리도 사용해 볼 예정입니다.

간단하게 하기 위해 도구와 에이전트로 단순한 Python 함수를 사용할 것입니다.

units/ru-RU/unit1/dummy-agent-library.mdx

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@

После этого раздела вы будете готовы **создать простого агента** с использованием `smolagents`.

-В следующих разделах мы также будем использовать другие библиотеки AI Агентов, такие как `LangGraph`, `LangChain` и `LlamaIndex`.
+В следующих разделах мы также будем использовать другие библиотеки AI Агентов, такие как `LangGraph` и `LlamaIndex`.

Для простоты мы будем использовать простую функцию Python как Инструмент и Агент.

units/vi/unit1/dummy-agent-library.mdx

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ Những công cụ này có thể không dùng cho production, nhưng sẽ là *

Sau phần này, bạn sẽ sẵn sàng **tạo Agent đơn giản** bằng `smolagents`.

-Ở các chương tiếp theo, ta cũng sẽ dùng các thư viện AI agent khác như `LangGraph`, `LangChain` và `LlamaIndex`.
+Ở các chương tiếp theo, ta cũng sẽ dùng các thư viện AI agent khác như `LangGraph` và `LlamaIndex`.

Để đơn giản hóa, ta sẽ dùng hàm Python cơ bản làm Tool và Agent.

units/zh-CN/unit1/dummy-agent-library.mdx

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@

在本节之后,你将准备好**使用 `smolagents` 创建一个简单的智能体**

-在接下来的单元中,我们还将使用其他 AI 智能体库,如 `LangGraph`、`LangChain` 和 `LlamaIndex`
+在接下来的单元中,我们还将使用其他 AI 智能体库,如 `LangGraph` 和 `LlamaIndex`

为了保持简单,我们将使用一个简单的 Python 函数作为工具和智能体。
