                        the llama_eval() call computes all logits, not just the last one
--vocab_only VOCAB_ONLY
                        only load the vocabulary, no weights
--use_mlock USE_MLOCK
                        force system to keep model in RAM
--embedding EMBEDDING
                        embedding mode only
--n_predict N_PREDICT
                        Number of tokens to predict
--n_threads N_THREADS
                        Number of threads
--repeat_last_n REPEAT_LAST_N
                        Last n tokens to penalize
--top_k TOP_K           top_k
--top_p TOP_P           top_p
--temp TEMP             temp
--repeat_penalty REPEAT_PENALTY
                        repeat_penalty
--n_batch N_BATCH       batch size for prompt processing
```
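
As a rough illustration of how these options can be used from Python (a sketch only: it assumes the names above are accepted as keyword arguments by `Model` and `generate`, as described later in this README):

```python
from pyllamacpp.model import Model

# assumption: loading options such as use_mlock are keyword arguments of Model
model = Model(ggml_model='./models/gpt4all-model.bin', use_mlock=True)

# assumption: generation options such as n_predict and n_threads are keyword
# arguments of generate(); the values here are illustrative only
for token in model.generate("Hello, ", n_predict=32, n_threads=8):
    print(token, end='', flush=True)
```
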
# Tutorial

### Quick start
A simple `Pythonic` API is built on top of `llama.cpp` C/C++ functions. You can call it from Python as follows:
```python
from pyllamacpp.model import Model

model = Model(ggml_model='./models/gpt4all-model.bin')
for token in model.generate("Tell me a joke ?"):
    print(token, end='')
```

### Interactive Dialogue
You can set up an interactive dialogue by simply keeping the `model` variable alive:
```python
from pyllamacpp.model import Model

model = Model(ggml_model='./models/gpt4all-model.bin')
while True:
    try:
        prompt = input("You: ")
        if prompt == '':
            continue
        print("AI: ", end='')
        # stream the tokens as they are generated
        for tok in model.generate(prompt):
            print(tok, end='', flush=True)
        print()
    except KeyboardInterrupt:
        # Ctrl+C ends the dialogue
        break
```

### Different persona
You can customize the `prompt_context` to _"give the language model a different persona"_ as follows:

```python
from pyllamacpp.model import Model

prompt_context = """Act as Bob. Bob is helpful, kind, honest, good at writing, and never fails to answer the User's requests immediately and with precision. To do this, Bob uses a database of information collected from many different sources, including books, journals, online articles, and more.

User: Nice to meet you Bob!
Bob: Welcome! I'm here to assist you with anything you need. What can I do for you today?
"""

prompt_prefix = "\n User:"
prompt_suffix = "\n Bob:"

model = Model(ggml_model='./models/gpt4all-model.bin', n_ctx=512, prompt_context=prompt_context,
              prompt_prefix=prompt_prefix, prompt_suffix=prompt_suffix)
```

* You can pass any `llama context` [parameter](https://nomic-ai.github.io/pyllamacpp/#pyllamacpp.constants.LLAMA_CONTEXT_PARAMS_SCHEMA) as a keyword argument to the `Model` class
* You can pass any `gpt` [parameter](https://nomic-ai.github.io/pyllamacpp/#pyllamacpp.constants.GPT_PARAMS_SCHEMA) as a keyword argument to the `generate` method

You can always refer to the [short documentation](https://abdeladim-s.github.io/pyllamacpp/) for more details.
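
For example, here is a minimal sketch of both mechanisms (the parameter names are taken from the listing above, but the values are purely illustrative and the exact set of accepted keywords is defined by the linked schemas):

```python
from pyllamacpp.model import Model

# n_ctx is a llama context parameter, so it is passed to the Model constructor
model = Model(ggml_model='./models/gpt4all-model.bin', n_ctx=512)

# sampling options such as temp, top_k, top_p and repeat_penalty are gpt parameters,
# so they are passed to generate(); the values below are illustrative only
for token in model.generate("Once upon a time, ",
                            n_predict=55,
                            temp=0.8,
                            top_k=40,
                            top_p=0.95,
                            repeat_penalty=1.3):
    print(token, end='', flush=True)
```
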

# Supported models
Fully tested with [GPT4All](https://github.com/nomic-ai/gpt4all) model, see [PyGPT4All](https://github.com/nomic-ai/pygpt4all).

If you find any bug, please open an [issue](https://github.com/abdeladim-s/pyllamacpp/issues).

If you have any feedback, or you want to share how you are using this project, feel free to use the [Discussions](https://github.com/abdeladim-s/pyllamacpp/discussions) and open a new topic.