examples/server/README.md (64 additions, 0 deletions)
@@ -40,6 +40,7 @@ see https://github.com/ggerganov/llama.cpp/issues/1437
 - `--grp-attn-n`: Set the group attention factor to extend context size through self-extend (default: 1 = disabled); used together with group attention width `--grp-attn-w`
 - `--grp-attn-w`: Set the group attention width to extend context size through self-extend (default: 512); used together with group attention factor `--grp-attn-n`
 - `-n, --n-predict`: Set the maximum tokens to predict (default: -1)
+- `--slots-endpoint-disable`: Disables the slots state monitoring endpoint. Slots state may contain user data, including prompts.

 ## Build
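For context, a launch command exercising these options might look like the sketch below. The model path, context size, and port are placeholder values chosen for illustration, not part of this change:

```bash
# Self-extend: group attention factor 4 with width 512 stretches the usable context.
# --slots-endpoint-disable (the flag added here) turns off the /slots endpoint,
# since slot state can expose user prompts.
./server -m models/7B/ggml-model.gguf \
    --ctx-size 8192 \
    --grp-attn-n 4 --grp-attn-w 512 \
    --slots-endpoint-disable \
    --port 8080
```
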
@@ -381,6 +382,69 @@ Notice that each `probs` is an array of length `n_probs`.
 }'
 ```
+
+- **GET** `/slots`: Returns the current slots processing state. Can be disabled with `--slots-endpoint-disable`.
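A quick sketch of querying the new endpoint; the response shape in the comment is illustrative only, with assumed field names, since the full example this diff adds is not reproduced above:

```bash
# Query the slots state of a server running on the default port.
curl http://localhost:8080/slots
# Illustrative response shape (assumed fields, not authoritative):
# [ { "id": 0, "state": 0, "prompt": "" } ]
```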