Remote model inference streaming #3898
Conversation
Hi team, I am publishing this PR for your early review. Meanwhile I am adding a feature flag, fixing the existing tests, and adding new tests.
llmInterface = llmInterface.trim().toLowerCase(Locale.ROOT);
llmInterface = StringEscapeUtils.unescapeJava(llmInterface);
switch (llmInterface) {
    case "openai/v1/chat/completions":
I see you mentioned Bedrock Converse support in the description too. Are we only going with OpenAI for now?
The neededStreamParameterInPayload check determines whether we need to add "stream": true to the HTTP request body, as OpenAI does. Bedrock ConverseStream doesn't use that parameter; it uses a dedicated URL like POST /model/modelId/converse-stream instead, so the flag isn't needed for Converse.
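The distinction discussed above can be sketched as a small helper. This is an illustrative reconstruction, not the connector's actual code; the Bedrock interface string and the helper name are assumptions.

```java
// Hypothetical sketch: does this LLM interface require injecting
// `"stream": true` into the request body? OpenAI-style chat completions
// toggle streaming via a body flag, while Bedrock ConverseStream is
// selected by the URL path (POST /model/{modelId}/converse-stream),
// so it needs no body parameter.
public class StreamParameterCheck {
    static boolean needsStreamParameterInPayload(String llmInterface) {
        switch (llmInterface.trim().toLowerCase(java.util.Locale.ROOT)) {
            case "openai/v1/chat/completions":
                return true;  // streaming toggled in the JSON body
            case "bedrock/converse/converse_stream": // assumed identifier
                return false; // streaming selected by the endpoint URL
            default:
                throw new IllegalArgumentException("Unsupported llm interface: " + llmInterface);
        }
    }

    public static void main(String[] args) {
        System.out.println(needsStreamParameterInPayload("openai/v1/chat/completions")); // prints true
    }
}
```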
plugin/src/main/java/org/opensearch/ml/plugin/MachineLearningPlugin.java
settings,
STREAM_PREDICT_THREAD_POOL,
OpenSearchExecutors.allocatedProcessors(settings) * 4,
10000,
Just to understand, how did we come up with these numbers?
I used the existing remote thread pool configuration as a reference when configuring the stream pool.
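For readers unfamiliar with these numbers, the snippet above sizes a fixed pool at 4 threads per allocated processor with a bounded queue of 10,000 pending tasks. The following is a standalone sketch of the same sizing in plain `java.util.concurrent`, not the plugin's actual OpenSearch executor wiring:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class StreamPoolSketch {
    // Mirrors the settings in the diff: allocatedProcessors * 4 threads,
    // with a bounded work queue of 10,000 entries.
    static ThreadPoolExecutor buildStreamPool(int allocatedProcessors) {
        int poolSize = allocatedProcessors * 4;
        BlockingQueue<Runnable> queue = new ArrayBlockingQueue<>(10_000);
        return new ThreadPoolExecutor(poolSize, poolSize, 0L, TimeUnit.MILLISECONDS, queue);
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = buildStreamPool(Runtime.getRuntime().availableProcessors());
        System.out.println(pool.getMaximumPoolSize());
        pool.shutdown();
    }
}
```

A bounded queue like this makes back-pressure explicit: once 10,000 stream tasks are pending, further submissions are rejected rather than accumulating unboundedly.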
ml-algorithms/src/main/java/org/opensearch/ml/engine/arrow/RemoteModelStreamProducer.java
    root.setRowCount(1);
} else {
    try {
        Thread.sleep(500);
Can we avoid Thread.sleep by using a blocking queue?
Description
Support the remote model streaming predict API. For now, the OpenAI v1 completion and Bedrock ConverseStream APIs are supported.
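The two supported backends signal streaming differently, as discussed in the review thread: OpenAI via a `"stream": true` flag in the JSON body, Bedrock via a dedicated ConverseStream endpoint path. A hedged illustration of the two request shapes (names and helpers are illustrative, not the connector's code):

```java
public class StreamingRequestShapes {
    // OpenAI chat completions: streaming is a body flag.
    static String openAiStreamingBody(String model, String prompt) {
        return "{\"model\":\"" + model + "\",\"stream\":true,"
            + "\"messages\":[{\"role\":\"user\",\"content\":\"" + prompt + "\"}]}";
    }

    // Bedrock: streaming is a dedicated endpoint; no body flag needed.
    static String bedrockConverseStreamPath(String modelId) {
        return "/model/" + modelId + "/converse-stream";
    }

    public static void main(String[] args) {
        System.out.println(bedrockConverseStreamPath("my-model")); // prints /model/my-model/converse-stream
    }
}
```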
Example
Related Issues
#3630
Check List
Commits are signed per the DCO using --signoff. By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following the Developer Certificate of Origin and signing off your commits, please check here.