# A Ray Serve Chat Demo Serving Hugging Face Models
- Open up an io.net account.
- Follow the standard procedure for launching a Ray Cluster. Select a small cluster, for example 4 T4 GPUs.
- When the cluster is ready, select Visual Studio Code (VSCode).

- Launch the Visual Studio Code terminal and clone this repo:

```bash
git clone https://github.com/ionet-official/io-ray-serve-chat-demo.git
```

- Go to the folder:

```bash
cd io-ray-serve-chat-demo
```

- Start the chat server via:

```bash
serve run chat.yaml
```

- Wait until Ray Serve deploys the chat app across the workers. You will see a "Model loaded" message on the terminal. (An optional way to check the deployment status from Python is sketched below.)
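If you want to confirm programmatically that the deployment is up, one option is sketched below. This is a minimal sketch, not part of the demo repo: it assumes a recent Ray version that exposes `ray.serve.status()` and that you run it from another terminal on the cluster so it can attach to the running Ray instance.

```python
# Minimal sketch: assumes a recent Ray release (serve.status() available)
# and that this runs on a node of the already-running cluster.
import ray
from ray import serve

ray.init(address="auto")  # attach to the existing Ray cluster
print(serve.status())     # lists each Serve application; wait for it to report RUNNING
```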
- Test your chatbot from the cluster. Open a new terminal and run the sample chat client:

```bash
python chat_client.py
```
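If you prefer to poke the endpoint directly instead of using the bundled client, a minimal sketch follows. It assumes the app listens on Ray Serve's default local HTTP address (`http://localhost:8000/`) and accepts the JSON payload used in the examples below; adjust the address if your `chat.yaml` configures a different host or port.

```python
# Minimal sketch: assumes the default Serve HTTP address and the payload format shown below.
import requests

resp = requests.post(
    "http://localhost:8000/",  # Ray Serve's default local HTTP port; change if chat.yaml overrides it
    json={"user_input": "What is the capital of France?", "history": []},
    timeout=60,
)
print(resp.json())
```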
- Test your chatbot server endpoint from outside the cluster.
  - Server endpoint: `https://exposed_service-[YOUR-CLUSTER-SUFFIX].headnodes.io.systems/`
  - If your cluster suffix is `1d47a`, the endpoint is `https://exposed_service-1d47a.headnodes.io.systems/`
  - One way to identify your suffix is from the VSCode URL, which looks like `https://vscode-1d47a.headnodes.io.systems/`
- You can use the code snippet below to interact with the Ray Serve application (update the endpoint to your server):
```python
import requests

SERVER_ENDPOINT = "https://exposed_service-1d47a.headnodes.io.systems/"

message = "What is the capital of France?"
history = []
response = requests.post(SERVER_ENDPOINT, json={"user_input": message, "history": history})
print(response.json())
```

or on a terminal:
```bash
curl -X POST https://exposed_service-1d47a.headnodes.io.systems/ \
  -H "Content-Type: application/json" \
  -d '{"user_input": "What is the capital of France?", "history": []}'
```
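For anything beyond a one-off request you will usually want a timeout, error checking, and a running `history`. The sketch below is a hedged extension of the snippet above, not part of the demo: the request payload matches the README examples, but the response schema and the expected shape of `history` depend on the chat app itself, so inspect `response.json()` once and adjust the commented line accordingly.

```python
import requests

SERVER_ENDPOINT = "https://exposed_service-1d47a.headnodes.io.systems/"  # use your cluster suffix


def chat_once(message, history, timeout=60):
    """Send one chat turn and return the parsed JSON response.

    The payload format ({"user_input", "history"}) matches the README examples;
    the response schema is whatever the demo app returns.
    """
    response = requests.post(
        SERVER_ENDPOINT,
        json={"user_input": message, "history": history},
        timeout=timeout,
    )
    response.raise_for_status()  # surface HTTP errors instead of printing an error payload
    return response.json()


history = []
for question in ["What is the capital of France?", "What is its population?"]:
    reply = chat_once(question, history)
    print(reply)
    # To carry context across turns, append this exchange to `history` in whatever
    # format the chat app expects, e.g. (hypothetical key name):
    # history.append([question, reply["response"]])
```

Keeping `raise_for_status()` inside the loop makes a failed turn fail loudly instead of quietly feeding an error payload into the next request.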

