ChatQnA - Adding files to deploy an application in the K8S environment using Helm #1759
@@ -28,3 +28,224 @@ helm install chatqna oci://ghcr.io/opea-project/charts/chatqna --set global.HUG

See other *-values.yaml files in this directory for further reference.

## Deploy on AMD ROCm using Helm charts from the binary Helm repository

### Creating working dirs

```bash
mkdir ~/chatqna-k8s-install && cd ~/chatqna-k8s-install
```

### Cloning repos

```bash
git clone https://github.com/opea-project/GenAIExamples.git
```

### Go to the installation directory

```bash
cd GenAIExamples/ChatQnA/kubernetes/helm
```

### Setting system variables

```bash
export HFTOKEN="your_huggingface_token"
export MODELDIR="/mnt/opea-models"
export MODELNAME="meta-llama/Meta-Llama-3-8B-Instruct"
```
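
If the chart mounts `MODELDIR` as a host path for model caching (an assumption; check the chart's values before relying on it), the directory must already exist on each worker node:

```bash
# Hypothetical prerequisite: create the model cache directory on every node
sudo mkdir -p /mnt/opea-models
```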

### Setting variables in values files

#### If using ROCm vLLM

```bash
nano ~/chatqna-k8s-install/GenAIExamples/ChatQnA/kubernetes/helm/rocm-values.yaml
```

#### If deploying a FaqGen-based application on AMD ROCm with vLLM

```bash
nano ~/chatqna-k8s-install/GenAIExamples/ChatQnA/kubernetes/helm/faqgen-rocm-values.yaml
```

> Review comment: Ditto, could assume you are in the correct directory.

- HIP_VISIBLE_DEVICES - specifies the ID(s) of the GPU(s) to use; either a single ID or several comma-separated ones, e.g. "0" or "0,1,2,3"
- TENSOR_PARALLEL_SIZE - must match the number of GPUs used (a combined two-GPU example follows this list)
- ```yaml
  resources:
    limits:
      amd.com/gpu: "1" # replace "1" with the number of GPUs used
  ```
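
For instance, a two-GPU vLLM setup would change all three entries together. This is a sketch; the key names are taken from the values files added in this PR:

```yaml
# Hypothetical two-GPU setup: the GPU list, the tensor-parallel size,
# and the resource limit must all agree.
HIP_VISIBLE_DEVICES: "0,1"
TENSOR_PARALLEL_SIZE: "2"
resources:
  limits:
    amd.com/gpu: "2"
```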

#### If using ROCm TGI

```bash
nano ~/chatqna-k8s-install/GenAIExamples/ChatQnA/kubernetes/helm/rocm-tgi-values.yaml
```

> Review comment: ditto on directory.

#### If deploying a FaqGen-based application on AMD ROCm with TGI

```bash
nano ~/chatqna-k8s-install/GenAIExamples/ChatQnA/kubernetes/helm/faqgen-rocm-tgi-values.yaml
```

- HIP_VISIBLE_DEVICES - specifies the ID(s) of the GPU(s) to use; either a single ID or several comma-separated ones, e.g. "0" or "0,1,2,3"
- extraCmdArgs: [ "--num-shard","1" ] - replace "1" with the number of GPUs used (a combined two-GPU example follows this list)
- ```yaml
  resources:
    limits:
      amd.com/gpu: "1" # replace "1" with the number of GPUs used
  ```
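
The matching two-GPU TGI setup would look like this; a sketch mirroring the faqgen-rocm-tgi-values.yaml added in this PR:

```yaml
# Hypothetical two-GPU setup: --num-shard, the GPU list,
# and the resource limit must all agree.
HIP_VISIBLE_DEVICES: "0,1"
extraCmdArgs: [ "--num-shard", "2" ]
resources:
  limits:
    amd.com/gpu: "2"
```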

### Installing the Helm Chart

#### If using ROCm vLLM

> Review comment: why is this heading not bold faced while others below are?

```bash
helm upgrade --install chatqna oci://ghcr.io/opea-project/charts/chatqna \
  --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} \
  --values rocm-values.yaml
```

#### If using ROCm TGI

```bash
helm upgrade --install chatqna oci://ghcr.io/opea-project/charts/chatqna \
  --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} \
  --values rocm-tgi-values.yaml
```

#### If deploying a FaqGen-based application on AMD ROCm with vLLM

```bash
helm upgrade --install chatqna oci://ghcr.io/opea-project/charts/chatqna \
  --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} \
  --values faqgen-rocm-values.yaml
```

#### If deploying a FaqGen-based application on AMD ROCm with TGI

```bash
helm upgrade --install chatqna oci://ghcr.io/opea-project/charts/chatqna \
  --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} \
  --values faqgen-rocm-tgi-values.yaml
```
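
Whichever variant you installed, you can watch the rollout before moving on. This is a generic check, assuming the release name chatqna used above and Helm's usual per-release instance label:

```bash
# Show release status, then watch the release's pods until they are Ready
helm status chatqna
kubectl get pods -l app.kubernetes.io/instance=chatqna --watch
```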

## Deploy on AMD ROCm using Helm charts from Git repositories

### Creating working dirs

```bash
mkdir ~/chatqna-k8s-install && cd ~/chatqna-k8s-install
```

### Cloning repos

GenAIInfra provides the chatqna chart sources and the update_dependency.sh helper used in the install step below.

```bash
git clone https://github.com/opea-project/GenAIExamples.git
git clone https://github.com/opea-project/GenAIInfra.git
```

### Go to the installation directory

```bash
cd GenAIExamples/ChatQnA/kubernetes/helm
```

### Setting system variables

```bash
export HFTOKEN="your_huggingface_token"
export MODELDIR="/mnt/opea-models"
export MODELNAME="Intel/neural-chat-7b-v3-3"
```

### Setting variables in values files

#### If using ROCm vLLM

```bash
nano ~/chatqna-k8s-install/GenAIExamples/ChatQnA/kubernetes/helm/rocm-values.yaml
```

- HIP_VISIBLE_DEVICES - specifies the ID(s) of the GPU(s) to use; either a single ID or several comma-separated ones, e.g. "0" or "0,1,2,3"
- TENSOR_PARALLEL_SIZE - must match the number of GPUs used
- ```yaml
  resources:
    limits:
      amd.com/gpu: "1" # replace "1" with the number of GPUs used
  ```

#### If using ROCm TGI

```bash
nano ~/chatqna-k8s-install/GenAIExamples/ChatQnA/kubernetes/helm/rocm-tgi-values.yaml
```

- HIP_VISIBLE_DEVICES - specifies the ID(s) of the GPU(s) to use; either a single ID or several comma-separated ones, e.g. "0" or "0,1,2,3"
- extraCmdArgs: [ "--num-shard","1" ] - replace "1" with the number of GPUs used
- ```yaml
  resources:
    limits:
      amd.com/gpu: "1" # replace "1" with the number of GPUs used
  ```

#### If deploying a FaqGen-based application on AMD ROCm with vLLM

```bash
nano ~/chatqna-k8s-install/GenAIExamples/ChatQnA/kubernetes/helm/faqgen-rocm-values.yaml
```

- HIP_VISIBLE_DEVICES - specifies the ID(s) of the GPU(s) to use; either a single ID or several comma-separated ones, e.g. "0" or "0,1,2,3"
- TENSOR_PARALLEL_SIZE - must match the number of GPUs used
- ```yaml
  resources:
    limits:
      amd.com/gpu: "1" # replace "1" with the number of GPUs used
  ```

#### If deploying a FaqGen-based application on AMD ROCm with TGI

```bash
nano ~/chatqna-k8s-install/GenAIExamples/ChatQnA/kubernetes/helm/faqgen-rocm-tgi-values.yaml
```

- HIP_VISIBLE_DEVICES - specifies the ID(s) of the GPU(s) to use; either a single ID or several comma-separated ones, e.g. "0" or "0,1,2,3"
- extraCmdArgs: [ "--num-shard","1" ] - replace "1" with the number of GPUs used
- ```yaml
  resources:
    limits:
      amd.com/gpu: "1" # replace "1" with the number of GPUs used
  ```
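
Before installing, you can dry-render the chart with an edited values file to catch YAML mistakes early. A sketch using helm template against the published OCI chart; any of the values files above can be substituted:

```bash
# Renders the manifests locally; nothing is deployed to the cluster
helm template chatqna oci://ghcr.io/opea-project/charts/chatqna \
  --values faqgen-rocm-values.yaml > /dev/null && echo "values render OK"
```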

### Installing the Helm Chart

#### If using ROCm vLLM
```bash
cd ~/chatqna-k8s-install/GenAIInfra/helm-charts
./update_dependency.sh
helm dependency update chatqna
helm upgrade --install chatqna chatqna \
  --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} \
  --values ../../GenAIExamples/ChatQnA/kubernetes/helm/rocm-values.yaml
```

#### If using ROCm TGI
```bash
cd ~/chatqna-k8s-install/GenAIInfra/helm-charts
./update_dependency.sh
helm dependency update chatqna
helm upgrade --install chatqna chatqna \
  --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} \
  --values ../../GenAIExamples/ChatQnA/kubernetes/helm/rocm-tgi-values.yaml
```

#### If deploying a FaqGen-based application on AMD ROCm with vLLM
```bash
cd ~/chatqna-k8s-install/GenAIInfra/helm-charts
./update_dependency.sh
helm dependency update chatqna
helm upgrade --install chatqna chatqna \
  --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} \
  --values ../../GenAIExamples/ChatQnA/kubernetes/helm/faqgen-rocm-values.yaml
```

#### If deploying a FaqGen-based application on AMD ROCm with TGI
```bash
cd ~/chatqna-k8s-install/GenAIInfra/helm-charts
./update_dependency.sh
helm dependency update chatqna
helm upgrade --install chatqna chatqna \
  --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} \
  --values ../../GenAIExamples/ChatQnA/kubernetes/helm/faqgen-rocm-tgi-values.yaml
```
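
Once the pods are ready, a quick smoke test is to port-forward the ChatQnA service and send a single request. The service name and port 8888 are assumed from the chart's defaults; adjust them if your release differs:

```bash
# Forward the ChatQnA mega-service locally, then send a test query
kubectl port-forward svc/chatqna 8888:8888 &
curl http://localhost:8888/v1/chatqna \
  -H "Content-Type: application/json" \
  -d '{"messages": "What is the revenue of Nike in 2023?"}'
```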

@@ -0,0 +1,66 @@
# Copyright (C) 2025 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

CHATQNA_TYPE: "CHATQNA_FAQGEN"
llm-uservice:
  enabled: true
  image:
    repository: opea/llm-faqgen
  LLM_MODEL_ID: meta-llama/Meta-Llama-3-8B-Instruct
  FAQGEN_BACKEND: "TGI"
  service:
    port: 80
tgi:
  enabled: true
  accelDevice: "rocm"
  image:
    repository: ghcr.io/huggingface/text-generation-inference
    tag: "2.4.1-rocm"
  LLM_MODEL_ID: meta-llama/Meta-Llama-3-8B-Instruct
  MAX_INPUT_LENGTH: "3072"
  MAX_TOTAL_TOKENS: "4096"
  PYTORCH_TUNABLEOP_ENABLED: "0"
  USE_FLASH_ATTENTION: "true"
  FLASH_ATTENTION_RECOMPUTE: "false"
  HIP_VISIBLE_DEVICES: "0,1"
  MAX_BATCH_SIZE: "2"
  extraCmdArgs: [ "--num-shard","2" ]
  resources:
    limits:
      amd.com/gpu: "2"
    requests:
      cpu: 1
      memory: 16Gi
  securityContext:
    readOnlyRootFilesystem: false
    runAsNonRoot: false
    runAsUser: 0
    capabilities:
      add:
        - SYS_PTRACE
  readinessProbe:
    initialDelaySeconds: 60
    periodSeconds: 5
    timeoutSeconds: 1
    failureThreshold: 120
  startupProbe:
    initialDelaySeconds: 60
    periodSeconds: 5
    timeoutSeconds: 1
    failureThreshold: 120
vllm:
  enabled: false

# Reranking: second largest bottleneck when reranking is in use
# (i.e. query context docs have been uploaded with data-prep)
#
# TODO: could vLLM be used also for reranking / embedding?
teirerank:
  accelDevice: "cpu"
  image:
    repository: ghcr.io/huggingface/text-embeddings-inference
    tag: cpu-1.5
  # securityContext:
  #   readOnlyRootFilesystem: false
  readinessProbe:
    timeoutSeconds: 1

@@ -0,0 +1,59 @@
# Copyright (C) 2025 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

CHATQNA_TYPE: "CHATQNA_FAQGEN"
llm-uservice:
  enabled: true
  image:
    repository: opea/llm-faqgen
  LLM_MODEL_ID: meta-llama/Meta-Llama-3-8B-Instruct
  FAQGEN_BACKEND: "vLLM"
  service:
    port: 80
tgi:
  enabled: false
vllm:
  enabled: true
  accelDevice: "rocm"
  image:
    repository: opea/vllm-rocm
    tag: latest
  env:
    HIP_VISIBLE_DEVICES: "0"
    TENSOR_PARALLEL_SIZE: "1"
    HF_HUB_DISABLE_PROGRESS_BARS: "1"
    HF_HUB_ENABLE_HF_TRANSFER: "0"
    VLLM_USE_TRITON_FLASH_ATTN: "0"
    VLLM_WORKER_MULTIPROC_METHOD: "spawn"
    PYTORCH_JIT: "0"
    HF_HOME: "/data"
  extraCmd:
    command: [ "python3", "/workspace/api_server.py" ]
  extraCmdArgs: [ "--swap-space", "16",
                  "--disable-log-requests",
                  "--dtype", "float16",
                  "--num-scheduler-steps", "1",
                  "--distributed-executor-backend", "mp" ]
  resources:
    limits:
      amd.com/gpu: "1"
  startupProbe:
    failureThreshold: 180
  securityContext:
    readOnlyRootFilesystem: false
    runAsNonRoot: false
    runAsUser: 0

# Reranking: second largest bottleneck when reranking is in use
# (i.e. query context docs have been uploaded with data-prep)
#
# TODO: could vLLM be used also for reranking / embedding?
teirerank:
  accelDevice: "cpu"
  image:
    repository: ghcr.io/huggingface/text-embeddings-inference
    tag: cpu-1.5
  # securityContext:
  #   readOnlyRootFilesystem: false
  readinessProbe:
    timeoutSeconds: 1

> Review comment: Could assume you are already in the correct directory, namely ~/chatqna-k8s-install/GenAIExamples/ChatQnA/kubernetes/helm