change: routing logic change for Hugging Face DLCs #5960

Open
dwarez wants to merge 3 commits into aws:master from huggingface:hf_routing_refactor

dwarez commented Jun 18, 2026

Depends on #5957.

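This changes how ModelBuilder's auto-detection routes to Hugging Face DLCs:

  • huggingface-vllm is selected for text-generation
  • huggingface-vllm-omni is selected for multimodal tasks
  • huggingface-sglang is opt-in: it is used only when the user requests it explicitly
  • TEI routing logic is unchanged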
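
For reviewers, the routing described above can be pictured roughly as follows. This is a minimal sketch, not the code in this PR: `pick_hf_family`, the task sets, and the plain-string `model_server` argument are hypothetical names used for illustration, and the multimodal task names are assumed examples. It also assumes that an explicit SGLang request takes precedence over the task-based default.

```python
from typing import Optional

# Hypothetical task groupings for illustration only; the real sets used by
# ModelBuilder are not listed in this description.
TEXT_GENERATION_TASKS = {"text-generation"}
MULTIMODAL_TASKS = {"image-text-to-text", "any-to-any"}  # assumed examples


def pick_hf_family(task: str, model_server: Optional[str] = None) -> Optional[str]:
    """Return the DLC family for a task, or None to defer to existing routing."""
    # SGLang is never auto-selected; it is used only on explicit request.
    # Assumption: the explicit request wins over the task-based default.
    if model_server == "sglang":
        return "huggingface-sglang"
    if task in TEXT_GENERATION_TASKS:
        return "huggingface-vllm"
    if task in MULTIMODAL_TASKS:
        return "huggingface-vllm-omni"
    # Everything else (for example TEI tasks) keeps its current routing.
    return None
```

Returning `None` for unmatched tasks is one way to express that the existing TEI logic is left untouched by the new branches.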

dwarez added 2 commits June 18, 2026 07:48
huggingface-pytorch-inference images

add: huggingface-vllm, huggingface-sglang, huggingface-vllm-omni
families metadata

Signed-off-by: DWarez <dario.salvati@huggingface.co>
…elBuilder

Add ModelServer.VLLM/SGLANG/VLLM_OMNI and teach ModelBuilder's auto-detection
to select the new HuggingFace DLCs: text-generation now defaults to vLLM
(replacing archived TGI), multimodal tasks route to vLLM-omni, and SGLang is
reachable via explicit model_server. TEI/transformers routing is unchanged.

Signed-off-by: DWarez <dario.salvati@huggingface.co>

alvarobartt left a comment

Thank you! My only nit is that the model_task check for Text Embeddings Inference is missing the text-ranking task; see https://huggingface.co/models?pipeline_tag=text-ranking&other=text-embeddings-inference&sort=trending
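
Something along these lines would cover it. This is only a sketch of the idea: `TEI_TASKS` and `routes_to_tei` are hypothetical names, and apart from `text-ranking` the task entries are assumed rather than taken from `model_builder.py`.

```python
# Hypothetical task set; only "text-ranking" comes from this comment,
# the other entries are assumed for illustration.
TEI_TASKS = frozenset({"sentence-similarity", "feature-extraction", "text-ranking"})


def routes_to_tei(model_task: str) -> bool:
    """Return True when the task should be served by Text Embeddings Inference."""
    return model_task in TEI_TASKS
```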

Comment thread sagemaker-serve/src/sagemaker/serve/model_builder.py Outdated
Co-authored-by: Alvaro Bartolome <36760800+alvarobartt@users.noreply.github.com>