Summary
The AI Chat Web template (dotnet new aichatweb, Microsoft.Extensions.AI.Templates) lets
you pick a chat and embedding provider: Azure OpenAI, GitHub Models, OpenAI, and Ollama. The
only on-device option today is Ollama, a third-party runtime. This proposes adding
Foundry Local, Microsoft's own on-device inference runtime, as
a first-class provider for both the chat IChatClient and the embedding IEmbeddingGenerator.
Why
- It gives the template a Microsoft-owned local inference story: no cloud, no API key, models
run on the developer's machine on CPU, GPU, or NPU through ONNX Runtime.
- Foundry Local is OpenAI-compatible, so it drops into the same
OpenAIClient -> AsIChatClient() / AsIEmbeddingGenerator() shape the template already uses
for the OpenAI and GitHub Models providers. The change is small and well-contained.
- The template's provider list is the de facto answer to "what does .NET AI support locally."
Adding Foundry Local closes the gap and gives the docs and RAG tutorials a reference
integration to point at.
What it would look like
A new provider choice, for example --provider foundrylocal, that wires Foundry Local for
both chat and embeddings. Suggested defaults: qwen3-4b for chat (Apache-2.0, tool-calling
capable) and qwen3-embedding-0.6b for embeddings (1024-dimension vectors).
Proof of concept
A runnable end-to-end sample is here:
https://github.com/luisquintanilla/foundry-local-aichatweb
It is the standard AI Chat Web app (Blazor, RAG over a local PDF, SQLite vector store) with the
provider swapped to Foundry Local. It builds and runs end to end, and includes a dev container
so you can try it in Codespaces.
An aspire branch shows what an Aspire orchestration could look like, using a small custom
AddFoundryLocal hosting integration (there is no Aspire hosting integration for Foundry Local
today): https://github.com/luisquintanilla/foundry-local-aichatweb/tree/aspire
Notes for implementation
- SDK:
Microsoft.AI.Foundry.Local 1.2.3 (stable, on nuget.org). The app uses the manager to
start the local OpenAI-compatible web service and load models, then points the standard
OpenAI client at it. Foundry Local is keyless on localhost.
- The embedding model returns 1024-dimension vectors, so
IngestedChunk.VectorDimensions
(currently hardcoded to 1536) needs to become per-provider.
- First run downloads the model weights; later runs start fast.
Scope
Non-Aspire first. The Aspire path is under investigation: there is no Aspire hosting
integration for Foundry Local today (unlike Ollama's AddOllama), so a custom
AddFoundryLocal integration is prototyped on the sample's aspire branch (linked above). We
can decide how to bring Aspire into the template once that settles.
Summary
The AI Chat Web template (
dotnet new aichatweb,Microsoft.Extensions.AI.Templates) letsyou pick a chat and embedding provider: Azure OpenAI, GitHub Models, OpenAI, and Ollama. The
only on-device option today is Ollama, a third-party runtime. This proposes adding
Foundry Local, Microsoft's own on-device inference runtime, as
a first-class provider for both the chat
IChatClientand the embeddingIEmbeddingGenerator.Why
run on the developer's machine on CPU, GPU, or NPU through ONNX Runtime.
OpenAIClient -> AsIChatClient() / AsIEmbeddingGenerator()shape the template already usesfor the OpenAI and GitHub Models providers. The change is small and well-contained.
Adding Foundry Local closes the gap and gives the docs and RAG tutorials a reference
integration to point at.
What it would look like
A new provider choice, for example
--provider foundrylocal, that wires Foundry Local forboth chat and embeddings. Suggested defaults:
qwen3-4bfor chat (Apache-2.0, tool-callingcapable) and
qwen3-embedding-0.6bfor embeddings (1024-dimension vectors).Proof of concept
A runnable end-to-end sample is here:
https://github.com/luisquintanilla/foundry-local-aichatweb
It is the standard AI Chat Web app (Blazor, RAG over a local PDF, SQLite vector store) with the
provider swapped to Foundry Local. It builds and runs end to end, and includes a dev container
so you can try it in Codespaces.
An
aspirebranch shows what an Aspire orchestration could look like, using a small customAddFoundryLocalhosting integration (there is no Aspire hosting integration for Foundry Localtoday): https://github.com/luisquintanilla/foundry-local-aichatweb/tree/aspire
Notes for implementation
Microsoft.AI.Foundry.Local1.2.3 (stable, on nuget.org). The app uses the manager tostart the local OpenAI-compatible web service and load models, then points the standard
OpenAI client at it. Foundry Local is keyless on localhost.
IngestedChunk.VectorDimensions(currently hardcoded to 1536) needs to become per-provider.
Scope
Non-Aspire first. The Aspire path is under investigation: there is no Aspire hosting
integration for Foundry Local today (unlike Ollama's
AddOllama), so a customAddFoundryLocalintegration is prototyped on the sample'saspirebranch (linked above). Wecan decide how to bring Aspire into the template once that settles.