
Commit 0356b81

Add a model_server example podman-llm
This is a tool that was written to be as simple as ollama; in its simplest form it's: `podman-llm run granite`

Signed-off-by: Eric Curtin <[email protected]>

2 files changed: +160 −38 lines

README.md

Lines changed: 71 additions & 38 deletions
@@ -1,56 +1,89 @@
-# AI Lab Recipes
+# podman-llm

-This repo contains recipes for building and running containerized AI and LLM
-Applications with Podman.
+The goal of podman-llm is to make AI even more boring.

-These containerized AI recipes can be used to help developers quickly prototype
-new AI and LLM based applications locally, without the need for relying on any other
-externally hosted services. Since they are already containerized, it also helps
-developers move quickly from prototype to production.
+## Install

-## Model servers
+Install podman-llm by running this one-liner:

-#### What's a model server?
+```
+curl -fsSL https://raw.githubusercontent.com/ericcurtin/podman-llm/main/install.sh | sudo bash
+```

-A model server is a program that serves machine-learning models, such as LLMs, and
-makes their functions available via an API. This makes it easy for developers to
-incorporate AI into their applications. This repository provides descriptions and
-code for building several of these model servers.
+## Usage

-Many of the sample applications rely on the `llamacpp_python` model server by
-default. This server can be used for various generative AI applications with various models.
-However, each sample application can be paired with a variety of model servers.
+### Running Models

-Learn how to build and run the llamacpp_python model server by following the
-[llamacpp_python model server README](/model_servers/llamacpp_python/README.md).
+You can run a model using the `run` command. This will start an interactive session where you can query the model.

-## Current Recipes
+```
+$ podman-llm run granite
+> Tell me about podman in less than ten words
+A fast, secure, and private container engine for modern applications.
+>
+```

-Recipes consist of at least two components: A model server and an AI application.
-The model server manages the model, and the AI application provides the specific
-logic needed to perform some specific task such as chat, summarization, object
-detection, etc.
+### Serving Models

-There are several sample applications in this repository that can be found in the
-[recipes](./recipes) directory.
+To serve a model via HTTP, use the `serve` command. This will start an HTTP server that listens for incoming requests to interact with the model.

-They fall under the categories:
+```
+$ podman-llm serve granite
+...
+{"tid":"140477699799168","timestamp":1719579518,"level":"INFO","function":"main","line":3793,"msg":"HTTP server listening","n_threads_http":"11","port":"8080","hostname":"127.0.0.1"}
+...
+```

-* [audio](./recipes/audio)
-* [computer-vision](./recipes/computer_vision)
-* [multimodal](./recipes/multimodal)
-* [natural language processing](./recipes/natural_language_processing)
+## Model library

+| Model     | Parameters | Run                        |
+| --------- | ---------- | -------------------------- |
+| granite   | 3B         | `podman-llm run granite`   |
+| mistral   | 7B         | `podman-llm run mistral`   |
+| merlinite | 7B         | `podman-llm run merlinite` |

-Learn how to build and run each application by visiting their README's.
-For example, learn how to run the [chatbot recipe here](./recipes/natural_language_processing/chatbot).
+## Containerfile Example

-## Current AI Lab Recipe images built from this repository
+Here is an example Containerfile:

-Images for many sample applications and models are available in `quay.io`. All
-currently built images are tracked in
-[ailab-images.md](./ailab-images.md)
+```
+FROM quay.io/podman-llm/podman-llm:41
+LABEL model=/granite-3b-code-instruct.Q4_K_M.gguf
+RUN llama-main --hf-repo ibm-granite/granite-3b-code-instruct-GGUF -m granite-3b-code-instruct.Q4_K_M.gguf
+```

-## [Training](./training/README.md)
+`LABEL model` is important so we know where to find the .gguf file.
+
+And we build via:
+
+```
+podman build -t granite podman-llm/granite:3b
+```
+
+## Diagram
+
+```
++------------------------+    +--------------------+    +------------------+
+|                        |    | Pull runtime layer |    | Pull model layer |
+| podman-llm run         | -> | with llama.cpp     | -> | with granite     |
+|                        |    |                    |    |                  |
++------------------------+    +--------------------+    |------------------|
+                                                        | Repo options:    |
+                                                        +------------------+
+                                                            |          |
+                                                            v          v
+                                                  +--------------+  +---------+
+                                                  | Hugging Face |  | quay.io |
+                                                  +--------------+  +---------+
+                                                          \            /
+                                                           \          /
+                                                            \        /
+                                                             v      v
+                                                      +-----------------+
+                                                      | Start container |
+                                                      | with llama.cpp  |
+                                                      | and granite     |
+                                                      | model           |
+                                                      +-----------------+
+```

-Linux Operating System Bootable containers enabled for AI Training

model_servers/podman-llm/README.md

Lines changed: 89 additions & 0 deletions
# podman-llm

The goal of podman-llm is to make AI even more boring.

## Install

Install podman-llm by running this one-liner:

```
curl -fsSL https://raw.githubusercontent.com/ericcurtin/podman-llm/main/install.sh | sudo bash
```
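If you'd rather not pipe a script straight into `sudo bash`, the same installer can be downloaded and reviewed first. A minimal sketch using only standard curl options, not a documented podman-llm workflow:

```
# Fetch the install script, inspect it, then run it as root
curl -fsSL https://raw.githubusercontent.com/ericcurtin/podman-llm/main/install.sh -o install.sh
less install.sh
sudo bash install.sh
```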
## Usage

### Running Models

You can run a model using the `run` command. This will start an interactive session where you can query the model.

```
$ podman-llm run granite
> Tell me about podman in less than ten words
A fast, secure, and private container engine for modern applications.
>
```

### Serving Models

To serve a model via HTTP, use the `serve` command. This will start an HTTP server that listens for incoming requests to interact with the model.

```
$ podman-llm serve granite
...
{"tid":"140477699799168","timestamp":1719579518,"level":"INFO","function":"main","line":3793,"msg":"HTTP server listening","n_threads_http":"11","port":"8080","hostname":"127.0.0.1"}
...
```
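The log line above comes from the llama.cpp server, so once `serve` is up you should be able to query it over HTTP from another terminal. A minimal smoke test, assuming llama.cpp's `/completion` endpoint is exposed unchanged on the logged host and port (this endpoint and its fields are llama.cpp's API, not something podman-llm documents itself):

```
# Assumes llama.cpp's completion API on the host/port from the log output
curl -s http://127.0.0.1:8080/completion \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Tell me about podman in less than ten words", "n_predict": 32}'
```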
## Model library

| Model     | Parameters | Run                        |
| --------- | ---------- | -------------------------- |
| granite   | 3B         | `podman-llm run granite`   |
| mistral   | 7B         | `podman-llm run mistral`   |
| merlinite | 7B         | `podman-llm run merlinite` |

## Containerfile Example

Here is an example Containerfile:

```
FROM quay.io/podman-llm/podman-llm:41
LABEL model=/granite-3b-code-instruct.Q4_K_M.gguf
RUN llama-main --hf-repo ibm-granite/granite-3b-code-instruct-GGUF -m granite-3b-code-instruct.Q4_K_M.gguf
```

`LABEL model` is important so we know where to find the .gguf file.

And we build via:

```
podman build -t granite podman-llm/granite:3b
```
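Because the tool locates the .gguf through that image label, you can verify what a built image advertises with standard podman label inspection. A quick check, assuming the image was tagged `granite` as in the build command above:

```
# Print the model label from the built image
podman inspect -f '{{ index .Config.Labels "model" }}' granite
# Expected: /granite-3b-code-instruct.Q4_K_M.gguf
```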
## Diagram

```
+------------------------+    +--------------------+    +------------------+
|                        |    | Pull runtime layer |    | Pull model layer |
| podman-llm run         | -> | with llama.cpp     | -> | with granite     |
|                        |    |                    |    |                  |
+------------------------+    +--------------------+    |------------------|
                                                        | Repo options:    |
                                                        +------------------+
                                                            |          |
                                                            v          v
                                                  +--------------+  +---------+
                                                  | Hugging Face |  | quay.io |
                                                  +--------------+  +---------+
                                                          \            /
                                                           \          /
                                                            \        /
                                                             v      v
                                                      +-----------------+
                                                      | Start container |
                                                      | with llama.cpp  |
                                                      | and granite     |
                                                      | model           |
                                                      +-----------------+
```
