
Commit 0356b81

Add a model_server example podman-llm
This is a tool that was written to be as simple as ollama; in its simplest form it's: `podman-llm run granite`

Signed-off-by: Eric Curtin <[email protected]>

2 files changed: +160 −38 lines

README.md

Lines changed: 71 additions & 38 deletions
@@ -1,56 +1,89 @@
-# AI Lab Recipes
+# podman-llm

-This repo contains recipes for building and running containerized AI and LLM
-Applications with Podman.
+The goal of podman-llm is to make AI even more boring.

-These containerized AI recipes can be used to help developers quickly prototype
-new AI and LLM based applications locally, without the need for relying on any other
-externally hosted services. Since they are already containerized, it also helps
-developers move quickly from prototype to production.
+## Install

-## Model servers
+Install podman-llm by running this one-liner:

-#### What's a model server?
+```
+curl -fsSL https://raw.githubusercontent.com/ericcurtin/podman-llm/main/install.sh | sudo bash
+```

-A model server is a program that serves machine-learning models, such as LLMs, and
-makes their functions available via an API. This makes it easy for developers to
-incorporate AI into their applications. This repository provides descriptions and
-code for building several of these model servers.
+## Usage

-Many of the sample applications rely on the `llamacpp_python` model server by
-default. This server can be used for various generative AI applications with various models.
-However, each sample application can be paired with a variety of model servers.
+### Running Models

-Learn how to build and run the llamacpp_python model server by following the
-[llamacpp_python model server README](/model_servers/llamacpp_python/README.md).
+You can run a model using the `run` command. This will start an interactive session where you can query the model.

-## Current Recipes
+```
+$ podman-llm run granite
+> Tell me about podman in less than ten words
+A fast, secure, and private container engine for modern applications.
+>
+```

-Recipes consist of at least two components: A model server and an AI application.
-The model server manages the model, and the AI application provides the specific
-logic needed to perform some specific task such as chat, summarization, object
-detection, etc.
+### Serving Models

-There are several sample applications in this repository that can be found in the
-[recipes](./recipes) directory.
+To serve a model via HTTP, use the `serve` command. This will start an HTTP server that listens for incoming requests to interact with the model.

-They fall under the categories:
+```
+$ podman-llm serve granite
+...
+{"tid":"140477699799168","timestamp":1719579518,"level":"INFO","function":"main","line":3793,"msg":"HTTP server listening","n_threads_http":"11","port":"8080","hostname":"127.0.0.1"}
+...
+```

-* [audio](./recipes/audio)
-* [computer-vision](./recipes/computer_vision)
-* [multimodal](./recipes/multimodal)
-* [natural language processing](./recipes/natural_language_processing)
+## Model library

+| Model     | Parameters | Run                        |
+| --------- | ---------- | -------------------------- |
+| granite   | 3B         | `podman-llm run granite`   |
+| mistral   | 7B         | `podman-llm run mistral`   |
+| merlinite | 7B         | `podman-llm run merlinite` |

-Learn how to build and run each application by visiting their README's.
-For example, learn how to run the [chatbot recipe here](./recipes/natural_language_processing/chatbot).
+## Containerfile Example

-## Current AI Lab Recipe images built from this repository
+Here is an example Containerfile:

-Images for many sample applications and models are available in `quay.io`. All
-currently built images are tracked in
-[ailab-images.md](./ailab-images.md)
+```
+FROM quay.io/podman-llm/podman-llm:41
+LABEL model=/granite-3b-code-instruct.Q4_K_M.gguf
+RUN llama-main --hf-repo ibm-granite/granite-3b-code-instruct-GGUF -m granite-3b-code-instruct.Q4_K_M.gguf
+```

-## [Training](./training/README.md)
+`LABEL model` is important so we know where to find the .gguf file.
+
+And we build via:
+
+```
+podman build -t granite podman-llm/granite:3b
+```
+
+## Diagram
+
+```
++------------------------+    +--------------------+    +------------------+
+|                        |    | Pull runtime layer |    | Pull model layer |
+| podman-llm run         | -> | with llama.cpp     | -> | with granite     |
+|                        |    |                    |    |                  |
++------------------------+    +--------------------+    |------------------|
+                                                        | Repo options:    |
+                                                        +------------------+
+                                                            |          |
+                                                            v          v
+                                                  +--------------+  +---------+
+                                                  | Hugging Face |  | quay.io |
+                                                  +--------------+  +---------+
+                                                          \            /
+                                                           \          /
+                                                            \        /
+                                                             v      v
+                                                      +-----------------+
+                                                      | Start container |
+                                                      | with llama.cpp  |
+                                                      | and granite     |
+                                                      | model           |
+                                                      +-----------------+
+```

-Linux Operating System Bootable containers enabled for AI Training

model_servers/podman-llm/README.md

Lines changed: 89 additions & 0 deletions
# podman-llm

The goal of podman-llm is to make AI even more boring.

## Install

Install podman-llm by running this one-liner:

```
curl -fsSL https://raw.githubusercontent.com/ericcurtin/podman-llm/main/install.sh | sudo bash
```
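If you'd rather not pipe a script straight into `sudo bash`, the same installer can be downloaded and reviewed first. A minimal sketch using only standard curl options, not a documented podman-llm workflow:

```
# Fetch the install script, inspect it, then run it as root
curl -fsSL https://raw.githubusercontent.com/ericcurtin/podman-llm/main/install.sh -o install.sh
less install.sh
sudo bash install.sh
```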
## Usage

### Running Models

You can run a model using the `run` command. This will start an interactive session where you can query the model.

```
$ podman-llm run granite
> Tell me about podman in less than ten words
A fast, secure, and private container engine for modern applications.
>
```

### Serving Models

To serve a model via HTTP, use the `serve` command. This will start an HTTP server that listens for incoming requests to interact with the model.

```
$ podman-llm serve granite
...
{"tid":"140477699799168","timestamp":1719579518,"level":"INFO","function":"main","line":3793,"msg":"HTTP server listening","n_threads_http":"11","port":"8080","hostname":"127.0.0.1"}
...
```
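The log line above comes from the llama.cpp server, so once `serve` is up you should be able to query it over HTTP from another terminal. A minimal smoke test, assuming llama.cpp's `/completion` endpoint is exposed unchanged on the logged host and port (this endpoint and its fields are llama.cpp's API, not something podman-llm documents itself):

```
# Assumes llama.cpp's completion API on the host/port from the log output
curl -s http://127.0.0.1:8080/completion \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Tell me about podman in less than ten words", "n_predict": 32}'
```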
## Model library

| Model     | Parameters | Run                        |
| --------- | ---------- | -------------------------- |
| granite   | 3B         | `podman-llm run granite`   |
| mistral   | 7B         | `podman-llm run mistral`   |
| merlinite | 7B         | `podman-llm run merlinite` |

## Containerfile Example

Here is an example Containerfile:

```
FROM quay.io/podman-llm/podman-llm:41
LABEL model=/granite-3b-code-instruct.Q4_K_M.gguf
RUN llama-main --hf-repo ibm-granite/granite-3b-code-instruct-GGUF -m granite-3b-code-instruct.Q4_K_M.gguf
```

`LABEL model` is important so we know where to find the .gguf file.

And we build via:

```
podman build -t granite podman-llm/granite:3b
```
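Because the tool locates the .gguf through that image label, you can verify what a built image advertises with standard podman label inspection. A quick check, assuming the image was tagged `granite` as in the build command above:

```
# Print the model label from the built image
podman inspect -f '{{ index .Config.Labels "model" }}' granite
# Expected: /granite-3b-code-instruct.Q4_K_M.gguf
```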
## Diagram

```
+------------------------+    +--------------------+    +------------------+
|                        |    | Pull runtime layer |    | Pull model layer |
| podman-llm run         | -> | with llama.cpp     | -> | with granite     |
|                        |    |                    |    |                  |
+------------------------+    +--------------------+    |------------------|
                                                        | Repo options:    |
                                                        +------------------+
                                                            |          |
                                                            v          v
                                                  +--------------+  +---------+
                                                  | Hugging Face |  | quay.io |
                                                  +--------------+  +---------+
                                                          \            /
                                                           \          /
                                                            \        /
                                                             v      v
                                                      +-----------------+
                                                      | Start container |
                                                      | with llama.cpp  |
                                                      | and granite     |
                                                      | model           |
                                                      +-----------------+
```
