
Commit debc858

Tim Liu and Sean Sheng authored
docs: core concept and guides (bentoml#2417)
* core concepts edits
* added content to a few guides
* initial logging documentation
* updates to format and added contributor images in readme
* more updates to formatting
* Update README.md
* Update README.md
* Update README.md
* Update bento_management.rst
* Update docs/source/guides/adaptive_batching.rst
  Co-authored-by: Sean Sheng <[email protected]>
* Update docs/source/guides/adaptive_batching.rst
  Co-authored-by: Sean Sheng <[email protected]>
* updates based on Sean's feedback

Co-authored-by: Tim Liu <[email protected]>
Co-authored-by: Sean Sheng <[email protected]>
1 parent dbb6925 commit debc858

File tree

12 files changed, with 373 additions and 99 deletions


README.md

Lines changed: 8 additions & 0 deletions
@@ -70,6 +70,14 @@ Or by setting environment variable `BENTOML_DO_NOT_TRACK=True`:
 export BENTOML_DO_NOT_TRACK=True
 ```
 
+---
+### Contributors! ###
+
+Thanks to all of our amazing contributors!
+
+<a href="https://github.com/bentoml/BentoML/graphs/contributors">
+<img src="https://contrib.rocks/image?repo=bentoml/BentoML" />
+</a>
 
 ### License ###

docs/README.md

Lines changed: 8 additions & 7 deletions
@@ -6,7 +6,7 @@ http://docs.bentoml.org/ to read the full documentation.
 ---
 
 **NOTE**:
-All of the below `make` commands should be used under `bentoml` root directory.
+All of the below `make` commands should be run from the `bentoml` root directory. Only macOS and Linux (UNIX-based systems) are currently supported for live reloading of the documentation.
 
 To generate the documentation, make sure to install all dependencies (mainly `sphinx` and its extension):
 
@@ -19,15 +19,16 @@ Once you have `sphinx` installed, you can build the documentation and enable wat
 » make watch-docs
 ```
 
-For Apple Silicon (M1), a environment variable is required:
-```bash
-» export PYENCHANT_LIBRARY_PATH=/opt/homebrew/lib/libenchant-2.2.dylib
-```
-then install pychant with `brew`:
+For Apple Silicon (M1), follow the latest suggested installation method for [PyEnchant](https://pyenchant.github.io/pyenchant/install.html).
+As of this writing there is no compatible arm64 build of PyEnchant, so the best way to install it is with the following commands:
+
 ```bash
-» arch -arm64 brew install enchant
+» arch -x86_64 /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
+» arch -x86_64 /usr/local/bin/brew install enchant
 ```
 
+Make sure that `PYENCHANT_LIBRARY_PATH` is set to the location of libenchant. On macOS the library has a `.dylib` extension; on Linux-based systems it is a `.so` file.
+
 ## Documentation specification
 
 `bentoml/BentoML` follows [Google's docstring style](https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings),
Three binary image files changed (147 KB, 333 KB, and 99 KB); previews not shown.

docs/source/concepts/api_io_descriptors.rst

Lines changed: 8 additions & 5 deletions
@@ -25,7 +25,7 @@ common model serving scenarios.
     @svc.api(input=NumpyNdarray(), output=NumpyNdarray())
     def predict(input_array: np.ndarray) -> np.ndarray:
         # Define pre-processing logic
-        result = await runner.run(input_array)
+        result = runner.run(input_array)
         # Define post-processing logic
         return result
 
@@ -59,9 +59,10 @@ two runners simultaneously, and returns the better result.
     )
     return compare_results(results)
 
-The asynchronous API implementation is more efficient because while the coroutine is awaiting for
-results from the feature store or the model runners, the event loop is freed up to serve another request.
-BentoML will intelligently create an optimally sized event loop based on the available number of CPU cores. Further tuning of event loop configuration is not needed under common use cases.
+The asynchronous API implementation is more efficient because when an asynchronous method is invoked, the event loop is
+released to service other requests while this request awaits the results of the method. In addition, BentoML will automatically
+configure the ideal amount of parallelism based on the available number of CPU cores. Further tuning of the event loop
+configuration is not needed under common use cases.
 
 IO Descriptors
 --------------
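To make the asynchronous variant described in the hunk above concrete, here is a minimal sketch of an async API. It assumes the runner exposes an `async_run` counterpart to `run`; the runner-construction call (`load_runner`) and the exact method names vary between BentoML releases, so treat these identifiers as illustrative rather than authoritative.

```python
import numpy as np

import bentoml
from bentoml.io import NumpyNdarray

# Illustrative: the runner-construction API differs across BentoML releases.
runner = bentoml.sklearn.load_runner("iris_classifier_model:latest")
svc = bentoml.Service("iris_classifier_service", runners=[runner])

@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
async def predict(input_array: np.ndarray) -> np.ndarray:
    # While this coroutine awaits the runner, the event loop is free to
    # accept and serve other incoming requests.
    result = await runner.async_run(input_array)
    return result
```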
@@ -168,4 +169,6 @@ descriptor can be customized with independent schema and validation logic.
 
 .. todo::
 
-    Add further reading section
+Further Reading
+---------------
+- :ref:`API Reference for IO descriptors <api-io-descriptors>`

docs/source/concepts/bento_management.rst

Lines changed: 63 additions & 53 deletions
@@ -3,17 +3,11 @@
 Model and Bento Management
 **************************
 
-BentoML provides easy to use local and centralized stores for managing models and bentos. This article
-focuses on the use of local file system based model and bento stores. To learn more about the centralized
-store solution, see BentoML Yatai. To connect the CLI to a remote `Yatai <yatai-service-page>`,
-use the `bentoml login` command.
+BentoML allows you to store models and bentos in local as well as remote repositories. Tools are also provided to easily
+manage the lifecycle of these artifacts. This documentation details the CLI tools for both local and remote scenarios.
 
-.. todo::
-
-   Link to BentoML Yatai documentation.
-
-Managing Models
----------------
+Managing Models Locally
+-----------------------
 
 Creating Models
 ^^^^^^^^^^^^^^^
@@ -36,7 +30,8 @@ is imported from the MLFlow Model Registry.
     bentoml.mlflow.import_from_uri("mlflow_model", uri=mlflow_registry_uri)
 
 Saved and imported models are added to the local file system based model store located in the
-`$HOME/bentoml/models` directory by default.
+`$HOME/bentoml/models` directory by default. To see which types of model creation are supported for each framework, please
+visit our :ref:`Frameworks <frameworks-page>` section.
 
 Listing Models
 ^^^^^^^^^^^^^^
@@ -159,45 +154,8 @@ module or the `models delete` CLI command.
 
     > bentoml models delete iris_classifier_model:vmiqwpcfifi6zhqqvtpeqaare
 
-Pushing Models
-^^^^^^^^^^^^^^
-
-Once you are happy with a model and ready to share with other collaborators, you can upload it to a
-remote `Yatai <yatai-service-page>` model store with the `push()` function under the `bentoml.models`
-module or the `models push` CLI command.
-
-.. tabs::
-
-    .. code-tab:: python
-
-        import bentoml.models
-
-        bentoml.models.push("iris_classifier_model:vmiqwpcfifi6zhqqvtpeqaare", skip_confirm=True)
-
-    .. code-tab:: bash
-
-        > bentoml models push iris_classifier_model:vmiqwpcfifi6zhqqvtpeqaare
-
-Pulling Models
-^^^^^^^^^^^^^^
-
-Previously pushed models can be downloaded from `Yatai <yatai-service-page>` and saved local model
-store with the `pull()` function under the `bentoml.models` module or the `models pull` CLI command.
-
-.. tabs::
-
-    .. code-tab:: python
-
-        import bentoml.models
-
-        bentoml.modles.pull("iris_classifier_model:vmiqwpcfifi6zhqqvtpeqaare", url=yatai_url)
-
-    .. code-tab:: bash
-
-        > bentoml models pull iris_classifier_model:vmiqwpcfifi6zhqqvtpeqaare
-
-Managing Bentos
----------------
+Managing Bentos Locally
+-----------------------
 
 Creating Bentos
 ^^^^^^^^^^^^^^^
@@ -231,26 +189,78 @@ To delete bentos in the bento store, use the `delete` CLI command.
 
     > bentoml delete iris_classifier_service:v5mgcacfgzi6zdz7vtpeqaare
 
+Managing Models and Bentos Remotely with Yatai
+----------------------------------------------
+
+Yatai is BentoML's end-to-end deployment and monitoring platform. It also functions as a remote model and bento repository. To connect the CLI to a remote `Yatai <yatai-service-page>`, use the `bentoml login` command.
+
+.. tabs::
+
+    .. code-tab:: bash
+
+        > bentoml login <YATAI_URL>
+
+Once logged in, you'll be able to use the following commands.
+
+Pushing Models
+^^^^^^^^^^^^^^
+
+Once you are happy with a model and ready to share it with other collaborators, you can upload it to a
+remote `Yatai <yatai-service-page>` model store with the `push()` function under the `bentoml.models`
+module or the `models push` CLI command.
+
+.. tabs::
+
+    .. code-tab:: python
+
+        import bentoml.models
+
+        bentoml.models.push("iris_classifier_model:vmiqwpcfifi6zhqqvtpeqaare", skip_confirm=True)
+
+    .. code-tab:: bash
+
+        > bentoml models push iris_classifier_model:vmiqwpcfifi6zhqqvtpeqaare
+
+Pulling Models
+^^^^^^^^^^^^^^
+
+Previously pushed models can be downloaded from `Yatai <yatai-service-page>` and saved to the local model
+store with the `pull()` function under the `bentoml.models` module or the `models pull` CLI command.
+
+.. tabs::
+
+    .. code-tab:: python
+
+        import bentoml.models
+
+        bentoml.models.pull("iris_classifier_model:vmiqwpcfifi6zhqqvtpeqaare", url=yatai_url)
+
+    .. code-tab:: bash
+
+        > bentoml models pull iris_classifier_model:vmiqwpcfifi6zhqqvtpeqaare
+
 Pushing Bentos
 ^^^^^^^^^^^^^^
 
-To upload bento in the local file system store to a remote `Yatai <yatai-service-page>` bento store
+To upload a bento from the local file system store to a remote `Yatai <yatai-service-page>` bento store
 for collaboration and deployment, use the `push` CLI command.
 
 .. code-block:: bash
 
     > bentoml push iris_classifier_service:v5mgcacfgzi6zdz7vtpeqaare
 
 Pulling Bentos
 ^^^^^^^^^^^^^^
 
-To download a bento from a remote `Yatai <yatai-service-page>` bento store to the local file system
+To download a bento from a remote `Yatai <yatai-service-page>` bento store to the local file system
 bento store for troubleshooting, use the `pull` CLI command.
 
 .. code-block:: bash
 
     > bentoml pull iris_classifier_service:v5mgcacfgzi6zdz7vtpeqaare
 
+
 Further Reading
 ---------------
 - Install Yatai

docs/source/concepts/service_definition.rst

Lines changed: 32 additions & 31 deletions
@@ -14,9 +14,9 @@ Composition
 
 Consider the following service definition we created in the :ref:`Getting Started <getting-started-page>` guide.
 A BentoML service is composed of three components.
-- APIs
 - Runners
 - Services
+- APIs
 
 .. code-block:: python
 
@@ -40,34 +40,6 @@ A BentoML service is composed of three components.
         # Define post-processing logic
         return result
 
-APIs
-----
-
-Inference APIs define how the service functionality can be accessed remotely and the high level pre- and post-processing logic.
-
-.. code-block:: python
-
-    # Create API function with pre- and post- processing logic
-    @svc.api(input=NumpyNdarray(), output=NumpyNdarray())
-    def predict(input_array: np.ndarray) -> np.ndarray:
-        # Define pre-processing logic
-        result = runner.run(input_array)
-        # Define post-processing logic
-        return result
-
-By decorating a function with `@svc.api`, we declare that the function is a part of the APIs that can be accessed remotely.
-A service can have one or many APIs. The `input` and `output` arguments of the `@svc.api` decorator further defines the expect
-IO formats of the API. In the above example, the API defines the IO types as `numpy.ndarray` through the `NumpyNdarray`
-:ref:`IO descriptors <api-io-descriptors-page>`. IO descriptors help validate that the input and output conform to the expected format
-and schema and convert them from and to the native types. BentoML supports a variety of IO descriptors including `PandasDataFrame`,
-`String`, `Image`, and `File`.
-
-The API is also a great place to define your pre- and post-process logic of model serving. In the example above, the logic defined
-in the `predict` function will be packaged and deployed as a part of the serving logic.
-
-BentoML aims to parallelize API logic by starting multiple instances of the API server based on available system resources. For
-optimal performance, we recommend defining asynchronous APIs. To learn more, continue to :ref:`IO descriptors <api-io-descriptors-page>`.
-
 Runners
 -------
 
@@ -102,8 +74,37 @@ Services are composed of APIs and Runners and can be initialized through `bentom
     svc = bentoml.Service("iris_classifier_service", runners=[runner])
 
 The first argument of the service is the name which will become the name of the Bento after the service is built. Runners that
-should be parts of the service are passed in through the `runners` keyword argument. Build time and runtime behaviors of the
-service can be customized through the `svc` instance.
+should be part of the service are passed in through the `runners` keyword argument. This is an important step because this is
+how the BentoML library knows which runners to package into the bento. Build time and runtime behaviors of the service can be
+customized through the `svc` instance.
+
+APIs
+----
+
+Inference APIs define how the service functionality can be accessed remotely and the high-level pre- and post-processing logic.
+
+.. code-block:: python
+
+    # Create API function with pre- and post- processing logic
+    @svc.api(input=NumpyNdarray(), output=NumpyNdarray())
+    def predict(input_array: np.ndarray) -> np.ndarray:
+        # Define pre-processing logic
+        result = runner.run(input_array)
+        # Define post-processing logic
+        return result
+
+By decorating a function with `@svc.api`, we declare that the function is a part of the APIs that can be accessed remotely.
+A service can have one or many APIs. The `input` and `output` arguments of the `@svc.api` decorator further define the expected
+IO formats of the API. In the above example, the API defines the IO types as `numpy.ndarray` through the `NumpyNdarray`
+:ref:`IO descriptor <api-io-descriptors-page>`. IO descriptors help validate that the input and output conform to the expected format
+and schema and convert them from and to the native types. BentoML supports a variety of IO descriptors including `PandasDataFrame`,
+`String`, `Image`, and `File`. For detailed documentation on how to declare and invoke these descriptors, please see the :ref:`API Reference for IO descriptors <api-io-descriptors>`.
+
+The API is also a great place to define the pre- and post-processing logic of model serving. In the example above, the logic defined
+in the `predict` function will be packaged and deployed as a part of the serving logic.
+
+BentoML aims to parallelize API logic by starting multiple instances of the API server based on available system resources. For
+optimal performance, we recommend defining asynchronous APIs. To learn more, continue to :ref:`IO descriptors <api-io-descriptors-page>`.
 
 Further Reading
 ---------------

docs/source/guides/adaptive_batching.rst

Lines changed: 43 additions & 1 deletion
@@ -3,4 +3,46 @@
 Adaptive Batching
 =================
 
-TODO
+Batching is the term used for combining multiple inputs into a single submission for processing at the same time. The idea is that processing multiple messages together is faster than processing each message one at a time. In practice, many ML frameworks have optimizations for processing multiple messages at once, because that is how the underlying hardware works in many cases.
+
+.. epigraph::
+   "While serving a TensorFlow model, batching individual model inference requests together can be important for performance. In particular, batching is necessary to unlock the high throughput promised by hardware accelerators such as GPUs."
+   -- `TensorFlow documentation <https://github.com/tensorflow/serving/blob/master/tensorflow_serving/batching/README.md>`_
+
+The current batching feature is implemented on the server side. This is advantageous compared to client-side batching because it simplifies the client's logic, and it is often more efficient due to traffic volume.
+
+As an optimization for a real-time service, batching works off of two main concepts.
+
+1. Batching Window: The maximum time that a service should wait to build a "batch" before releasing it for processing. This is essentially the maximum latency for processing in a low-throughput system. It helps avoid the situation where, if only a few messages have been submitted (fewer than the max batch size), the batch must wait a long time to be processed.
+2. Max Batch Size: The maximum size that a batch can reach before it is released for processing. It puts a cap on the size of the batch, which should optimize for maximum throughput. The concept only applies within the maximum wait time before the batch is released.
+
+BentoML's adaptive batching works off of these two basic concepts and builds on them: both the batching window and the max batch size adapt to incoming traffic patterns at the time. The dispatching mechanism regresses over recent processing times, wait times, and batch sizes to optimize for the lowest latency.
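To make the two concepts added above concrete, here is a minimal, purely illustrative Python sketch of a dispatcher loop (not BentoML's actual implementation, and without the adaptive regression described above): a batch is released as soon as it reaches the max batch size, or as soon as the batching window expires, whichever comes first.

```python
import queue
import time

MAX_BATCH_SIZE = 8        # illustrative values only
BATCHING_WINDOW_S = 0.01  # 10 ms batching window

def collect_batch(request_queue: "queue.Queue") -> list:
    """Gather requests until the batch is full or the batching window closes."""
    batch = [request_queue.get()]  # block until the first request arrives
    deadline = time.monotonic() + BATCHING_WINDOW_S
    while len(batch) < MAX_BATCH_SIZE:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # batching window expired: release a partial batch
        try:
            batch.append(request_queue.get(timeout=remaining))
        except queue.Empty:
            break  # no more requests arrived within the window
    return batch  # the whole batch goes to the model in one inference call
```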
+Architecture
+------------
+
+The batching mechanism is located on the model runner. Each model runner receives inference requests and batches those requests based on optimal latency.
+
+.. image:: ../_static/img/batching-diagram.png
+
+The load balancer distributes requests to each of the running API services. The API services in turn distribute the inference requests to the model runners. The distribution of requests to the model runners uses a random algorithm, which provides slightly more efficient batch sizes than round robin. Additional dispatch algorithms are planned for the future.
+
+Running with Adaptive Batching
+------------------------------
+
+There are two ways that adaptive batching will run, depending on how you've deployed BentoML.
+
+Standalone Mode
+^^^^^^^^^^^^^^^
+
+In the standard BentoML library, each model runner is its own process. In this case, batching happens at the process level.
+
+Distributed
+^^^^^^^^^^^
+
+For a Yatai deployment into Kubernetes, each model runner is structured as its own Pod. The batching will occur at the Pod level.
+
+Configuring Batching
+--------------------
+
+The main configuration concern is the way in which individual inputs are combined when batching occurs. We call this the "batch axis". When configuring whether a model runner should batch, the batch axis must be specified.
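As an illustration of specifying the batch axis, here is a sketch in the style of the BentoML 1.0 `signatures` API used when saving a model. The exact function names and parameters may differ in the BentoML release this guide was written against, so treat them as assumptions rather than the documented configuration.

```python
import bentoml
from sklearn import datasets, svm

# Train a small example model so the snippet is self-contained.
iris = datasets.load_iris()
clf = svm.SVC()
clf.fit(iris.data, iris.target)

# Illustrative: mark `predict` as batchable and declare the batch axis.
bentoml.sklearn.save_model(
    "iris_classifier_model",
    clf,
    signatures={
        "predict": {
            "batchable": True,  # allow the runner to batch calls to `predict`
            "batch_dim": 0,     # the "batch axis": inputs are stacked along axis 0
        }
    },
)
```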
