
Commit debc858

Tim Liu and Sean Sheng authored
docs: core concept and guides (bentoml#2417)
* core concepts edits
* added content to a few guides
* initial logging documentation
* updates to format and added contributor images in readme
* more updates to formatting
* Update README.md
* Update README.md
* Update README.md
* Update bento_management.rst
* Update docs/source/guides/adaptive_batching.rst
  Co-authored-by: Sean Sheng <[email protected]>
* Update docs/source/guides/adaptive_batching.rst
  Co-authored-by: Sean Sheng <[email protected]>
* updates based on Sean's feedback

Co-authored-by: Tim Liu <[email protected]>
Co-authored-by: Sean Sheng <[email protected]>
1 parent dbb6925 commit debc858

File tree

12 files changed, with 373 additions and 99 deletions


README.md

Lines changed: 8 additions & 0 deletions
@@ -70,6 +70,14 @@ Or by setting environment variable `BENTOML_DO_NOT_TRACK=True`:
 export BENTOML_DO_NOT_TRACK=True
 ```
 
+---
+### Contributors! ###
+
+Thanks to all of our amazing contributors!
+
+<a href="https://github.com/bentoml/BentoML/graphs/contributors">
+<img src="https://contrib.rocks/image?repo=bentoml/BentoML" />
+</a>
 
 ### License ###

docs/README.md

Lines changed: 8 additions & 7 deletions
@@ -6,7 +6,7 @@ http://docs.bentoml.org/ to read the full documentation.
 ---
 
 **NOTE**:
-All of the below `make` commands should be used under `bentoml` root directory.
+All of the below `make` commands should be run from the `bentoml` root directory. Only macOS and Linux (UNIX-based systems) are currently supported for live reloading of the documentation.
 
 To generate the documentation, make sure to install all dependencies (mainly `sphinx` and its extension):
 
@@ -19,15 +19,16 @@ Once you have `sphinx` installed, you can build the documentation and enable wat
 » make watch-docs
 ```
 
-For Apple Silicon (M1), a environment variable is required:
-```bash
-» export PYENCHANT_LIBRARY_PATH=/opt/homebrew/lib/libenchant-2.2.dylib
-```
-then install pychant with `brew`:
+For Apple Silicon (M1), follow the latest suggested installation method for [PyEnchant](https://pyenchant.github.io/pyenchant/install.html).
+As of this writing there is no compatible arm64 build of PyEnchant, so the best way to install it is with the following commands:
+
 ```bash
-» arch -arm64 brew install enchant
+» arch -x86_64 /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
+» arch -x86_64 /usr/local/bin/brew install enchant
 ```
 
+Make sure that `PYENCHANT_LIBRARY_PATH` is set to the location of libenchant. On macOS the library has a `.dylib` extension; on Linux-based systems it is a `.so` file.
+
 ## Documentation specification
 
 `bentoml/BentoML` follows [Google's docstring style](https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings),
Three binary image files changed (147 KB, 333 KB, and 99 KB); previews not shown.

docs/source/concepts/api_io_descriptors.rst

Lines changed: 8 additions & 5 deletions
@@ -25,7 +25,7 @@ common model serving scenarios.
     @svc.api(input=NumpyNdarray(), output=NumpyNdarray())
     def predict(input_array: np.ndarray) -> np.ndarray:
         # Define pre-processing logic
-        result = await runner.run(input_array)
+        result = runner.run(input_array)
         # Define post-processing logic
         return result
 
@@ -59,9 +59,10 @@ two runners simultaneously, and returns the better result.
     )
     return compare_results(results)
 
-The asynchronous API implementation is more efficient because while the coroutine is awaiting for
-results from the feature store or the model runners, the event loop is freed up to serve another request.
-BentoML will intelligently create an optimally sized event loop based on the available number of CPU cores. Further tuning of event loop configuration is not needed under common use cases.
+The asynchronous API implementation is more efficient because when an asynchronous method is invoked, the event loop is
+released to service other requests while this request awaits the results of the method. In addition, BentoML will automatically
+configure the ideal amount of parallelism based on the available number of CPU cores. Further tuning of the event loop
+configuration is not needed under common use cases.
 
 IO Descriptors
 --------------
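To make the asynchronous variant described in the hunk above concrete, here is a minimal sketch of an async API. It assumes the runner exposes an `async_run` counterpart to `run`; the runner-construction call (`load_runner`) and the exact method names vary between BentoML releases, so treat these identifiers as illustrative rather than authoritative.

```python
import numpy as np

import bentoml
from bentoml.io import NumpyNdarray

# Illustrative: the runner-construction API differs across BentoML releases.
runner = bentoml.sklearn.load_runner("iris_classifier_model:latest")
svc = bentoml.Service("iris_classifier_service", runners=[runner])

@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
async def predict(input_array: np.ndarray) -> np.ndarray:
    # While this coroutine awaits the runner, the event loop is free to
    # accept and serve other incoming requests.
    result = await runner.async_run(input_array)
    return result
```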
@@ -168,4 +169,6 @@ descriptor can be customized with independent schema and validation logic.
 
 .. todo::
 
-    Add further reading section
+Further Reading
+---------------
+- :ref:`API Reference for IO descriptors <api-io-descriptors>`

docs/source/concepts/bento_management.rst

Lines changed: 63 additions & 53 deletions
@@ -3,17 +3,11 @@
 Model and Bento Management
 **************************
 
-BentoML provides easy to use local and centralized stores for managing models and bentos. This article
-focuses on the use of local file system based model and bento stores. To learn more about the centralized
-store solution, see BentoML Yatai. To connect the CLI to a remote `Yatai <yatai-service-page>`,
-use the `bentoml login` command.
+BentoML allows you to store models and bentos in local as well as remote repositories. Tools are also provided to easily
+manage the lifecycle of these artifacts. This documentation details the CLI tools for both local and remote scenarios.
 
-.. todo::
-
-   Link to BentoML Yatai documentation.
-
-Managing Models
----------------
+Managing Models Locally
+-----------------------
 
 Creating Models
 ^^^^^^^^^^^^^^^
@@ -36,7 +30,8 @@ is imported from the MLFlow Model Registry.
     bentoml.mlflow.import_from_uri("mlflow_model", uri=mlflow_registry_uri)
 
 Saved and imported models are added to the local file system based model store located in the
-`$HOME/bentoml/models` directory by default.
+`$HOME/bentoml/models` directory by default. To see which types of model creation are supported for each framework, please
+visit our :ref:`Frameworks <frameworks-page>` section.
 
 Listing Models
 ^^^^^^^^^^^^^^
@@ -159,45 +154,8 @@ module or the `models delete` CLI command.
 
     > bentoml models delete iris_classifier_model:vmiqwpcfifi6zhqqvtpeqaare
 
-Pushing Models
-^^^^^^^^^^^^^^
-
-Once you are happy with a model and ready to share with other collaborators, you can upload it to a
-remote `Yatai <yatai-service-page>` model store with the `push()` function under the `bentoml.models`
-module or the `models push` CLI command.
-
-.. tabs::
-
-    .. code-tab:: python
-
-        import bentoml.models
-
-        bentoml.models.push("iris_classifier_model:vmiqwpcfifi6zhqqvtpeqaare", skip_confirm=True)
-
-    .. code-tab:: bash
-
-        > bentoml models push iris_classifier_model:vmiqwpcfifi6zhqqvtpeqaare
-
-Pulling Models
-^^^^^^^^^^^^^^
-
-Previously pushed models can be downloaded from `Yatai <yatai-service-page>` and saved local model
-store with the `pull()` function under the `bentoml.models` module or the `models pull` CLI command.
-
-.. tabs::
-
-    .. code-tab:: python
-
-        import bentoml.models
-
-        bentoml.modles.pull("iris_classifier_model:vmiqwpcfifi6zhqqvtpeqaare", url=yatai_url)
-
-    .. code-tab:: bash
-
-        > bentoml models pull iris_classifier_model:vmiqwpcfifi6zhqqvtpeqaare
-
-Managing Bentos
----------------
+Managing Bentos Locally
+-----------------------
 
 Creating Bentos
 ^^^^^^^^^^^^^^^
@@ -231,26 +189,78 @@ To delete bentos in the bento store, use the `delete` CLI command.
 
     > bentoml delete iris_classifier_service:v5mgcacfgzi6zdz7vtpeqaare
 
+Managing Models and Bentos Remotely with Yatai
+----------------------------------------------
+
+Yatai is BentoML's end-to-end deployment and monitoring platform. It also functions as a remote model and bento repository. To connect the CLI to a remote `Yatai <yatai-service-page>`, use the `bentoml login` command.
+
+.. tabs::
+
+    .. code-tab:: bash
+
+        > bentoml login <YATAI_URL>
+
+Once logged in, you'll be able to use the following commands.
+
+Pushing Models
+^^^^^^^^^^^^^^
+
+Once you are happy with a model and ready to share it with other collaborators, you can upload it to a
+remote `Yatai <yatai-service-page>` model store with the `push()` function under the `bentoml.models`
+module or the `models push` CLI command.
+
+.. tabs::
+
+    .. code-tab:: python
+
+        import bentoml.models
+
+        bentoml.models.push("iris_classifier_model:vmiqwpcfifi6zhqqvtpeqaare", skip_confirm=True)
+
+    .. code-tab:: bash
+
+        > bentoml models push iris_classifier_model:vmiqwpcfifi6zhqqvtpeqaare
+
+Pulling Models
+^^^^^^^^^^^^^^
+
+Previously pushed models can be downloaded from `Yatai <yatai-service-page>` and saved to the local model
+store with the `pull()` function under the `bentoml.models` module or the `models pull` CLI command.
+
+.. tabs::
+
+    .. code-tab:: python
+
+        import bentoml.models
+
+        bentoml.models.pull("iris_classifier_model:vmiqwpcfifi6zhqqvtpeqaare", url=yatai_url)
+
+    .. code-tab:: bash
+
+        > bentoml models pull iris_classifier_model:vmiqwpcfifi6zhqqvtpeqaare
+
 Pushing Bentos
 ^^^^^^^^^^^^^^
 
-To upload bento in the local file system store to a remote `Yatai <yatai-service-page>` bento store
+To upload a bento from the local file system store to a remote `Yatai <yatai-service-page>` bento store
 for collaboration and deployment, use the `push` CLI command.
 
 .. code-block:: bash
 
     > bentoml push iris_classifier_service:v5mgcacfgzi6zdz7vtpeqaare
 
 Pulling Bentos
 ^^^^^^^^^^^^^^
 
-To download a bento from a remote `Yatai <yatai-service-page>` bento store to the local file system
+To download a bento from a remote `Yatai <yatai-service-page>` bento store to the local file system
 bento store for troubleshooting, use the `pull` CLI command.
 
 .. code-block:: bash
 
     > bentoml pull iris_classifier_service:v5mgcacfgzi6zdz7vtpeqaare
 
+
 Further Reading
 ---------------
 - Install Yatai

docs/source/concepts/service_definition.rst

Lines changed: 32 additions & 31 deletions
@@ -14,9 +14,9 @@ Composition
 
 Consider the following service definition we created in the :ref:`Getting Started <getting-started-page>` guide.
 A BentoML service is composed of three components.
-- APIs
 - Runners
 - Services
+- APIs
 
 .. code-block:: python
 
@@ -40,34 +40,6 @@ A BentoML service is composed of three components.
         # Define post-processing logic
         return result
 
-APIs
-----
-
-Inference APIs define how the service functionality can be accessed remotely and the high level pre- and post-processing logic.
-
-.. code-block:: python
-
-    # Create API function with pre- and post- processing logic
-    @svc.api(input=NumpyNdarray(), output=NumpyNdarray())
-    def predict(input_array: np.ndarray) -> np.ndarray:
-        # Define pre-processing logic
-        result = runner.run(input_array)
-        # Define post-processing logic
-        return result
-
-By decorating a function with `@svc.api`, we declare that the function is a part of the APIs that can be accessed remotely.
-A service can have one or many APIs. The `input` and `output` arguments of the `@svc.api` decorator further defines the expect
-IO formats of the API. In the above example, the API defines the IO types as `numpy.ndarray` through the `NumpyNdarray`
-:ref:`IO descriptors <api-io-descriptors-page>`. IO descriptors help validate that the input and output conform to the expected format
-and schema and convert them from and to the native types. BentoML supports a variety of IO descriptors including `PandasDataFrame`,
-`String`, `Image`, and `File`.
-
-The API is also a great place to define your pre- and post-process logic of model serving. In the example above, the logic defined
-in the `predict` function will be packaged and deployed as a part of the serving logic.
-
-BentoML aims to parallelize API logic by starting multiple instances of the API server based on available system resources. For
-optimal performance, we recommend defining asynchronous APIs. To learn more, continue to :ref:`IO descriptors <api-io-descriptors-page>`.
-
 Runners
 -------
 
@@ -102,8 +74,37 @@ Services are composed of APIs and Runners and can be initialized through `bentom
     svc = bentoml.Service("iris_classifier_service", runners=[runner])
 
 The first argument of the service is the name which will become the name of the Bento after the service is built. Runners that
-should be parts of the service are passed in through the `runners` keyword argument. Build time and runtime behaviors of the
-service can be customized through the `svc` instance.
+should be part of the service are passed in through the `runners` keyword argument. This is an important step because this is
+how the BentoML library knows which runners to package into the bento. Build time and runtime behaviors of the service can be
+customized through the `svc` instance.
+
+APIs
+----
+
+Inference APIs define how the service functionality can be accessed remotely and the high-level pre- and post-processing logic.
+
+.. code-block:: python
+
+    # Create API function with pre- and post- processing logic
+    @svc.api(input=NumpyNdarray(), output=NumpyNdarray())
+    def predict(input_array: np.ndarray) -> np.ndarray:
+        # Define pre-processing logic
+        result = runner.run(input_array)
+        # Define post-processing logic
+        return result
+
+By decorating a function with `@svc.api`, we declare that the function is a part of the APIs that can be accessed remotely.
+A service can have one or many APIs. The `input` and `output` arguments of the `@svc.api` decorator further define the expected
+IO formats of the API. In the above example, the API defines the IO types as `numpy.ndarray` through the `NumpyNdarray`
+:ref:`IO descriptor <api-io-descriptors-page>`. IO descriptors help validate that the input and output conform to the expected format
+and schema and convert them from and to the native types. BentoML supports a variety of IO descriptors including `PandasDataFrame`,
+`String`, `Image`, and `File`. For detailed documentation on how to declare and invoke these descriptors, please see the :ref:`API Reference for IO descriptors <api-io-descriptors>`.
+
+The API is also a great place to define the pre- and post-processing logic of model serving. In the example above, the logic defined
+in the `predict` function will be packaged and deployed as a part of the serving logic.
+
+BentoML aims to parallelize API logic by starting multiple instances of the API server based on available system resources. For
+optimal performance, we recommend defining asynchronous APIs. To learn more, continue to :ref:`IO descriptors <api-io-descriptors-page>`.
 
 Further Reading
 ---------------

docs/source/guides/adaptive_batching.rst

Lines changed: 43 additions & 1 deletion
@@ -3,4 +3,46 @@
 Adaptive Batching
 =================
 
-TODO
+Batching is the term used for combining multiple inputs into a single submission for processing at the same time. The idea is that processing multiple messages together is faster than processing each message one at a time. In practice, many ML frameworks have optimizations for processing multiple messages at once, because that is how the underlying hardware works in many cases.
+
+.. epigraph::
+   "While serving a TensorFlow model, batching individual model inference requests together can be important for performance. In particular, batching is necessary to unlock the high throughput promised by hardware accelerators such as GPUs."
+   -- `TensorFlow documentation <https://github.com/tensorflow/serving/blob/master/tensorflow_serving/batching/README.md>`_
+
+The current batching feature is implemented on the server side. This is advantageous compared to client-side batching because it simplifies the client's logic, and it is often more efficient due to traffic volume.
+
+As an optimization for a real-time service, batching works off of two main concepts.
+
+1. Batching Window: The maximum time that a service should wait to build a "batch" before releasing it for processing. This is essentially the maximum latency for processing in a low-throughput system. It helps avoid the situation where, if only a few messages have been submitted (fewer than the max batch size), the batch must wait a long time to be processed.
+2. Max Batch Size: The maximum size that a batch can reach before it is released for processing. It puts a cap on the size of the batch, which should optimize for maximum throughput. The concept only applies within the maximum wait time before the batch is released.
+
+BentoML's adaptive batching works off of these two basic concepts and builds on them: both the batching window and the max batch size adapt to incoming traffic patterns at the time. The dispatching mechanism regresses over recent processing times, wait times, and batch sizes to optimize for the lowest latency.
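To make the two concepts added above concrete, here is a minimal, purely illustrative Python sketch of a dispatcher loop (not BentoML's actual implementation, and without the adaptive regression described above): a batch is released as soon as it reaches the max batch size, or as soon as the batching window expires, whichever comes first.

```python
import queue
import time

MAX_BATCH_SIZE = 8        # illustrative values only
BATCHING_WINDOW_S = 0.01  # 10 ms batching window

def collect_batch(request_queue: "queue.Queue") -> list:
    """Gather requests until the batch is full or the batching window closes."""
    batch = [request_queue.get()]  # block until the first request arrives
    deadline = time.monotonic() + BATCHING_WINDOW_S
    while len(batch) < MAX_BATCH_SIZE:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # batching window expired: release a partial batch
        try:
            batch.append(request_queue.get(timeout=remaining))
        except queue.Empty:
            break  # no more requests arrived within the window
    return batch  # the whole batch goes to the model in one inference call
```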
+Architecture
+------------
+
+The batching mechanism is located on the model runner. Each model runner receives inference requests and batches those requests based on optimal latency.
+
+.. image:: ../_static/img/batching-diagram.png
+
+The load balancer distributes requests to each of the running API services. The API services in turn distribute the inference requests to the model runners. The distribution of requests to the model runners uses a random algorithm, which provides slightly more efficient batch sizes than round robin. Additional dispatch algorithms are planned for the future.
+
+Running with Adaptive Batching
+------------------------------
+
+There are two ways that adaptive batching will run, depending on how you've deployed BentoML.
+
+Standalone Mode
+^^^^^^^^^^^^^^^
+
+In the standard BentoML library, each model runner is its own process. In this case, batching happens at the process level.
+
+Distributed
+^^^^^^^^^^^
+
+For a Yatai deployment into Kubernetes, each model runner is structured as its own Pod. The batching will occur at the Pod level.
+
+Configuring Batching
+--------------------
+
+The main configuration concern is the way in which individual inputs are combined when batching occurs. We call this the "batch axis". When configuring whether a model runner should batch, the batch axis must be specified.
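As an illustration of specifying the batch axis, here is a sketch in the style of the BentoML 1.0 `signatures` API used when saving a model. The exact function names and parameters may differ in the BentoML release this guide was written against, so treat them as assumptions rather than the documented configuration.

```python
import bentoml
from sklearn import datasets, svm

# Train a small example model so the snippet is self-contained.
iris = datasets.load_iris()
clf = svm.SVC()
clf.fit(iris.data, iris.target)

# Illustrative: mark `predict` as batchable and declare the batch axis.
bentoml.sklearn.save_model(
    "iris_classifier_model",
    clf,
    signatures={
        "predict": {
            "batchable": True,  # allow the runner to batch calls to `predict`
            "batch_dim": 0,     # the "batch axis": inputs are stacked along axis 0
        }
    },
)
```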
