[NVTabular](https://github.com/NVIDIA/NVTabular) is a feature engineering and preprocessing library for tabular data that is designed to easily manipulate terabyte-scale datasets and train deep learning (DL) based recommender systems. It provides a high-level abstraction to simplify code and accelerates computation on the GPU using the [RAPIDS Dask-cuDF](https://github.com/rapidsai/cudf/tree/main/python/dask_cudf) library. NVTabular is designed to be interoperable with both PyTorch and TensorFlow using dataloaders that have been developed as extensions of native framework code. In our experiments, we were able to speed up existing TensorFlow pipelines by nine times and existing PyTorch pipelines by five times with our highly optimized dataloaders.
NVTabular is a component of [NVIDIA Merlin Open Beta](https://developer.nvidia.com/nvidia-merlin). NVIDIA Merlin is used for building large-scale recommender systems. With NVTabular being a part of the Merlin ecosystem, it also works with the other Merlin components including [HugeCTR](https://github.com/NVIDIA/HugeCTR) and [Triton Inference Server](https://github.com/NVIDIA/tensorrt-inference-server) to provide end-to-end acceleration of recommender systems on the GPU. Extending beyond model training, with NVIDIA’s Triton Inference Server, the feature engineering and preprocessing steps performed on the data during training can be automatically applied to incoming data during inference.
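To make the kind of preprocessing NVTabular accelerates concrete, here is a pure-pandas sketch of two common transforms, categorical encoding and numeric normalization. This is an illustration only, not the NVTabular API; NVTabular expresses the same transforms as GPU-accelerated operators (for example, its `Categorify` and `Normalize` ops) composed into a `Workflow`. The column names below are hypothetical.

```python
import pandas as pd

# Toy interaction data; column names are illustrative only.
df = pd.DataFrame({
    "item_id": ["a", "b", "a", "c"],
    "price": [10.0, 20.0, 30.0, 40.0],
})

# Categorical encoding: map each category to a contiguous integer id
# (conceptually what NVTabular's Categorify op does, on the GPU and at scale).
df["item_id"] = df["item_id"].astype("category").cat.codes

# Normalization: standardize a continuous column to zero mean, unit variance
# (conceptually what NVTabular's Normalize op does).
df["price"] = (df["price"] - df["price"].mean()) / df["price"].std()

print(df["item_id"].tolist())  # → [0, 1, 0, 2]
```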
### Performance
When running NVTabular on the Criteo 1TB Click Logs Dataset using a single V100 32GB GPU, feature engineering and preprocessing completed in 13 minutes. Furthermore, when running NVTabular on a DGX-1 cluster with eight V100 GPUs, feature engineering and preprocessing completed within three minutes. Combined with [HugeCTR](http://www.github.com/NVIDIA/HugeCTR/), the dataset can be processed and a full model trained in only six minutes.
The performance of the Criteo DLRM workflow also demonstrates the effectiveness of the NVTabular library. The original ETL script, written in NumPy, took over five days to complete. Combined with CPU training, the total iteration time was over one week. By optimizing the ETL code in Spark and running on a DGX-1 equivalent cluster, the time to complete feature engineering and preprocessing was reduced to three hours. Meanwhile, training was completed in one hour.
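The timings above imply the following rough speedups (a back-of-the-envelope sketch; all times are the approximate figures quoted in this section):

```python
# Approximate end-to-end ETL times from the figures above, in minutes.
numpy_etl = 5 * 24 * 60   # original NumPy ETL script: over five days
spark_etl = 3 * 60        # optimized Spark ETL on a DGX-1-equivalent cluster
nvt_1gpu = 13             # NVTabular on a single V100 32GB GPU
nvt_8gpu = 3              # NVTabular on a DGX-1 (eight V100 GPUs)

print(round(numpy_etl / nvt_1gpu))  # → 554: single-GPU NVTabular vs. the NumPy ETL
print(spark_etl / nvt_8gpu)         # → 60.0: eight-GPU NVTabular vs. optimized Spark
```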
### Installation
Prior to installing NVTabular, ensure that you meet the following prerequisites:
* CUDA version 10.1+
* Python version 3.7+
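With the prerequisites in place, NVTabular can be installed via conda (a sketch; the channel list and pinned versions below are examples and may differ for your CUDA version, so check the project README for the exact command):

```shell
# Verify the prerequisites first.
python --version   # expect 3.7 or later
nvidia-smi         # driver must support CUDA 10.1+

# Install NVTabular from conda (example pins; adjust cudatoolkit to your CUDA version).
conda install -c nvidia -c rapidsai -c numba -c conda-forge \
    nvtabular python=3.7 cudatoolkit=11.0
```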
NVTabular Docker containers are available in the NVIDIA Merlin container repository.
| Container Name | Container Location | Functionality |
| --- | --- | --- |
| merlin-inference | https://ngc.nvidia.com/catalog/containers/nvidia:merlin:merlin-inference | NVTabular, HugeCTR, and Triton Inference |
| merlin-pytorch-training | https://ngc.nvidia.com/catalog/containers/nvidia:merlin:merlin-pytorch-training | NVTabular and PyTorch |
To use these Docker containers, you'll first need to install the [NVIDIA Container Toolkit](https://github.com/NVIDIA/nvidia-docker) to provide GPU support for Docker. You can use the NGC links referenced in the table above to obtain more information about how to launch and run these containers. To obtain more information about the software and model versions that NVTabular supports per container, see [Support Matrix](https://github.com/NVIDIA/NVTabular/blob/main/docs/source/resources/support_stack.rst).
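For example, a training container can be launched along these lines (a sketch: the `latest` tag and the port mapping are illustrative assumptions; consult the NGC pages referenced above for current tags and recommended launch flags):

```shell
# Run the Merlin PyTorch training container with GPU access,
# exposing a port for Jupyter.
docker run --gpus all --rm -it \
    -p 8888:8888 \
    nvcr.io/nvidia/merlin/merlin-pytorch-training:latest \
    /bin/bash
```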
### Notebook Examples and Tutorials
### Feedback and Support
If you'd like to contribute to the library directly, see the [Contributing.md](https://github.com/NVIDIA/NVTabular/blob/main/CONTRIBUTING.md). We're particularly interested in contributions or feature requests for our feature engineering and preprocessing operations. To further advance our Merlin Roadmap, we encourage you to share all the details regarding your recommender system pipeline in this [survey](https://developer.nvidia.com/merlin-devzone-survey).
If you're interested in learning more about how NVTabular works, see [our NVTabular documentation](https://nvidia.github.io/NVTabular/main/Introduction.html). We also have [API documentation](https://nvidia.github.io/NVTabular/main/api/index.html) that outlines the specifics of the available calls within the library.
`docs/source/resources/support_matrix.rst`
NVTabular Support Matrix
========================
.. role:: raw-html(raw)
   :format: html
We offer the following containers:
* `Merlin Inference <#table-1>`_: Allows you to deploy NVTabular workflows and HugeCTR or TensorFlow models to the Triton Inference server for production.
* `Merlin Training <#table-2>`_: Allows you to do preprocessing and feature engineering with NVTabular so that you can train a deep learning recommendation model with HugeCTR.
* `Merlin TensorFlow Training <#table-3>`_: Allows you to do preprocessing and feature engineering with NVTabular so that you can train a deep learning recommendation model with TensorFlow.
* `Merlin PyTorch Training <#table-4>`_: Allows you to do preprocessing and feature engineering with NVTabular so that you can train a deep learning recommendation model with PyTorch.
The following tables provide the software and model versions that NVTabular version 0.6 supports per container.
:raw-html:`<br/>`
.. _table-1:
:raw-html:`<p align="center"><b>Table 1: Support matrix for the Merlin Inference (merlin-inference) container</b></p>`