Releases: NVIDIA-Merlin/NVTabular
v1.0.0
What's Changed
- Assume 'merlin' is a first party package for isort by @karlhigley in #1420
- End-to-end inference POC migration to new ensemble API by @jperez999 in #1391
- Update test_integration.sh by @albert17 in #1422
- update test_tf4rec.py by @radekosmulski in #1424
- Fix lambda dtype issue in PyTorch Multi-GPU training example notebook by @jperez999 in #1425
- Prevent dataloaders from using GPU memory when CPU device is selected by @jperez999 in #1429
- Fix dtype bug with GroupBy operator when `aggs` is a string by @jperez999 in #1430
- Fix typo in example notebook by @L0Z1K in #1390
- Extract Triton `Ensemble` DAG to `merlin.systems` package by @karlhigley in #1426
- Add TagAs and related wrapper classes by @radekosmulski in #1414
- docs: Add preview doc build to PR by @mikemckiernan in #1432
- Docs script by @jperez999 in #1433
- docs: Ensure that parent review directory exists by @mikemckiernan in #1434
- Update reqs by @albert17 in #1406
- Handle aiobotocore v2.0+ in test_s3 by @benfred in #1439
- Update to work with the latest merlin-core by @benfred in #1441
- Add intersphinx mappings for merlin.core by @benfred in #1440
- Updates Container tests by @albert17 in #1445
- Asvdb fix for integration testing by @jperez999 in #1413
- remove setuptools by @jperez999 in #1460
- Update imports for classes that moved to `merlin-core` by @karlhigley in #1447
- Reactivate hugectr Criteo integration test by @jperez999 in #1457
- Wrapper for TagAs did not work by @bschifferer in #1462
- Set up automated docstring coverage checks by @karlhigley in #1454
- doc: Update matrix for 22.03 by @mikemckiernan in #1450
- Remove Systems library from nvtabular by @jperez999 in #1456
- Fix bug about criteo download notebook by @bschifferer in #1453
- Add deprecation warnings to modules that moved to `core` by @karlhigley in #1466
- Hard-code the `Workflow` output dtypes for HugeCTR in Triton by @karlhigley in #1468
- AWS SageMaker by @bschifferer in #1421
- Improve `Workflow` error about mismatched dtypes by @karlhigley in #1465
- Exclude additional directories and boost docstring coverage req to 35 percent by @karlhigley in #1471
- fix(docs): Restore the version picker by @mikemckiernan in #1474
- Documentation fixes from the docstring scrub by @benfred in #1475
- Add missing `--user` flag to `natsort` CI install by @karlhigley in #1476
- Change `merlin` level NVT import to `transforms` (from `transform`) by @karlhigley in #1472
- Move merlin.core.worker to merlin.io.worker by @karlhigley in #1477
- Fix merlin.core.worker imports by @benfred in #1482
- Use quieter `DeprecationWarning` instead of `FutureWarning` by @karlhigley in #1486
- Remove imports to deprecated modules by @benfred in #1487
- README updates by @benfred in #1478
- Add Troubleshoot for OOM errors with NVTabular dataloaders by @bschifferer in #1373
- Upgrade poetry dependencies by @benfred in #1489
- Note in the README that installing with pip runs only on CPU by @karlhigley in #1494
- Add deprecation warnings to `loader`, `inference`, `framework_utils` by @karlhigley in #1492
- Add `merlin.transforms.ops` sub-package by @karlhigley in #1491
- fix for 1455 by @jperez999 in #1497
- Restrict running on pandas 1.4.x by @benfred in #1496
- Fixing Criteo Inference for TensorFlow and HugeCTR by @bschifferer in #1500
- docs: Add a redirect page by @mikemckiernan in #1499
- Final updates for 1.0 release by @benfred in #1501
- update to compatible dtype by @jperez999 in #1503
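Several entries above (#1466, #1486, #1492) concern deprecation warnings for modules that moved. The sketch below is not NVTabular's actual shim. It only illustrates, with the standard `warnings` module, why `DeprecationWarning` is the "quieter" choice: Python's default filters hide it for library code, while `FutureWarning` is always shown. The helper `moved_module_warning` and both module names are hypothetical, and the filters are set explicitly to approximate the defaults.

```python
import warnings


def moved_module_warning(old_name, new_name, category=DeprecationWarning):
    """Hypothetical helper: warn that `old_name` has moved to `new_name`."""
    warnings.warn(
        f"The `{old_name}` module has moved to `{new_name}`. "
        "Support for importing from the old location will be removed "
        "in a future release.",
        category,
        stacklevel=2,
    )


def count_shown(category):
    """Count warnings of `category` that get through default-like filters."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.resetwarnings()
        # Approximate the defaults seen by library code:
        # show everything, except DeprecationWarning, which is ignored.
        warnings.simplefilter("always")
        warnings.simplefilter("ignore", DeprecationWarning)
        # Placeholder module names, not real import paths.
        moved_module_warning("old_package.module", "new_package.module", category)
    return len(caught)


quiet = count_shown(DeprecationWarning)  # filtered out
loud = count_shown(FutureWarning)        # reaches the user
print(quiet, loud)
```

Users who want to see the quieter warnings can still opt in, for example by running Python with `-W default`.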
New Contributors
- @radekosmulski made their first contribution in #1424
- @L0Z1K made their first contribution in #1390
Full Changelog: v0.11.0...v1.0.0
v0.11.0
What's Changed
- Docs: Update URL to Criteo notebook by @mikemckiernan in #1383
- Update support_matrix.rst by @lgardenhire in #1375
- Support min_val for categorical features in DataGen by @bschifferer in #1369
- Fix null_size logic in Categorify op by @rjzamora in #1386
- Fix CUDA version doc by @albert17 in #1387
- Fixes tests utils imports by @albert17 in #1393
- Exit integration by @albert17 in #1395
- Fix lambdaop call by @jperez999 in #1394
- Add ReduceDtypeSize op by @benfred in #1398
- Fix `remove_inputs` usage in `export_pytorch_ensemble` by @karlhigley in #1389
- Param to send test results by @albert17 in #1405
- Migrate io, graph, dispatch, worker, and utils to `merlin.core` by @karlhigley in #1384
- Import `Distributed` and `Serial` execution-manager utilities from merlin-core by @rjzamora in #1380
- Pin `merlin-core` to a specific commit to avoid breaking changes by @karlhigley in #1409
- Rename `merlin.graph` to `merlin.dag` by @jperez999 in #1411
- Add DropLowCardinality op by @benfred in #1412
- Update `merlin-core` to `v0.1.1` (instead of `main` branch) by @karlhigley in #1419
New Contributors
- @mikemckiernan made their first contribution in #1383
Full Changelog: v0.10.0...v0.11.0
v0.10.0
What's Changed
- schema metadata propagation by @jperez999 in #1354
- Create `TagSet` as a container that resolves conflicts between tags (like continuous and categorical) by @jperez999 in #1360
- Update support_matrix.rst by @lgardenhire in #1363
- Raise an error when the actual dtype produced by an operator doesn't match the schema by @jperez999 in #1362
- Deprecate client from Dataset, Workflow, and DatasetInspector by @rjzamora in #1318
- fixes asv display to one metric per notebook and does not repeat metrics by @jperez999 in #1366
- Keras loader nvt dataset usage by default if available by @jperez999 in #1374
- Fixes hash_crossed with cudf 21.12 by @albert17 in #1376
- Fixes tests by @albert17 in #1377
- Support custom Python operators in the Triton operator/ensemble API by @jperez999 in #1368
- Use new fsspec.parquet module to accelerate reads from remote storage by @rjzamora in #1241
Full Changelog: v0.9.0...v0.10.0
v0.9.0
What's Changed
- Workflow for adding issues to the backlog by @benfred in #1305
- Set the priority and date added fields for new issues. by @benfred in #1308
- Label issues not created by nvidia-merlin members by @benfred in #1309
- moved tf import to after tf config is completed by @jperez999 in #1311
- Fix Triton import for `_convert_string2pytorch_dtype` by @karlhigley in #1312
- Apply NVT graph API/DSL to building Triton ensembles by @jperez999 in #1292
- Fixes tests by @albert17 in #1326
- Activates Blossom CI by @albert17 in #1324
- Add a `compute_input_schema` method to operators by @jperez999 in #1330
- removed column_types.json from nvtabular by @jperez999 in #1317
- working refit as expected by user by @jperez999 in #1338
- Update support_matrix.rst by @lgardenhire in #1336
- HugeCTR Multihot Training-Inference example by @albert17 in #1329
- Triton setup via merlin graph api by @jperez999 in #1339
- removed parents selector logic in selector setter, by @jperez999 in #1343
- Switch to packaging.version.Version for version checks by @benfred in #1345
- fix for storage name bug in path creation by @jperez999 in #1347
- Fix multiGPU Pytorch MovieLens by @bschifferer in #1319
- Update dead links in Documentation by @SimonCW in #1342
- Fixes cudf 21.10 error by @albert17 in #1350
- Fixes unit tests for containers by @albert17 in #1349
- Create an explicit mapping between Operator input and output columns by @jperez999 in #1348
- Updates notebooks for cudf 21.10 by @albert17 in #1353
- Revert notebook by @albert17 in #1355
- Update conda packages to cudf >= 21.10 and add pynvml by @benfred in #1356
- Fix writing out workflows to S3 by @benfred in #1357
New Contributors
Full Changelog: v0.8.0...v0.9.0
v0.8.0
What's Changed
- Allow writing workflows to cloud storage by @benfred in #1232
- Avoid copy of remote-data buffer in call to read_parquet by @rjzamora in #1239
- Update container references to merlin 21.11 by @benfred in #1242
- Fix numpy version in CI by @karlhigley in #1255
- Modularize the Triton inference model for NVT Workflows by @karlhigley in #1252
- Dl cpu by @jperez999 in #1245
- fixes for schema saving and writing by @jperez999 in #1215
- decouple io from schema by @jperez999 in #1161
- Remove non-exist Torch `uint` dtypes from Triton conversion utils by @karlhigley in #1270
- utf-8 when opening notebooks by @albert17 in #1271
- Add 'pad' option for the ListSlice op by @benfred in #1262
- End-to-end Inference support for Transformers4Rec Tensorflow Models by @rnyak in #1256
- fix lookup error on typo in tags for target by @jperez999 in #1281
- Fix resolution of tags to column names when executing `Workflows` by @jperez999 in #1285
- Extract all knowledge of Triton from the serving-time `WorkflowRunners` by @karlhigley in #1257
- Extract an abstract `graph` package from NVT Workflows by @karlhigley in #1265
- dataset duck typing for dataloader by @jperez999 in #1272
- Reduce device-memory footprint in Categorify fit by @rjzamora in #1259
- Fixes for ListSlice operator with padding by @benfred in #1288
- Update support_matrix.rst by @lgardenhire in #1243
- Fix notebook tests broken by recent graph refactoring by @karlhigley in #1293
- add init file for import support by @jperez999 in #1300
- add missing dependencies to poetry by @benfred in #1298
- Fix inference issues for end-to-end TF example for Transformers4Rec by @karlhigley in #1299
- Uninstall NVT (removing versions from PyPI) before installing NVT in CI by @karlhigley in #1303
- Updates integration tests by @albert17 in #1294
- fix train_test split by @rnyak in #1291
- fix arbitrary output file number bug, shrink number of files and warn… by @jperez999 in #1301
Full Changelog: v0.7.1...v0.8.0
v0.7.1
NVTabular v0.7.1 (2 November 2021)
Improvements
- Add LogOp support for list features #1153
- Add Normalize operator support for list features #1154
- Add DataLoader.epochs() method and Dataset.to_iter(epochs=) argument #1147
- Add ValueCount operator for recording of multihot min and max list lengths #1171
Bug Fixes
- Fix Criteo inference #1198
- Fix performance regressions in Criteo benchmark #1222
- Fix error in JoinGroupby op #1167
- Fix Filter/JoinExternal key error #1143
- Fix LambdaOp transforming dependency values #1185
- Fix reading parquet files with list columns from GCS #1155
- Fix TargetEncoding with dependencies as the target #1165
- Fix Categorify op to calculate unique count stats for Nulls #1159
v0.7.0
NVTabular v0.7.0
Improvements
- Add column tagging API #943
- Export dataset schema when writing out datasets #948
- Make dataloaders aware of schema #947
- Standardize a Workflow's representation of its output columns #372
- Add multi-gpu training example using PyTorch Distributed #775
- Speed up reading Parquet files from remote storage like GCS or S3 #1119
- Add utility to convert TFRecord datasets to Parquet #1085
- Add multihot support for PyTorch inference #719
- Add options to reserve categorical indices in the Categorify() op #1074
- Update notebooks to work with CPU only systems #960
- Save output from Categorify op in a single table for HugeCTR #946
- Add a keyset file for HugeCTR integration #1049
Bug Fixes
v0.6.1
v0.6.0
NVTabular v0.6.0
Improvements
- Add CPU support #534
- Speed up inference on Triton Inference Server #744
- Add support for session based recommenders #355
- Add PyTorch Dataloader support for Sparse Tensors #500
- Add ListSlice operator for truncating list columns #734
- Categorical ids sorted by frequency #799
- Add ability to select a subset of a ColumnGroup #809
- Add option to use Rename op to give a single column a new fixed name #825
- Add a 'map' function to KerasSequenceLoader, which enables sample weights #667
- Add JoinExternal option on nvt.Dataset in addition to cudf #370
- Allow passing ColumnGroup to get_embedding_sizes #732
- Add ability to name LambdaOp and provide a better default name in graph visualizations #860
Bug Fixes
- Fix make_feature_column_workflow for Categorical columns #763
- Fix Categorify output dtypes for list columns #963
- Fix inference for Outbrain example #669
- Fix dask metadata after calling workflow.to_ddf() #852
- Fix out of memory errors #896, #971
- Fix normalize output when stdev is zero #993
- Fix using UCX with a dask cluster on Merlin containers #872