Releases: NVIDIA-Merlin/NVTabular

v1.0.0

06 Apr 19:49
e151b01

Full Changelog: v0.11.0...v1.0.0

v0.11.0

01 Mar 22:06
4da878b

Full Changelog: v0.10.0...v0.11.0

v0.10.0

02 Feb 17:06
525a1ed

What's Changed

  • Schema metadata propagation by @jperez999 in #1354
  • Create TagSet as a container that resolves conflicts between tags (like continuous and categorical) by @jperez999 in #1360
  • Update support_matrix.rst by @lgardenhire in #1363
  • Raise an error when the actual dtype produced by an operator doesn't match the schema by @jperez999 in #1362
  • Deprecate client from Dataset, Workflow, and DatasetInspector by @rjzamora in #1318
  • Fix ASV display to show one metric per notebook without repeating metrics by @jperez999 in #1366
  • Keras loader uses an NVT dataset by default if available by @jperez999 in #1374
  • Fixes hash_crossed with cudf 21.12 by @albert17 in #1376
  • Fixes tests by @albert17 in #1377
  • Support custom Python operators in the Triton operator/ensemble API by @jperez999 in #1368
  • Use new fsspec.parquet module to accelerate reads from remote storage by @rjzamora in #1241

Full Changelog: v0.9.0...v0.10.0

v0.9.0

11 Jan 23:54
9077681

Full Changelog: v0.8.0...v0.9.0

v0.8.0

07 Dec 23:29

Full Changelog: v0.7.1...v0.8.0

v0.7.1

04 Nov 01:54
21c0f7a

NVTabular v0.7.1 (2 November 2021)

Improvements

  • Add LogOp support for list features #1153
  • Add Normalize operator support for list features #1154
  • Add DataLoader.epochs() method and Dataset.to_iter(epochs=) argument #1147
  • Add ValueCount operator for recording of multihot min and max list lengths #1171

Bug Fixes

  • Fix Criteo inference #1198
  • Fix performance regressions in Criteo benchmark #1222
  • Fix error in JoinGroupby op #1167
  • Fix Filter/JoinExternal key error #1143
  • Fix LambdaOp transforming dependency values #1185
  • Fix reading parquet files with list columns from GCS #1155
  • Fix TargetEncoding with dependencies as the target #1165
  • Fix Categorify op to calculate unique count stats for Nulls #1159

v0.7.0

24 Sep 03:45
b55c57c

NVTabular v0.7.0

Improvements

  • Add column tagging API #943
  • Export dataset schema when writing out datasets #948
  • Make dataloaders aware of schema #947
  • Standardize a Workflow's representation of its output columns #372
  • Add multi-GPU training example using PyTorch Distributed #775
  • Speed up reading Parquet files from remote storage like GCS or S3 #1119
  • Add utility to convert TFRecord datasets to Parquet #1085
  • Add multihot support for PyTorch inference #719
  • Add options to reserve categorical indices in the Categorify() op #1074
  • Update notebooks to work with CPU-only systems #960
  • Save output from Categorify op in a single table for HugeCTR #946
  • Add a keyset file for HugeCTR integration #1049

Bug Fixes

  • Fix category counts written out by the Categorify op #1128
  • Fix HugeCTR inference example #1130
  • Fix make_feature_column_workflow bug in Categorify if features have vocabularies of varying size #1062
  • Fix TargetEncoding op on CPU-only systems #976
  • Fix writing empty partitions to Parquet files #1097

v0.6.1

11 Aug 21:01

NVTabular v0.6.1

Bug Fixes

  • Fix installing package via pip #1030
  • Fix inference with groupby operator #1019
  • Install tqdm with conda package #1030
  • Fix workflow output_dtypes with empty partitions #1028

v0.6.0

03 Aug 18:44
886d5b8

NVTabular v0.6.0

Improvements

  • Add CPU support #534
  • Speed up inference on Triton Inference Server #744
  • Add support for session based recommenders #355
  • Add PyTorch Dataloader support for Sparse Tensors #500
  • Add ListSlice operator for truncating list columns #734
  • Categorical ids sorted by frequency #799
  • Add ability to select a subset of a ColumnGroup #809
  • Add option to use Rename op to give a single column a new fixed name #825
  • Add a 'map' function to KerasSequenceLoader, which enables sample weights #667
  • Add JoinExternal option on nvt.Dataset in addition to cudf #370
  • Allow passing ColumnGroup to get_embedding_sizes #732
  • Add ability to name LambdaOp and provide a better default name in graph visualizations #860
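
As an illustration of the "Categorical ids sorted by frequency" item (#799), here is a minimal pure-Python sketch of what frequency-ordered id assignment means. It is not NVTabular's implementation. The function name `frequency_sorted_ids`, the `start_index` parameter, and the tie-breaking rule are all hypothetical choices made for this example.

```python
from collections import Counter


def frequency_sorted_ids(values, start_index=1):
    """Map each category to an integer id, most frequent category first.

    Hypothetical helper for illustration only; not part of NVTabular.
    """
    counts = Counter(values)
    # Sort by descending count; break ties by category value so the
    # result is deterministic (the tie-breaking rule is an assumption).
    ordered = sorted(counts, key=lambda v: (-counts[v], v))
    return {category: start_index + rank for rank, category in enumerate(ordered)}


ids = frequency_sorted_ids(["b", "a", "b", "c", "b", "a"])
print(ids)
```

With this ordering, the most common categories receive the smallest ids, which can be convenient when low ids are treated specially downstream.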

Bug Fixes

  • Fix make_feature_column_workflow for Categorical columns #763
  • Fix Categorify output dtypes for list columns #963
  • Fix inference for Outbrain example #669
  • Fix dask metadata after calling workflow.to_ddf() #852
  • Fix out of memory errors #896, #971
  • Fix normalize output when stdev is zero #993
  • Fix using UCX with a dask cluster on Merlin containers #872
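
To make the "Fix normalize output when stdev is zero" item (#993) concrete, the sketch below shows one common way to guard standard-score normalization against a constant column. It is a plain-Python illustration, not the NVTabular code. Returning zeros for a constant column is an assumption of this example, and the release notes do not state what the fixed operator emits.

```python
import math


def normalize(values):
    """Standard-score normalization with a guard for zero standard deviation.

    Hypothetical helper for illustration only; not part of NVTabular.
    """
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(variance)
    if std == 0:
        # A constant column would otherwise divide by zero; emitting
        # zeros here is an assumption made for this sketch.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


constant = normalize([3.0, 3.0, 3.0])
varying = normalize([1.0, 3.0])
```

Without the guard, the constant column would raise a `ZeroDivisionError` in pure Python; array libraries typically produce NaN or infinity instead.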

v0.5.3

26 May 18:21

Bug Fixes

  • Fix shuffling in Torch DataLoader #818
  • Fix "Unsupported type_id conversion" in Triton inference for string columns #813
  • Fix HugeCTR inference backend Merlin#8