This repository was archived by the owner on Jan 5, 2023. It is now read-only.
Releases: lium-lst/nmtpytorch
v4.0.0
This release supports PyTorch >= 0.4.1, including the recent 1.0 release. The relevant
setup.py and environment.yml files now install 1.0.0 by default.
v4.0.0 (18/12/2018)
- Critical: `NumpyDataset` now returns tensors of shape `HxW, N, C` for 3D/4D convolutional features, `1, N, C` for 2D feature files. Models should be adjusted to adapt to this new shaping.
- An `order_file` per split (`ord: path/to/txt file with integer per line`) can be given from the configurations to change the feature order of numpy tensors to flexibly revert, shuffle, tile, etc. them.
- Better dimension checking to ensure that everything is OK.
- Added `LabelDataset` for single label input/outputs with associated `Vocabulary` for integer mapping.
- Added `handle_oom=(True|False)` argument for `[train]` section to recover from GPU out-of-memory (OOM) errors during training. This is disabled by default, you need to enable it from the experiment configuration file. Note that it is still possible to get an OOM during validation perplexity computation. If you hit that, reduce the `eval_batch_size` parameter.
- Added `de-hyphen` post-processing filter to stitch back the aggressive hyphen splitting of Moses during early-stopping evaluations.
- Added optional projection layer and layer normalization to `TextEncoder`.
- Added `enc_lnorm`, `sched_sampling` options to `NMT` to enable layer normalization for encoder and use scheduled sampling at a given probability.
- `ConditionalDecoder` can now be initialized with max-pooled encoder states or the last state as well.
- You can now experiment with different decoders for `NMT` by changing the `dec_variant` option.
- Collect all attention weights in `self.history` dictionary of the decoders.
- Added n-best output to `nmtpy translate` with the argument `-N`.
- Changed the way `-S` works for `nmtpy translate`. Now you need to give the split name with `-s` all the time but `-S` is used to override the input data sources defined for that split in the configuration file.
- Removed decoder-initialized multimodal NMT `MNMTDecInit`. Same functionality exists within the `NMT` model by using the model option `dec_init=feats`.
- New model MultimodalNMT: that supports encoder initialization, decoder initialization, both, concatenation of embeddings with visual features, prepending and appending. This model covers almost all the models from LIUM-CVC's WMT17 multimodal systems except the multiplicative interaction variants such as `trgmul`.
- New model MultimodalASR: encoder-decoder initialized ASR model. See the paper
- New Model AttentiveCaptioning: Similar but not an exact reproduction of show-attend-and-tell, it uses feature files instead of raw images.
- New model AttentiveMNMTFeaturesFA: LIUM-CVC's WMT18 multimodal system i.e. filtered attention
- New (experimental) model NLI: A simple LSTM-based NLI baseline for SNLI dataset:
  - `direction` should be defined as `direction: pre:Text, hyp:Text -> lb:Label`
  - `pre, hyp` and `lb` keys point to plain text files with one sentence per line. A vocabulary should be constructed even for the labels to fit the nmtpy architecture.
  - `acc` should be added to `eval_metrics` to compute accuracy.
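
The `handle_oom` option above is described only as "recovering" from OOM errors during training; the release notes do not say how. A common pattern, sketched here under that caveat, is to skip the offending batch and continue. The names `train_epoch`, `train_step` and `fake_step` are hypothetical and not part of nmtpytorch; the sketch avoids importing PyTorch so that it runs anywhere.

```python
def train_epoch(batches, train_step, handle_oom=True):
    """Run train_step on every batch, optionally skipping batches that hit OOM.

    `train_step` is a hypothetical callable that does forward/backward/update
    for one batch and returns its loss. Returns (losses, n_skipped).
    """
    losses, skipped = [], 0
    for batch in batches:
        try:
            losses.append(train_step(batch))
        except RuntimeError as exc:
            # PyTorch reports CUDA OOM as a RuntimeError whose message
            # contains "out of memory"; any other error is re-raised.
            if handle_oom and "out of memory" in str(exc):
                skipped += 1
                # Real PyTorch code would typically also drop references to
                # the failed batch and call torch.cuda.empty_cache() here.
                continue
            raise
    return losses, skipped


def fake_step(batch):
    # Stand-in for a real training step: "fails" on oversized batches.
    if len(batch) > 3:
        raise RuntimeError("CUDA out of memory (simulated)")
    return float(len(batch))


losses, skipped = train_epoch([[1, 2], [1, 2, 3, 4], [1]], fake_step)
print(losses, skipped)
```

With `handle_oom=False` the same error propagates and stops training, which matches the default behaviour described above.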
v2.0.0
- Ability to install through `pip`.
- Advanced layers are now organized into subfolders.
- New basic layers: Convolution over sequence, MaxMargin.
- New attention layers: Co-attention, multi-head attention, hierarchical attention.
- New encoders: Arbitrary sequence-of-vectors encoder, BiLSTMp speech feature encoder.
- New decoders: Multi-source decoder, switching decoder, vector decoder.
- New datasets: Kaldi dataset (.ark/.scp reader), Shelve dataset, Numpy sequence dataset.
- Added learning rate annealing: See `lr_decay*` options in `config.py`.
- Removed subword-nmt and METEOR files from repository. We now depend on the PIP package for subword-nmt. For METEOR, `nmtpy-install-extra` should be launched after installation.
- More multi-task and multi-input/output `translate` and `training` regimes.
- New early-stopping metrics: Character and word error rate (`cer`, `wer`) and ROUGE (`rouge`).
- Curriculum learning option for the `BucketBatchSampler`, i.e. length-ordered batches.
- New models:
  - ASR: Listen-attend-and-spell like automatic speech recognition
  - Multitask*: Experimental multi-tasking & scheduling between many inputs/outputs.
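
To illustrate what "length-ordered batches" means for the curriculum option above, here is a minimal sketch. It assumes that samples are bucketed by exact sequence length and that buckets are visited from shortest to longest; the actual `BucketBatchSampler` may bucket and order differently, and `length_ordered_batches` is a hypothetical helper.

```python
from collections import defaultdict


def length_ordered_batches(lengths, batch_size):
    """Yield lists of sample indices, bucketed by length, shortest first."""
    buckets = defaultdict(list)
    for idx, n in enumerate(lengths):
        buckets[n].append(idx)
    # Visit buckets in increasing length so early batches are the easy ones.
    for n in sorted(buckets):
        members = buckets[n]
        for i in range(0, len(members), batch_size):
            yield members[i:i + batch_size]


lengths = [5, 3, 5, 3, 3, 7]
batches = list(length_ordered_batches(lengths, batch_size=2))
print(batches)
```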
v1.4.0
- Add different `environment.yml` files for easy installation using `conda`. You can now create a ready-to-use `conda` environment by just calling `conda env create -f environment-cuda<VER>.yml`.
- Make `NumpyDataset` memory efficient by keeping `float16` arrays as they are until batch creation time.
- Rename `Multi30kRawDataset` to `Multi30kDataset` which now supports both raw image files and pre-extracted visual features file stored as `.npy`.
- Add CNN feature extraction script under `scripts/`.
- Add doubly stochastic attention to `ShowAttendAndTell` and multimodal NMT.
- New model `MNMTDecinit` to initialize decoder with auxiliary features.
- New model `AMNMTFeatures` which is the attentive MMT but with features file instead of end-to-end feature extraction which was memory hungry.
v1.3.2
Updates for ShowAttendAndTell model.
v1.3.1
- Removed old `Multi30kDataset`.
- Sort batches by source sequence length instead of target.
- Fix `ShowAttendAndTell` model. It should now work.
v1.3.0
- Added `Multi30kRawDataset` for training end-to-end systems from raw images as input.
- Added `NumpyDataset` to read `.npy/.npz` tensor files as input features.
- You can now pass `-S` to `nmtpy train` to produce shorter experiment file names that do not include all the hyperparameters.
- New post-processing filter option `de-spm` for Google SentencePiece (SPM) processed files.
- `sacrebleu` is now a dependency as it is now accepted as an early-stopping metric. It only makes sense to use it with SPM processed files since they are detokenized once post-processed.
- Added `sklearn` as a dependency for some metrics.
- Added `momentum` and `nesterov` parameters to `[train]` section for SGD.
- `ImageEncoder` layer is improved in many ways. Please see the code for further details.
- Added unmerged upstream PR for `ModuleDict()` support.
- `METEOR` will now fall back to English if the language cannot be detected from file suffixes.
- `-f` now produces a separate numpy file for token frequencies when building vocabulary files with `nmtpy-build-vocab`.
- Added new command `nmtpy test` for non beam-search inference modes.
- Removed `nmtpy resume` command and added `pretrained_file` option for `[train]` to initialize model weights from a checkpoint.
- Added `freeze_layers` option for `[train]` to give comma-separated list of layer name prefixes to freeze.
- Improved seeding: seed is now printed in order to reproduce the results.
- Added IPython notebook for attention visualization.
- Layers
  - New shallow `SimpleGRUDecoder` layer.
  - `TextEncoder`: Ability to set `maxnorm` and `gradscale` of embeddings and work with or without sorted-length batches.
  - `ConditionalDecoder`: Make it work with GRU/LSTM, allow setting `maxnorm/gradscale` for embeddings.
  - `ConditionalMMDecoder`: Same as above.
- `nmtpy translate`
  - `--avoid-double` and `--avoid-unk` removed for now.
  - Added Google's length penalty normalization switch `--lp-alpha`.
  - Added ensembling which is enabled automatically if you give more than one model checkpoint.
- New machine learning metric wrappers in `utils/ml_metrics.py`:
  - Label-ranking average precision `lrap`
  - Coverage error
  - Mean reciprocal rank
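
For readers unfamiliar with the length penalty behind `--lp-alpha`, the sketch below shows the formula from Google's GNMT system, `((5 + |Y|) / 6) ** alpha`, applied to hypothesis scores. How nmtpytorch wires this into its beam search is not stated in these notes, so `length_penalty` and `rescore` are hypothetical helpers that only illustrate the normalization itself.

```python
def length_penalty(length, alpha):
    """GNMT length penalty: ((5 + length) / 6) ** alpha."""
    return ((5.0 + length) / 6.0) ** alpha


def rescore(hyps, alpha):
    """Sort (tokens, log_prob) hypotheses by normalized score, best first."""
    return sorted(
        hyps,
        key=lambda h: h[1] / length_penalty(len(h[0]), alpha),
        reverse=True,
    )


hyps = [
    (["a", "b"], -2.0),
    (["a", "b", "c", "d", "e", "f", "g"], -3.0),
]

# alpha = 0 disables normalization, so the shorter hypothesis wins;
# alpha = 1 divides by the penalty, which favours the longer one here.
print(rescore(hyps, alpha=0.0)[0][0])
print(rescore(hyps, alpha=1.0)[0][0])
```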
Release v1.2.0
Release Notes
- You can now use `$HOME` and `$USER` in your configuration files.
- Fixed an overflow error that would cause NMT with more than 255 tokens to fail.
- METEOR worker process is now correctly killed after validations.
- Many runs of an experiment are now suffixed with a unique random string instead of incremental integers to avoid race conditions in cluster setups.
- Replaced `utils.nn.get_network_topology()` with a new `Topology` class that will parse the `direction` string of the model in a smarter way.
- If `CUDA_VISIBLE_DEVICES` is set, the `GPUManager` will always honor it.
- Dropped creation of temporary/advisory lock files under `/tmp` for GPU reservation.
- Time measurements during training are now structured into batch overhead, training and evaluation timings.
- Datasets
  - Added `TextDataset` for standalone text file reading.
  - Added `OneHotDataset`, a variant of `TextDataset` where the sequences are not prefixed/suffixed with `<bos>` and `<eos>` respectively.
  - Added experimental `MultiParallelDataset` that merges an arbitrary number of parallel datasets together.
- `nmtpy translate`
  - `.nodbl` and `.nounk` suffixes are now added to output files for `--avoid-double` and `--avoid-unk` arguments respectively.
  - A model-agnostic enough `beam_search()` is now separated out into its own file `nmtpytorch/search.py`.
  - `max_len` default is increased to 200.
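
The `$HOME` and `$USER` support mentioned at the top of this list can be pictured with the standard library's `os.path.expandvars`. Whether nmtpytorch uses this exact call is an assumption; `expand_config_value` is a hypothetical helper, and the environment variables are overwritten here only to make the example deterministic.

```python
import os


def expand_config_value(value):
    """Expand $HOME, $USER and other environment variables in a config value."""
    return os.path.expandvars(value)


# Set only so the example prints the same thing on every machine.
os.environ["HOME"] = "/home/alice"
os.environ["USER"] = "alice"

expanded = expand_config_value("$HOME/experiments/$USER/nmt")
print(expanded)
```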
v1.1
v1.1 (25/01/2018)
- New experimental `Multi30kDataset` and `ImageFolderDataset` classes
- `torchvision` dependency added for CNN support
- `nmtpy-coco-metrics` now computes one METEOR without `norm=True`
- Mainloop mechanism is completely refactored with backward-incompatible configuration option changes for `[train]` section:
  - `patience_delta` option is removed
  - Added `eval_batch_size` to define batch size for GPU beam-search during training
  - `eval_freq` default is now `3000`, which means every `3000` minibatches
  - `eval_metrics` now defaults to `loss`. As before, you can provide a list of metrics like `bleu,meteor,loss` to compute all of them and early-stop based on the first
  - Added `eval_zero (default: False)` which evaluates the model once on dev set right before the training starts. Useful for sanity checking if you fine-tune a model initialized with pre-trained weights
  - Removed `save_best_n`: we no longer save the best `N` models on dev set w.r.t. early-stopping metric
  - Added `save_best_metrics (default: True)` which will save best models on dev set w.r.t. each metric provided in `eval_metrics`. This kind of remedies the removal of `save_best_n`
  - `checkpoint_freq` now defaults to `5000`, which means every `5000` minibatches.
  - Added `n_checkpoints (default: 5)` to define the number of last checkpoints that will be kept if `checkpoint_freq > 0` i.e. checkpointing enabled
- Added `ExtendedInterpolation` support to configuration files:
  - You can now define intermediate variables in `.conf` files to avoid typing same paths again and again. A variable can be referenced from within its section using `tensorboard_dir: ${save_path}/tb` notation. Cross-section references are also possible: `${data:root}` will be replaced by the value of the `root` variable defined in the `[data]` section.
- Added `-p/--pretrained` to `nmtpy train` to initialize the weights of the model using another checkpoint `.ckpt`.
- Improved input/output handling for `nmtpy translate`:
  - `-s` accepts a comma-separated list of test sets defined in the configuration file of the experiment to translate them at once. Example: `-s val,newstest2016,newstest2017`
  - The mutually exclusive counterpart of `-s` is `-S` which receives a single input file of source sentences.
  - For both cases, an output prefix should now be provided with `-o`. In the case of multiple test sets, the name of the test set and the beam size will be appended to the output prefix. If you just provide a single file with `-S` the final output name will only reflect the beam size information.
- Two new arguments for `nmtpy-build-vocab`:
  - `-f`: Stores frequency counts as well inside the final `json` vocabulary
  - `-x`: Does not add special markers `<eos>,<bos>,<unk>,<pad>` into the vocabulary
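
The `ExtendedInterpolation` notation mentioned in this list comes from Python's standard `configparser` module, so its behaviour can be tried in isolation. In the sketch below the paths and the `train_set` key are made up for illustration, and placing `save_path` under `[train]` is an assumption; only the `${save_path}/tb` and `${data:root}` notations are taken from the notes above.

```python
from configparser import ConfigParser, ExtendedInterpolation

# Illustrative configuration; paths and the train_set key are hypothetical.
CONF = """
[data]
root: /data/multi30k

[train]
save_path: /tmp/nmtpy/models
tensorboard_dir: ${save_path}/tb
train_set: ${data:root}/train.en
"""

parser = ConfigParser(interpolation=ExtendedInterpolation())
parser.read_string(CONF)

# Same-section reference: ${save_path} is looked up inside [train].
print(parser["train"]["tensorboard_dir"])
# Cross-section reference: ${data:root} is looked up inside [data].
print(parser["train"]["train_set"])
```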
Layers/Architectures
- Added `Fusion()` layer to `concat,sum,mul` an arbitrary number of inputs
- Added experimental `ImageEncoder()` layer to seamlessly plug a VGG or ResNet CNN using `torchvision` pretrained models
- `Attention` layer arguments improved. You can now select the bottleneck dimensionality for MLP attention with `att_bottleneck`. The `dot` attention is still not tested and probably broken.
New stuff
- Added AttentiveMNMT which implements modality-specific multimodal attention from the paper Multimodal Attention for Neural Machine Translation
- Added ShowAttendAndTell model
Changes in NMT
- `dec_init` defaults to `mean_ctx`, i.e. the decoder will be initialized with the mean context computed from the source encoder
- `enc_lnorm` which was just a placeholder is now removed since we do not provide layer-normalization for now
- Beam Search is completely moved to GPU