Marian v1.10.0

emjotde released this 06 Feb 23:55

[1.10.0] - 2021-02-06

Added

Added intgemm8(ssse3|avx|avx512)?, intgemm16(sse2|avx|avx512)? types to marian-conv, which use the intgemm backend. Types intgemm8 and intgemm16 are hardware-agnostic, the other ones hardware-specific.
Shortlist is now always multiple-of-eight.
Added intgemm 8/16bit integer binary architecture agnostic format.
Add --train-embedder-rank for fine-tuning any encoder(-decoder) model for multi-lingual similarity via softmax-margin loss
Add --logical-epoch, which allows redefining the displayed epoch counter as a multiple of n data epochs, updates or labels. A second argument defines the width of the fractional part.
Add --metrics chrf for computing ChrF according to https://www.aclweb.org/anthology/W15-3049/ and SacreBLEU reference implementation
Add --after option, which is meant to replace --after-batches and --after-epochs and can take label-based criteria
Add --transformer-postprocess-top option to enable correctly normalized prenorm behavior
Add --task transformer-base-prenorm and --task transformer-big-prenorm
Turing and Ampere GPU optimisation support, if the CUDA version supports it.
Printing word-level scores in marian-scorer
Optimize LayerNormalization on CPU by 6x through vectorization (-ffast-math) and by fixing a performance regression introduced with strides in 77a420
Decoding multi-source models in marian-server with --tsv
GitHub workflows on Ubuntu, Windows, and MacOS
LSH indexing to replace short list
ONNX support for transformer models
Add topk operator like PyTorch's topk
Use cblas_sgemm_batch instead of a for loop of cblas_sgemm on CPU as the batched_gemm implementation
Supporting relative paths in shortlist and sqlite options
Training and scoring from STDIN
Support for reading from TSV files from STDIN and other sources during training and translation with options --tsv and --tsv-fields n.
Internal optional parameter in n-best list generation that skips empty hypotheses.
Quantized training (fixed point or log-based quantization) with --quantize-bits N command
Support for using Apple Accelerate as the BLAS library
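
To illustrate the topk entry above, here is a minimal sketch of the semantics of a PyTorch-style top-k on a flat list: it returns the k largest values in descending order together with their original indices. The function name `topk_sketch` is hypothetical and the code is plain Python for illustration only; it does not show Marian's operator or its signature.

```python
def topk_sketch(values, k):
    """Return (top_values, top_indices) for the k largest entries.

    Hypothetical helper for illustration; not Marian's API.
    """
    # Sort indices by value, largest first; Python's sort is stable, so
    # ties keep their original relative order.
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]
    return [values[i] for i in order], order


scores = [0.1, 0.7, 0.2, 0.9]
top_values, top_indices = topk_sketch(scores, 2)
print(top_values, top_indices)
```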

Fixed

Segfault of spm_train when compiled with -DUSE_STATIC_LIBS=ON seems to have gone away with update to newer SentencePiece version.
Fix bug causing certain reductions into scalars to be 0 on the GPU backend. Removed unnecessary warp shuffle instructions.
Do not apply dropout in embeddings layers during inference with dropout-src/trg
Print "server is listening on port" message after it is accepting connections
Fix compilation without BLAS installed
Providing a single value to vector-like options using the equals sign, e.g. --models=model.npz
Fix quiet-translation in marian-server
CMake-based compilation on Windows
Fix minor issues with compilation on MacOS
Fix warnings in Windows MSVC builds using CMake
Fix building server with Boost 1.72
Make mini-batch scaling depend on mini-batch-words and not on mini-batch-words-ref
In concatenation make sure that we do not multiply 0 with nan (which results in nan)
Change Approx.epsilon(0.01) to Approx.margin(0.001) in unit tests. Tolerance is now absolute and not relative. We assumed incorrectly that epsilon is absolute tolerance.
Fixed bug in finding .git/logs/HEAD when Marian is a submodule in another project.
Properly record cmake variables in the cmake build directory instead of the source tree.
Added default "none" for option shuffle in BatchGenerator, so that it works in executables where shuffle is not an option.
Added a few missing header files in shortlist.h and beam_search.h.
Improved handling for receiving SIGTERM during training. By default, SIGTERM triggers 'save (now) and exit'. Prior to this fix, batch pre-fetching did not check for this signal, potentially delaying exit considerably. It now pays attention to that. Also, the default behaviour of save-and-exit can now be disabled on the command line with --sigterm exit-immediately.
Fix the runtime failures for FASTOPT on 32-bit builds (wasm just happens to be 32-bit), which were caused by hashing with an inconsistent mix of uint64_t and size_t.
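
The unit-test entry above (Approx.epsilon versus Approx.margin) turns on the difference between relative and absolute tolerance. The sketch below shows that difference in plain Python. The two helper functions are hypothetical, and the relative check is a simplification that scales by the expected value only; it is not the test framework's exact formula.

```python
def within_relative(actual, expected, epsilon):
    # Relative tolerance: the allowed error grows with the expected value.
    return abs(actual - expected) <= epsilon * abs(expected)


def within_absolute(actual, expected, margin):
    # Absolute tolerance: the allowed error is the same at every scale.
    return abs(actual - expected) <= margin


# Large values: a relative 1% tolerance accepts an error of 5.0,
# while an absolute margin of 0.001 rejects it.
print(within_relative(1005.0, 1000.0, 0.01), within_absolute(1005.0, 1000.0, 0.001))

# Values near zero: the relative check becomes extremely strict,
# while the absolute margin still accepts a small error.
print(within_relative(0.0005, 0.0001, 0.01), within_absolute(0.0005, 0.0001, 0.001))
```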

Changed

Remove --clip-gemm which is obsolete and was never used anyway
Removed --optimize switch; the compute type is now determined from the binary model.
Updated SentencePiece repository to version 8336bbd0c1cfba02a879afe625bf1ddaf7cd93c5 from https://github.com/google/sentencepiece.
Enabled compilation of SentencePiece by default since it no longer depends on protobuf.
Changed default value of --sentencepiece-max-lines from 10000000 to 2000000 since apparently the new version doesn't sample automatically anymore (Not quite clear how that affects quality of the vocabulary).
Change mini-batch-fit search stopping criterion to stop at ideal binary search threshold.
--metric bleu now always detokenizes SacreBLEU-style if a vocabulary knows how to. Use bleu-segmented to compute BLEU on word ids. bleu-detok is now a synonym for bleu.
Move label-smoothing computation into Cross-entropy node
Move Simple-WebSocket-Server to submodule
Python scripts start with #!/usr/bin/env python3 instead of python
Changed compile flags -Ofast to -O3 and removed --ffinite-math
Moved old graph groups to deprecated folder
Make cublas and cusparse handle inits lazy to save memory when unused
Replaced exception-based implementation for type determination in FastOpt::makeScalar
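
The lazy-initialization entry above (cublas and cusparse handles) follows a common pattern: defer creating an expensive resource until it is first requested, so nothing is allocated if it is never used. A minimal sketch of that pattern follows. The class `LazyHandle` and the stand-in factory are hypothetical and do not reflect Marian's code or any CUDA API.

```python
class LazyHandle:
    """Create a resource on first use instead of at construction time.

    Hypothetical illustration of lazy initialization; not Marian's code.
    """

    def __init__(self, factory):
        self._factory = factory  # callable that builds the real resource
        self._handle = None      # nothing is allocated yet

    def get(self):
        # Build the resource only when it is first requested,
        # then reuse the same object on later calls.
        if self._handle is None:
            self._handle = self._factory()
        return self._handle


created = []


def make_fake_handle():
    # Stand-in for an expensive library handle.
    created.append("handle")
    return object()


lazy = LazyHandle(make_fake_handle)
print(len(created))  # no handle has been created before the first get()
first = lazy.get()
second = lazy.get()
print(len(created), first is second)
```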