.. _nmt-label: Neural Machine Translation ========================== Models ------ +-------------------------------------------------------------------------------------+------+ | Model | Lang | +=====================================================================================+======+ | `Helsinki-NLP/opus-mt-en-fr | EN | | `__ | | +-------------------------------------------------------------------------------------+------+ | `t5-small `__ | EN | +-------------------------------------------------------------------------------------+------+ | `Helsinki-NLP/opus-mt-ru-en | RU | | `__ | | +-------------------------------------------------------------------------------------+------+ | `Helsinki-NLP/opus-mt-ru-es | RU | | `__ | | +-------------------------------------------------------------------------------------+------+ Datasets -------- 1. `enimai/MuST-C-fr `__ 1. **Lang**: EN 2. **Rows**: 2630 3. **Preprocess**: 1. Rename column ``en`` to ``source``. 2. Rename column ``fr`` to ``target``. 3. Delete duplicates in dataset. 4. Reset indexes. 2. `RocioUrquijo/en_de `__ 1. **Lang**: EN 2. **Rows**: 700 3. **Preprocess**: 1. Rename column ``en`` to ``source``. 2. Rename column ``de`` to ``target``. 3. Delete duplicates in dataset. 4. Add prefix *Translate from English to German:* for each ``source`` row. 5. Reset indexes. 3. `shreevigneshs/iwslt-2023-en-ru-train-val-split-0.2 `__ 1. **Lang**: RU 2. **Rows**: 600 3. **Preprocess**: 1. Drop columns ``ru_annotated``, ``styles``. 2. Rename column ``ru`` to ``source``. 3. Rename column ``en`` to ``target``. 4. Reset indexes. 4. `nuvocare/Ted2020_en_es_fr_de_it_ca_pl_ru_nl `__ 1. **Lang**: RU 2. **Rows**: 7210 3. **Preprocess**: 1. Drop columns ``de``, ``en``, ``fr``, ``it``, ``nl``, ``pl``. 2. Rename column ``ru`` to ``source``. 3. Rename column ``es`` to ``target``. 4. Delete empty rows in dataset. 5. Delete duplicates in dataset. 6. Reset indexes. Metrics ------- - BLEU