Neural Machine Translation

Models

Model                        Lang
Helsinki-NLP/opus-mt-en-fr   EN
t5-small                     EN
Helsinki-NLP/opus-mt-ru-en   RU
Helsinki-NLP/opus-mt-ru-es   RU

Datasets

  1. enimai/MuST-C-fr

    1. Lang: EN

    2. Rows: 2630

    3. Preprocess:

      1. Rename column en to source.

      2. Rename column fr to target.

      3. Delete duplicates in dataset.

      4. Reset indexes.

  2. RocioUrquijo/en_de

    1. Lang: EN

    2. Rows: 700

    3. Preprocess:

      1. Rename column en to source.

      2. Rename column de to target.

      3. Delete duplicates in dataset.

      4. Add the prefix "Translate from English to German:" to each source row.

      5. Reset indexes.

  3. shreevigneshs/iwslt-2023-en-ru-train-val-split-0.2

    1. Lang: RU

    2. Rows: 600

    3. Preprocess:

      1. Drop columns ru_annotated, styles.

      2. Rename column ru to source.

      3. Rename column en to target.

      4. Reset indexes.

  4. nuvocare/Ted2020_en_es_fr_de_it_ca_pl_ru_nl

    1. Lang: RU

    2. Rows: 7210

    3. Preprocess:

      1. Drop columns de, en, fr, it, nl, pl.

      2. Rename column ru to source.

      3. Rename column es to target.

      4. Delete empty rows in dataset.

      5. Delete duplicates in dataset.

      6. Reset indexes.
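The preprocessing steps listed for the four datasets share one shape: drop unused columns, rename the language columns to source and target, optionally remove empty rows, remove duplicates, optionally add a prefix, and reset the index. A minimal pandas sketch of that shape follows. The helper name preprocess and its arguments are hypothetical, the definition of an "empty row" (missing or whitespace-only source or target) is an assumption, and so is the trailing space after the prefix colon.

```python
import pandas as pd


def preprocess(df, source_col, target_col, drop_cols=(), prefix="", drop_empty=False):
    """Hypothetical helper mirroring the preprocessing steps listed above."""
    # Drop columns that are not needed, then rename to the shared schema.
    out = df.drop(columns=list(drop_cols))
    out = out.rename(columns={source_col: "source", target_col: "target"})

    if drop_empty:
        # Assumption: a row is "empty" if source or target is missing or blank.
        text = out[["source", "target"]].fillna("").astype(str)
        nonempty = (text["source"].str.strip() != "") & (text["target"].str.strip() != "")
        out = out[nonempty]

    # Remove exact duplicate rows.
    out = out.drop_duplicates()

    if prefix:
        # Prepend the task prefix to every source sentence.
        out = out.assign(source=prefix + out["source"])

    # Renumber rows from 0 after filtering.
    return out.reset_index(drop=True)


# Toy data standing in for an EN-DE dataset (not taken from the real dataset).
raw = pd.DataFrame({"en": ["Hello", "Hello", "Thanks"], "de": ["Hallo", "Hallo", "Danke"]})
clean = preprocess(raw, "en", "de", prefix="Translate from English to German: ")
print(clean)
```

For the RU datasets the same helper would be called with drop_cols set to the columns named in the corresponding list, and with drop_empty=True where the list includes the empty-row step.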

Supervised Fine-Tuning (SFT) Parameters

Note

For the Helsinki-NLP/opus-mt-en-fr, Helsinki-NLP/opus-mt-ru-en, and Helsinki-NLP/opus-mt-ru-es models, set target_modules=["k_proj", "v_proj", "q_proj", "out_proj"].

Note

For the Helsinki-NLP/opus-mt-ru-es model, also set the SFT parameter learning_rate=1e-4.
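The two notes above amount to a small per-model override table. One way to keep it in code is sketched below. The names MARIAN_MODELS and sft_overrides are hypothetical, and how the resulting dictionary is passed to the fine-tuning routine is not specified here, so treat this as an illustration of the lookup only.

```python
# Models that the note on target_modules applies to.
MARIAN_MODELS = {
    "Helsinki-NLP/opus-mt-en-fr",
    "Helsinki-NLP/opus-mt-ru-en",
    "Helsinki-NLP/opus-mt-ru-es",
}


def sft_overrides(model_name):
    """Hypothetical helper: per-model SFT overrides taken from the notes above."""
    overrides = {}
    if model_name in MARIAN_MODELS:
        overrides["target_modules"] = ["k_proj", "v_proj", "q_proj", "out_proj"]
    if model_name == "Helsinki-NLP/opus-mt-ru-es":
        overrides["learning_rate"] = 1e-4
    return overrides


print(sft_overrides("Helsinki-NLP/opus-mt-ru-es"))
```

Models without a note, such as t5-small, get an empty dictionary and would fall back to whatever defaults the fine-tuning routine uses.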

Metrics

  • BLEU
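For orientation, BLEU combines clipped n-gram precisions (usually for n = 1 to 4) with a brevity penalty. The sketch below is a simplified pure-Python version, not necessarily the implementation used for the reported scores: it assumes one reference per hypothesis, whitespace tokenization, no smoothing, and a 0 to 100 scale. The function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus-level BLEU: single reference, whitespace tokens, no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clipping: an n-gram is credited at most as often as it occurs in the reference.
            matches[n - 1] += sum((h_counts & r_counts).values())
            totals[n - 1] += sum(h_counts.values())
    # Without smoothing, any zero n-gram precision makes the score zero.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # The brevity penalty punishes hypotheses shorter than the references.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


score = corpus_bleu(["the cat sat on the mat today"], ["the cat sat on the mat"])
print(round(score, 2))
```

Scores from this sketch will generally differ from those of standard tools, which apply their own tokenization and smoothing, so it should not be used to compare against published numbers.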