Neural Machine Translation
Models
| Model | Lang |
|---|---|
| | EN |
| | EN |
| | RU |
| | RU |
Datasets
-
  - Lang: EN
  - Rows: 2630
  - Preprocess:
    - Rename column `en` to `source`.
    - Rename column `fr` to `target`.
    - Delete duplicates in dataset.
    - Reset indexes.
-
  - Lang: EN
  - Rows: 700
  - Preprocess:
    - Rename column `en` to `source`.
    - Rename column `de` to `target`.
    - Delete duplicates in dataset.
    - Add prefix `Translate from English to German:` for each `source` row.
    - Reset indexes.
- shreevigneshs/iwslt-2023-en-ru-train-val-split-0.2
  - Lang: RU
  - Rows: 600
  - Preprocess:
    - Drop columns `ru_annotated`, `styles`.
    - Rename column `ru` to `source`.
    - Rename column `en` to `target`.
    - Reset indexes.
- nuvocare/Ted2020_en_es_fr_de_it_ca_pl_ru_nl
  - Lang: RU
  - Rows: 7210
  - Preprocess:
    - Drop columns `de`, `en`, `fr`, `it`, `nl`, `pl`.
    - Rename column `ru` to `source`.
    - Rename column `es` to `target`.
    - Delete empty rows in dataset.
    - Delete duplicates in dataset.
    - Reset indexes.
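
The preprocessing recipes above share the same few steps, so they can be sketched as one helper. This is a minimal sketch that assumes the data is held in a pandas `DataFrame`; the function name `preprocess` and its arguments are hypothetical and not taken from this project. The trailing space after the prefix in the usage lines is also an assumption.

```python
import pandas as pd


def preprocess(
    df: pd.DataFrame,
    source_col: str,
    target_col: str,
    drop_cols: tuple[str, ...] = (),
    drop_empty: bool = False,
    drop_duplicates: bool = False,
    prefix: str = "",
) -> pd.DataFrame:
    """Apply the listed steps in order and return a new frame (hypothetical helper)."""
    # Drop unused columns, then rename the language columns to source/target.
    out = df.drop(columns=list(drop_cols))
    out = out.rename(columns={source_col: "source", target_col: "target"})
    if drop_empty:
        # Remove rows with a missing or blank source/target value.
        out = out.dropna(subset=["source", "target"])
        nonblank = (out["source"].str.strip() != "") & (out["target"].str.strip() != "")
        out = out[nonblank]
    if drop_duplicates:
        out = out.drop_duplicates()
    if prefix:
        # Prepend the task prefix to every source row.
        out = out.assign(source=prefix + out["source"])
    # Renumber rows from 0 after any rows were removed.
    return out.reset_index(drop=True)


# Toy data standing in for the EN-DE dataset.
raw = pd.DataFrame({
    "en": ["Hello", "Hello", "Thanks"],
    "de": ["Hallo", "Hallo", "Danke"],
})
clean = preprocess(
    raw, "en", "de",
    drop_duplicates=True,
    prefix="Translate from English to German: ",
)
print(clean)
```

For the RU-ES recipe the same helper would be called with `drop_cols=("de", "en", "fr", "it", "nl", "pl")`, `drop_empty=True`, and `drop_duplicates=True`.
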
Supervised Fine-Tuning (SFT) Parameters
Note
Set the parameter `target_modules=["k_proj", "v_proj", "q_proj", "out_proj"]` for the `Helsinki-NLP/opus-mt-en-fr`, `Helsinki-NLP/opus-mt-ru-en`, and `Helsinki-NLP/opus-mt-ru-es` models.
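
`target_modules` is the name of a parameter of `LoraConfig` in the `peft` library, so the note above might translate into a configuration like the one below. This is a sketch under that assumption: the document does not state that LoRA or `peft` is used. All other LoRA settings are left at the `peft` defaults because their values are not given here. The constant name `OPUS_MT_TARGET_MODULES` is hypothetical.

```python
from peft import LoraConfig, TaskType

# Module names copied from the note above; they match the attention
# projection layers of the Marian models in transformers.
OPUS_MT_TARGET_MODULES = ["k_proj", "v_proj", "q_proj", "out_proj"]

lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,  # assumption: encoder-decoder translation model
    target_modules=OPUS_MT_TARGET_MODULES,
    # r, lora_alpha, lora_dropout: left at the peft defaults (values not given in this document)
)
```
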
Note
Set the SFT parameter `learning_rate=1e-4` for the `Helsinki-NLP/opus-mt-ru-es` model.
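
One way to keep the per-model exceptions from the two notes above in a single place is a small override table merged over a shared base configuration. This is only a sketch: `SFT_OVERRIDES`, `resolve_sft_params`, and the base values in the usage lines are hypothetical placeholders, not this project's settings.

```python
# Per-model exceptions copied from the two notes; nothing else is assumed.
OPUS_MT_TARGET_MODULES = ["k_proj", "v_proj", "q_proj", "out_proj"]

SFT_OVERRIDES = {
    "Helsinki-NLP/opus-mt-en-fr": {"target_modules": OPUS_MT_TARGET_MODULES},
    "Helsinki-NLP/opus-mt-ru-en": {"target_modules": OPUS_MT_TARGET_MODULES},
    "Helsinki-NLP/opus-mt-ru-es": {
        "target_modules": OPUS_MT_TARGET_MODULES,
        "learning_rate": 1e-4,
    },
}


def resolve_sft_params(model_name: str, base_params: dict) -> dict:
    """Return a copy of base_params with the model-specific overrides applied on top."""
    return {**base_params, **SFT_OVERRIDES.get(model_name, {})}


# Placeholder base values for illustration only.
placeholder_base = {"learning_rate": 5e-5, "num_train_epochs": 3}
ru_es_params = resolve_sft_params("Helsinki-NLP/opus-mt-ru-es", placeholder_base)
print(ru_es_params)
```

Models without an entry in the table simply receive the base parameters unchanged.
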
Metrics
BLEU
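
To make the metric concrete, here is a simplified corpus-level BLEU in plain Python: clipped n-gram precision up to 4-grams, combined by a geometric mean and multiplied by the brevity penalty. It is a sketch, not the implementation used for this project (which the document does not name). It assumes whitespace tokenization, one reference per hypothesis, and no smoothing, so its scores will generally differ from those of standard tools. The function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: list[str], references: list[str], max_n: int = 4) -> float:
    """Simplified corpus BLEU on a 0-100 scale (single reference, no smoothing)."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_tok, ref_tok = hyp.split(), ref.split()
        hyp_len += len(hyp_tok)
        ref_len += len(ref_tok)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tok, n)
            ref_counts = ngrams(ref_tok, n)
            # Clipping: an n-gram is credited at most as often as it occurs in the reference.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())
    # Without smoothing, any n-gram order with zero matches gives a score of 0.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Penalize hypotheses that are shorter than the references overall.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


score = corpus_bleu(
    ["the cat sat on the mat"],
    ["the cat sat on the red mat"],
)
print(f"BLEU: {score:.2f}")
```
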