Summarization
Models
| Model | Lang |
|---|---|
| mrm8488/bert-mini2bert-mini-finetuned-cnn_daily_mail-summarization | EN |
| | EN |
| mrm8488/bert-small2bert-small-finetuned-cnn_daily_mail-summarization | EN |
| | RU |
| | RU |
| | RU |
Datasets
-
Lang: EN
Rows: 973
Preprocess:
Rename column report to source.
Rename column summary to target.
Reset indexes.
-
Lang: EN
Rows: 11490
Preprocess:
Select the 1.0.0 subset.
Drop column id.
Rename column article to source.
Rename column highlights to target.
Delete duplicates in dataset.
Remove substring (CNN) from each source row.
Reset indexes.
tomasg25/scientific_lay_summarisation
Lang: EN
Rows: 1376
Preprocess:
Select the plos subset.
Drop columns section_headings, keywords, title, year.
Rename column article to source.
Rename column summary to target.
Reset indexes.
-
Lang: EN
Rows: 6658
Preprocess:
Rename column article to source.
Rename column abstract to target.
Reset indexes.
-
Lang: RU
Rows: 6793
Preprocess:
Drop columns title, date, url.
Rename column text to source.
Rename column summary to target.
Reset indexes.
-
Lang: RU
Rows: 30454
Preprocess:
Select the train split.
Drop columns title, date, url.
Rename column article_content to source.
Rename column summary to target.
Reset indexes.
-
Lang: RU
Rows: 7609
Preprocess:
Rename column info to source.
Rename column summary to target.
Reset indexes.
-
Lang: RU
Rows: 95
Preprocess:
Select the train split.
Rename column Reviews to source.
Rename column Summary to target.
Reset indexes.
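The preprocessing steps listed above all follow one pattern: drop unused columns, rename the text and summary columns to source and target, optionally deduplicate and clean the source text, then reset the index. The sketch below shows that pattern for the steps of the second EN dataset. It assumes the data is held in a pandas DataFrame, which this section does not state; the helper name `preprocess` and the toy rows are hypothetical and only illustrate the column layout.

```python
import pandas as pd


def preprocess(df, drop=(), rename=None, dedupe=False, strip_from_source=None):
    """Hypothetical helper: apply the drop / rename / dedupe / clean / reset pattern."""
    out = df.drop(columns=list(drop))
    out = out.rename(columns=rename or {})
    if dedupe:
        out = out.drop_duplicates()
    if strip_from_source is not None:
        # regex=False treats "(CNN)" as a literal substring, not a regex group.
        cleaned = out["source"].str.replace(strip_from_source, "", regex=False)
        out = out.assign(source=cleaned)
    return out.reset_index(drop=True)


# Toy rows (not real data) with the columns id, article, highlights.
raw = pd.DataFrame(
    {
        "id": ["a1", "a2", "a3"],
        "article": ["(CNN)First story.", "(CNN)First story.", "Second story."],
        "highlights": ["First.", "First.", "Second."],
    }
)

clean = preprocess(
    raw,
    drop=["id"],
    rename={"article": "source", "highlights": "target"},
    dedupe=True,
    strip_from_source="(CNN)",
)
print(clean)
```

The other datasets would use the same call with their own column names, for example `rename={"Reviews": "source", "Summary": "target"}` for the last RU dataset. Selecting a subset or split happens when the dataset is loaded and is not covered by this sketch.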
Supervised Fine-Tuning (SFT) Parameters
Note
Set the parameter
target_modules=["query", "key", "value", "dense"] for the
mrm8488/bert-small2bert-small-finetuned-cnn_daily_mail-summarization,
mrm8488/bert-mini2bert-mini-finetuned-cnn_daily_mail-summarization and
dmitry-vorobiev/rubert_ria_headlines models.
Note
Set the SFT parameter learning_rate=1e-4 for the
mrm8488/bert-mini2bert-mini-finetuned-cnn_daily_mail-summarization model and
learning_rate=1e-1 for the
dmitry-vorobiev/rubert_ria_headlines model.
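One way to keep the per-model exceptions from the two notes above in one place is a small lookup, sketched below. The names `TARGET_MODULE_MODELS`, `LEARNING_RATES`, and `sft_overrides` are hypothetical; only the model names and parameter values come from the notes. How the returned values are passed to the training code depends on the setup, which this section does not show.

```python
# Hypothetical lookup tables; only the values come from the notes above.
BERT_STYLE_TARGET_MODULES = ["query", "key", "value", "dense"]

TARGET_MODULE_MODELS = {
    "mrm8488/bert-small2bert-small-finetuned-cnn_daily_mail-summarization",
    "mrm8488/bert-mini2bert-mini-finetuned-cnn_daily_mail-summarization",
    "dmitry-vorobiev/rubert_ria_headlines",
}

LEARNING_RATES = {
    "mrm8488/bert-mini2bert-mini-finetuned-cnn_daily_mail-summarization": 1e-4,
    "dmitry-vorobiev/rubert_ria_headlines": 1e-1,
}


def sft_overrides(model_name: str) -> dict:
    """Return only the SFT parameters that the notes override for this model."""
    overrides = {}
    if model_name in TARGET_MODULE_MODELS:
        overrides["target_modules"] = list(BERT_STYLE_TARGET_MODULES)
    if model_name in LEARNING_RATES:
        overrides["learning_rate"] = LEARNING_RATES[model_name]
    return overrides


print(sft_overrides("dmitry-vorobiev/rubert_ria_headlines"))
```

Models that are not named in the notes get an empty dictionary, so the default SFT parameters stay in effect for them.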
Metrics
BLEU
ROUGE
Note
Use the rougeL metric and set the parameter seed=77
when loading the rouge metric.
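A minimal sketch of the note above, assuming the metric is loaded with the Hugging Face `evaluate` package (the section does not name the library). The wrapper name `rouge_l` is hypothetical. Loading the metric needs the `evaluate` and `rouge_score` packages and may download the metric script on first use.

```python
ROUGE_SEED = 77  # value from the note above


def rouge_l(predictions, references):
    """Hypothetical wrapper: return the rougeL score for the given texts."""
    # Imported lazily so this module can be loaded without the dependency.
    import evaluate

    # The seed is passed when the metric is loaded, as the note describes.
    rouge = evaluate.load("rouge", seed=ROUGE_SEED)
    scores = rouge.compute(
        predictions=predictions,
        references=references,
        rouge_types=["rougeL"],  # report only rougeL
    )
    return scores["rougeL"]
```

A call would look like `rouge_l(["generated summary"], ["reference summary"])`, with one prediction per reference.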