Classification
Models
| Model | Lang |
|---|---|
| | EN |
| | RU |
| | RU |
| | EN |
| | EN |
| | EN |
| | RU |
| | RU |
| | RU |
| | RU |
Datasets
-
Lang: EN
Rows: 31915
Preprocess:
- Drop column `id`.
- Rename column `label` to `target`.
- Rename column `comment_text` to `source`.
- Reset indexes.
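The steps above could be sketched with pandas roughly as follows. The choice of pandas and the toy rows are assumptions made for illustration; only the column names `id`, `label`, `comment_text`, `target`, and `source` come from the list above.

```python
import pandas as pd

# Toy rows standing in for the real dataset (illustrative values only).
df = pd.DataFrame(
    {
        "id": [10, 11, 12],
        "label": [0, 1, 0],
        "comment_text": ["first text", "second text", "third text"],
    },
    index=[5, 6, 7],  # non-default index so the reset is visible
)

# Drop the identifier column, which is not needed for classification.
df = df.drop(columns=["id"])

# Bring the column names to the common `target` / `source` scheme.
df = df.rename(columns={"label": "target", "comment_text": "source"})

# Replace the old index with a fresh 0..n-1 range.
df = df.reset_index(drop=True)
```

The same rename-and-reset pattern applies to most of the datasets listed here; only the original column names differ.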
-
Lang: RU
Rows: 5430
Preprocess:
- Select `simplified` subset.
- Drop columns `id` and `text`.
- Convert column `labels` to tuple.
- Rename column `labels` to `target`.
- Rename column `ru_text` to `source`.
- Group emotions and change numbers to words.
- Delete duplicates in `target`.
- Clean column `source`.
- Reset indexes.
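One possible reading of these steps, sketched with pandas. Several details are assumptions, because the list does not specify them: the `ID_TO_GROUP` mapping is hypothetical, "delete duplicates in `target`" is read as removing repeated group names within one row, and "clean" is reduced to whitespace stripping.

```python
import pandas as pd

# Hypothetical mapping from numeric emotion ids to group names.
ID_TO_GROUP = {0: "joy", 1: "joy", 2: "anger", 3: "sadness"}


def group_emotions(label_ids):
    """Map ids to group names and drop repeats, keeping first-seen order."""
    return tuple(dict.fromkeys(ID_TO_GROUP[i] for i in label_ids))


# Toy rows standing in for the `simplified` subset (illustrative only).
df = pd.DataFrame(
    {
        "id": ["a", "b", "c"],
        "text": ["t1", "t2", "t3"],
        "ru_text": [" первый ", "второй", "третий "],
        "labels": [[0, 1], [2], [3, 0]],
    }
)

df = df.drop(columns=["id", "text"])
df["labels"] = df["labels"].apply(tuple)  # lists become hashable tuples
df = df.rename(columns={"labels": "target", "ru_text": "source"})
df["target"] = df["target"].apply(group_emotions)
df["source"] = df["source"].str.strip()  # assumed minimal cleaning
df = df.reset_index(drop=True)
```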
papluca/language-identification
Lang: EN
Rows: 10000
Preprocess:
- Rename column `labels` to `target`.
- Rename column `text` to `source`.
- Map language abbreviation to label classes.
- Reset indexes.
-
Lang: EN
Rows: 7600
Preprocess:
- Rename column `label` to `target`.
- Rename column `text` to `source`.
- Reset indexes.
-
Lang: EN
Rows: 25000
Preprocess:
- Select `test` split.
- Rename column `labels` to `target`.
- Rename column `text` to `source`.
- Reset indexes.
-
Lang: EN
Rows: 2000
Preprocess:
- Select `split` subset.
- Select `validation` split.
- Rename column `label` to `target`.
- Rename column `text` to `source`.
- Reset indexes.
-
Lang: RU
Rows: 36591
Preprocess:
- Select `validation` split.
- Leave only `content` and `grade3` columns.
- Rename column `grade3` to `target`.
- Rename column `content` to `source`.
- Delete empty rows in dataset.
- Map `target` with class labels.
- Reset indexes.
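A minimal pandas sketch of the column selection, empty-row removal, and label mapping described above. The `CLASS_IDS` mapping, the extra `title` column, and the toy rows are hypothetical; the actual class labels are not given in this list.

```python
import pandas as pd

# Hypothetical mapping from raw grades to class ids.
CLASS_IDS = {"Bad": 0, "Neutral": 1, "Good": 2}

# Toy rows standing in for the validation split (illustrative only).
df = pd.DataFrame(
    {
        "content": ["great film", "", "so-so", None],
        "grade3": ["Good", "Bad", "Neutral", "Good"],
        "title": ["a", "b", "c", "d"],
    }
)

# Keep only the two columns of interest and rename them.
df = df[["content", "grade3"]]
df = df.rename(columns={"grade3": "target", "content": "source"})

# Drop rows whose text is missing or blank, or whose label is missing.
has_text = df["source"].fillna("").str.strip() != ""
df = df[has_text & df["target"].notna()].copy()

# Map raw grades to class ids and rebuild the index.
df["target"] = df["target"].map(CLASS_IDS)
df = df.reset_index(drop=True)
```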
blinoff/healthcare_facilities_reviews
Lang: RU
Rows: 70597
Preprocess:
- Select `validation` split.
- Leave only `content` and `sentiment` columns.
- Rename column `sentiment` to `target`.
- Rename column `content` to `source`.
- Map `target` with class labels.
Note: when this dataset is used together with the multiclass model `blanchefort/rubert-base-cased-sentiment-rusentiment`, predictions of the neutral class must be mapped to the negative class at the prediction stage.
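The class merge described in the note could look roughly like this. The label strings `POSITIVE`, `NEUTRAL`, and `NEGATIVE` are an assumption about the model's output names, and `to_binary_sentiment` is a hypothetical helper, not part of any library.

```python
def to_binary_sentiment(label: str) -> str:
    """Collapse a three-class prediction into positive/negative."""
    label = label.upper()
    # The neutral class is folded into the negative class.
    return "NEGATIVE" if label == "NEUTRAL" else label


# Example predicted labels (illustrative values, not real model output).
predictions = ["POSITIVE", "NEUTRAL", "NEGATIVE"]
binary = [to_binary_sentiment(p) for p in predictions]
```

Applying the merge after prediction leaves the model itself untouched, so the same checkpoint can still be used for three-class tasks.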
tatiana-merz/cyrillic_turkic_langs
Lang: RU
Rows: 9000
Preprocess:
- Select `validation` split.
- Rename column `label` to `target`.
- Rename column `text` to `source`.
- Map `target` with class labels.
Lang: RU
Rows: 6350
Preprocess:
- Rename column `toxic` to `target`.
- Rename column `neutral` to `source`.
- Delete duplicates in dataset.
- Map `target` with class labels.
- Reset indexes.
Lang: RU
Rows: 163187
Preprocess:
- Select `train` split.
- Rename column `toxic` to `target`.
- Rename column `text` to `source`.
Lang: RU
Rows: 20900
Preprocess:
- Rename column `reasons` to `target`.
- Rename column `toxic_comment` to `source`.
- Rename `{"toxic_content":true}` label to `1` and `{"not_toxic":true}` label to `0`.
- Remove irrelevant rows in dataset.
- Delete duplicates in dataset.
- Reset indexes.
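A sketch of the label conversion for this dataset, assuming the `reasons` column stores the labels as plain strings and that "irrelevant rows" means rows carrying any other label. The `{"other":true}` value and the toy texts are made up for illustration.

```python
import pandas as pd

# Labels named in the steps above, mapped to binary classes.
LABEL_MAP = {'{"toxic_content":true}': 1, '{"not_toxic":true}': 0}

# Toy rows (illustrative only); the third label is hypothetical.
df = pd.DataFrame(
    {
        "reasons": [
            '{"toxic_content":true}',
            '{"not_toxic":true}',
            '{"other":true}',
            '{"not_toxic":true}',
        ],
        "toxic_comment": ["bad text", "fine text", "unclear text", "fine text"],
    }
)

df = df.rename(columns={"reasons": "target", "toxic_comment": "source"})

# Labels outside LABEL_MAP become NaN and are dropped as irrelevant.
df["target"] = df["target"].map(LABEL_MAP)
df = df.dropna(subset=["target"])
df["target"] = df["target"].astype(int)

# Remove exact duplicate rows and rebuild the index.
df = df.drop_duplicates().reset_index(drop=True)
```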
Lang: EN
Rows: 6490
Preprocess:
- Select `validation` split.
- Drop columns `id`, `severe_toxic`, `obscene`, `threat`, `insult`, `identity_hate`.
- Rename column `toxic` to `target`.
- Rename column `comment_text` to `source`.
- Reset indexes.
Lang: EN
Rows: 26507
Preprocess:
- Select `train` split.
- Rename column `toxic` to `target`.
- Rename column `comment` to `source`.
- Reset indexes.
Metrics
F1-score
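For reference, the F1-score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The sketch below computes it for a single positive class in plain Python. The helper name `binary_f1` is hypothetical, and how the score is averaged for the multiclass datasets is not stated here, so that part is left out.

```python
def binary_f1(y_true, y_pred, positive=1):
    """F1-score of the `positive` class for two equal-length label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        # No true positives: precision or recall is zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Example with 2 true positives, 1 false positive, 1 false negative.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]
score = binary_f1(y_true, y_pred)
```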