Classification
Models
| Model | Lang |
|---|---|
| | EN |
| | RU |
| | RU |
| | EN |
| | EN |
| | EN |
| | RU |
| | RU |
| | RU |
| | RU |
Datasets
-
  - Lang: EN
  - Rows: 31915
  - Preprocess:
    - Drop column `id`.
    - Rename column `label` to `target`.
    - Rename column `comment_text` to `source`.
    - Reset indexes.
-
  - Lang: RU
  - Rows: 5430
  - Preprocess:
    - Select `simplified` subset.
    - Drop columns `id` and `text`.
    - Convert column `labels` to tuple.
    - Rename column `labels` to `target`.
    - Rename column `ru_text` to `source`.
    - Group emotions and change numbers to words.
    - Delete duplicates in `target`.
    - Clean column `source`.
    - Reset indexes.
- papluca/language-identification
  - Lang: EN
  - Rows: 10000
  - Preprocess:
    - Rename column `labels` to `target`.
    - Rename column `text` to `source`.
    - Map language abbreviation to label classes.
    - Reset indexes.
-
  - Lang: EN
  - Rows: 7600
  - Preprocess:
    - Rename column `label` to `target`.
    - Rename column `text` to `source`.
    - Reset indexes.
-
  - Lang: EN
  - Rows: 25000
  - Preprocess:
    - Select `test` split.
    - Rename column `labels` to `target`.
    - Rename column `text` to `source`.
    - Reset indexes.
-
  - Lang: EN
  - Rows: 2000
  - Preprocess:
    - Select `split` subset.
    - Select `validation` split.
    - Rename column `label` to `target`.
    - Rename column `text` to `source`.
    - Reset indexes.
-
  - Lang: RU
  - Rows: 36591
  - Preprocess:
    - Select `validation` split.
    - Leave only `content` and `grade3` columns.
    - Rename column `grade3` to `target`.
    - Rename column `content` to `source`.
    - Delete empty rows in dataset.
    - Map `target` with class labels.
    - Reset indexes.
- blinoff/healthcare_facilities_reviews
  - Lang: RU
  - Rows: 70597
  - Preprocess:
    - Select `validation` split.
    - Leave only `content` and `sentiment` columns.
    - Rename column `sentiment` to `target`.
    - Rename column `content` to `source`.
    - Map `target` with class labels.
  - Note: when this dataset is used with the multiclass model `blanchefort/rubert-base-cased-sentiment-rusentiment`, predictions of the `neutral` class must be mapped to the `negative` class at the prediction stage.
- tatiana-merz/cyrillic_turkic_langs
  - Lang: RU
  - Rows: 9000
  - Preprocess:
    - Select `validation` split.
    - Rename column `label` to `target`.
    - Rename column `text` to `source`.
    - Map `target` with class labels.
-
  - Lang: RU
  - Rows: 6350
  - Preprocess:
    - Rename column `toxic` to `target`.
    - Rename column `neutral` to `source`.
    - Delete duplicates in dataset.
    - Map `target` with class labels.
    - Reset indexes.
-
  - Lang: RU
  - Rows: 163187
  - Preprocess:
    - Select `train` split.
    - Rename column `toxic` to `target`.
    - Rename column `text` to `source`.
-
  - Lang: RU
  - Rows: 20900
  - Preprocess:
    - Rename column `reasons` to `target`.
    - Rename column `toxic_comment` to `source`.
    - Map the `{"toxic_content":true}` label to `1` and the `{"not_toxic":true}` label to `0`.
    - Remove irrelevant rows in dataset.
    - Delete duplicates in dataset.
    - Reset indexes.
-
  - Lang: EN
  - Rows: 6490
  - Preprocess:
    - Select `validation` split.
    - Drop columns `id`, `severe_toxic`, `obscene`, `threat`, `insult`, `identity_hate`.
    - Rename column `toxic` to `target`.
    - Rename column `comment_text` to `source`.
    - Reset indexes.
-
  - Lang: EN
  - Rows: 26507
  - Preprocess:
    - Select `train` split.
    - Rename column `toxic` to `target`.
    - Rename column `comment` to `source`.
    - Reset indexes.
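
Most of the preprocessing recipes above repeat the same few steps: drop unused columns, rename the text and label columns to `source` and `target`, optionally map labels and delete duplicates, then reset the index. A minimal sketch of that pattern, assuming the data is held in a pandas DataFrame, is shown below. The helper name `preprocess`, its parameters, and the toy rows are hypothetical and are not taken from the original pipeline.

```python
from typing import Mapping, Optional, Sequence

import pandas as pd


def preprocess(
    df: pd.DataFrame,
    source_col: str,
    target_col: str,
    drop_cols: Sequence[str] = (),
    label_map: Optional[Mapping] = None,
    dedupe: bool = False,
) -> pd.DataFrame:
    """Hypothetical helper: drop, rename, map labels, dedupe, reset index."""
    # Drop columns that are not needed for evaluation.
    out = df.drop(columns=list(drop_cols))
    # Unify column names across datasets.
    out = out.rename(columns={source_col: "source", target_col: "target"})
    # Optionally map raw label values to class labels.
    if label_map is not None:
        out["target"] = out["target"].map(label_map)
    # Optionally delete duplicate rows.
    if dedupe:
        out = out.drop_duplicates()
    # Reset the index so that rows are numbered from 0 again.
    return out.reset_index(drop=True)


# Toy rows invented for illustration only (not from any dataset listed above).
raw = pd.DataFrame(
    {
        "id": [10, 11, 12, 13],
        "comment_text": ["good", "bad", "bad", "fine"],
        "label": [1, 0, 0, 1],
    }
)
clean = preprocess(
    raw,
    source_col="comment_text",
    target_col="label",
    drop_cols=("id",),
    label_map={0: "negative", 1: "positive"},
    dedupe=True,
)
print(clean)
```

Selecting a subset or split happens before this step and depends on how each dataset is loaded, so it is left out of the sketch.
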
Metrics
F1-score
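
The metric is named here without an averaging mode, so the following is only a sketch under the assumption of macro averaging (the unweighted mean of per-class F1). It also shows one way to fold `neutral` predictions into `negative` before scoring, as the note for blinoff/healthcare_facilities_reviews describes. The function names and the toy labels are hypothetical.

```python
from typing import Hashable, List, Sequence


def f1_for_label(y_true: Sequence[Hashable], y_pred: Sequence[Hashable], label: Hashable) -> float:
    """F1 for a single class, treating `label` as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 (assumed averaging mode)."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_label(y_true, y_pred, label) for label in labels) / len(labels)


def collapse_neutral(preds: Sequence[str]) -> List[str]:
    """Map `neutral` predictions to `negative` at the prediction stage."""
    return ["negative" if p == "neutral" else p for p in preds]


# Toy labels invented for illustration only.
y_true = ["positive", "negative", "negative", "positive"]
raw_pred = ["positive", "neutral", "negative", "neutral"]

y_pred = collapse_neutral(raw_pred)
score = macro_f1(y_true, y_pred)
print(round(score, 4))
```

If a different averaging mode (for example binary or weighted) was used for the reported scores, only `macro_f1` would need to change.
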