NLI

Models

Model                                           Lang
cointegrated/rubert-base-cased-nli-threeway     RU
cointegrated/rubert-tiny-bilingual-nli          RU
cross-encoder/qnli-distilroberta-base           EN
MoritzLaurer/DeBERTa-v3-base-mnli-fever-anli    EN

Datasets

  1. cointegrated/nli-rus-translated-v2021

    1. Lang: RU

    2. Rows: 19647

    3. Preprocess:

      1. Select dev split.

      2. Keep only the rows where the column source has the value mnli.

        1. Implement this step in lab_7_llm.main.RawDataImporter.obtain().

      3. Keep only the columns premise_ru, hypothesis_ru and label.

      4. Rename column premise_ru to premise.

      5. Rename column hypothesis_ru to hypothesis.

      6. Rename column label to target.

      7. Delete empty rows in the dataset.

      8. Delete duplicates in the dataset.

      9. Map the target column to class labels.

      10. Reset the index.

  2. XNLI

    1. Lang: RU

    2. Rows: 2490

    3. Preprocess:

      1. Select ru subset.

      2. Select validation split.

      3. Rename column label to target.

      4. Delete duplicates in the dataset.

      5. Delete empty rows in the dataset.

      6. Reset the index.

  3. GLUE QNLI

    1. Lang: EN

    2. Rows: 5463

    3. Preprocess:

      1. Select qnli subset.

      2. Select validation split.

      3. Rename column question to premise.

      4. Rename column sentence to hypothesis.

      5. Rename column label to target.

      6. Delete duplicates in the dataset.

      7. Delete empty rows in the dataset.

      8. Map the target column to class labels.

      9. Reset the index.

  4. GLUE MNLI

    1. Lang: EN

    2. Rows: 9815

    3. Preprocess:

      1. Select mnli subset.

      2. Select validation_matched split.

      3. Rename column label to target.

      4. Delete duplicates in the dataset.

      5. Delete empty rows in the dataset.

      6. Reset the index.
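The preprocessing lists above share the same shape: filter, select and rename columns, drop empty rows and duplicates, optionally map labels, reset the index. A minimal sketch of the steps for cointegrated/nli-rus-translated-v2021 follows, assuming the dev split is already loaded into a pandas DataFrame. The function name preprocess_nli_rus and the LABEL_TO_ID mapping are hypothetical and serve only as illustration; the actual mapping depends on the label order of the chosen model. The source filter is folded in here for brevity, although the instructions place it in RawDataImporter.obtain().

```python
import pandas as pd

# Hypothetical label-to-id mapping, used for illustration only.
# The real mapping depends on the label order of the chosen model.
LABEL_TO_ID = {"entailment": 0, "contradiction": 1, "neutral": 2}


def preprocess_nli_rus(raw: pd.DataFrame) -> pd.DataFrame:
    """Apply the listed preprocessing steps to an already loaded dev split."""
    return (
        raw[raw["source"] == "mnli"]  # keep only rows that come from MNLI
        [["premise_ru", "hypothesis_ru", "label"]]  # keep the three needed columns
        .rename(
            columns={
                "premise_ru": "premise",
                "hypothesis_ru": "hypothesis",
                "label": "target",
            }
        )
        .dropna()  # delete empty rows
        .drop_duplicates()  # delete duplicates
        .assign(target=lambda df: df["target"].map(LABEL_TO_ID))  # map labels
        .reset_index(drop=True)  # reset the index
    )


# Toy data standing in for the real dev split.
raw = pd.DataFrame(
    {
        "premise_ru": ["A", "A", "B", None, "C"],
        "hypothesis_ru": ["x", "x", "y", "z", "w"],
        "label": ["entailment", "entailment", "neutral", "neutral", "contradiction"],
        "source": ["mnli", "mnli", "mnli", "mnli", "anli"],
    }
)
clean = preprocess_nli_rus(raw)
print(clean)
```

The other datasets follow the same pattern with a different rename dictionary and, for XNLI and GLUE MNLI, without the label-mapping step.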

Supervised Fine-Tuning (SFT) Parameters

Note

Set the parameter target_modules=["key"] for the cointegrated/rubert-base-cased-nli-threeway model.
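As an illustration of where this parameter might go, the fragment below assumes that fine-tuning uses LoRA adapters through the peft library, which this section does not state explicitly. The values of r, lora_alpha and lora_dropout are placeholders chosen for the sketch, not recommended settings; only target_modules=["key"] comes from the note above.

```python
from peft import LoraConfig

# Assumption: LoRA-based fine-tuning via peft. Only target_modules is taken
# from the note above; the remaining values are illustrative placeholders.
lora_config = LoraConfig(
    r=4,
    lora_alpha=8,
    lora_dropout=0.1,
    target_modules=["key"],  # attach adapters to the attention key projections
)
```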

Metrics

  • Accuracy
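Accuracy is the share of samples whose predicted class equals the target class. A minimal plain-Python sketch is shown below; the function name accuracy and the sample values are illustrative, and a library implementation can be used instead.

```python
def accuracy(targets: list[int], predictions: list[int]) -> float:
    """Return the fraction of predictions that match the targets."""
    if len(targets) != len(predictions):
        raise ValueError("targets and predictions must have the same length")
    if not targets:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(t == p for t, p in zip(targets, predictions))
    return correct / len(targets)


# Three of the four predictions match the targets.
score = accuracy([0, 1, 2, 1], [0, 2, 2, 1])
print(score)  # 0.75
```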