.. _nli-label: NLI === Models ------ +-------------------------------------------------------------------+------+ | Model | Lang | +===================================================================+======+ | `cointegrated/rubert-base-cased-nli-threeway `__ | | +-------------------------------------------------------------------+------+ | `cointegrated/rubert-tiny-bilingual-nli | RU | | `__ | | +-------------------------------------------------------------------+------+ | `cross-encoder/qnli-distilroberta-base | EN | | `__ | | +-------------------------------------------------------------------+------+ | `MoritzLaurer/DeBERTa-v3-base-mnli-fever-anli `__ | | +-------------------------------------------------------------------+------+ Datasets -------- 1. `cointegrated/nli-rus-translated-v2021 `__ 1. **Lang**: RU 2. **Rows**: 19647 3. **Preprocess**: 1. Select ``dev`` split. 2. Filter the dataset by the column ``source`` with the value ``mnli``. 1. This step you should implement in :py:meth:`stubs.labs.lab_7_llm.main.RawDataImporter.obtain`. 3. Leave only columns ``premise_ru``, ``hypothesis_ru`` and ``label``. 4. Rename column ``premise_ru`` to ``premise``. 5. Rename column ``hypothesis_ru`` to ``hypothesis``. 6. Rename column ``label`` to ``target``. 7. Delete empty rows in dataset. 8. Delete duplicates in dataset. 9. Map ``target`` with class labels. 10. Reset indexes. 2. `Russian Super GLUE TERRA `__ 1. **Lang**: RU 2. **Rows**: 307 3. **Preprocess**: 1. Select ``terra`` subset. 2. Rename column ``label`` to ``target``. 3. Delete duplicates in dataset. 4. Delete empty rows in dataset. 5. Reset indexes. 3. `XNLI `__ 1. **Lang**: RU 2. **Rows**: 2490 3. **Preprocess**: 1. Select ``ru`` subset. 2. Rename column ``label`` to ``target``. 3. Delete duplicates in dataset. 4. Delete empty rows in dataset. 5. Reset indexes. 4. `GLUE QNLI `__ 1. **Lang**: EN 2. **Rows**: 5463 3. **Preprocess**: 1. Select ``qnli`` subset. 2. Rename column ``question`` to ``premise``. 3. Rename column ``sentence`` to ``hypothesis``. 4. Rename column ``label`` to ``target``. 5. Delete duplicates in dataset. 6. Delete empty rows in dataset. 7. Map ``target`` with class labels. 8. Reset indexes. 5. `GLUE MNLI `__ 1. **Lang**: EN 2. **Rows**: 9815 3. **Preprocess**: 1. Select ``mnli`` subset. 2. Rename column ``label`` to ``target``. 3. Delete duplicates in dataset. 4. Delete empty rows in dataset. 5. Reset indexes. Supervised Fine-Tuning (SFT) Parameters --------------------------------------- .. note:: Set the parameter ``learning_rate=1e-2`` for the `cointegrated/rubert-tiny-bilingual-nli `__ model as SFT parameter. Metrics ------- - Accuracy