Laboratory work №8. Supervised Fine-Tuning (SFT) of Large Language Models

Implementation tactics

Stage 0. Start working with laboratory work

Start your implementation by selecting a new combination of model and dataset you are going to use for fine-tuning. You can find all available combinations in the table.

Important

For laboratory work №8, you need to select a different task: if you previously worked on a generation task (Generation, Summarization, NMT), you now need to select a classification task (Detection, NLI), and vice versa.

Important

You have to open a new Pull Request to implement Laboratory Work №8.

Note

All logic for instantiating and using the needed abstractions should be implemented in the main() function of the start.py module.

To do this, implement the functions in the main.py module in the lab_8_sft folder and import them into the start.py module in the same folder.

if __name__ == '__main__':
    main()

Note

You need to set the desired mark: 4, 6, 8 or 10 in the target_score field in the settings.json file. The higher the desired mark, the more tests are run when checking your Pull Request.

Python competencies required to complete this tutorial:
  • working with Transformers models;

  • working with HuggingFace datasets;

  • working with LoRA PEFT method;

  • estimating results using a metric;

  • building a server for the chosen task using FastAPI.

Motivation and purpose

In this laboratory work, we will explore both inference and fine-tuning of Large Language Models (LLMs), with a particular focus on Supervised Fine-Tuning (SFT) using Parameter-Efficient Fine-Tuning (PEFT) techniques, specifically LoRA (Low-Rank Adaptation).

As previously discussed, the lifecycle of LLMs consists of several phases, with two key stages being:

  1. Training - the phase in which the model learns from a large dataset, adjusting its internal parameters (weights and biases) to recognize patterns and relationships in the data. Training requires extensive computational resources and vast amounts of data.

  2. Inference - the phase in which the pre-trained model is used to generate predictions on new, unseen data without further modifications to its internal parameters. Inference is computationally less expensive and focuses on efficient, real-time responses.

However, in many real-world applications, pre-trained models do not always generalize well to specific domains or tasks. To bridge this gap, fine-tuning is employed. Fine-tuning allows us to adapt a general LLM to a specialized task by continuing its training on a smaller, task-specific dataset.

Full fine-tuning of LLMs is computationally expensive, requiring access to large-scale hardware resources. Parameter-Efficient Fine-Tuning (PEFT) methods, such as LoRA (Low-Rank Adaptation), provide an alternative approach by modifying only a small subset of the model’s parameters, making the process more memory- and compute-efficient. LoRA achieves this by introducing low-rank adaptation matrices to the model’s pre-trained layers, significantly reducing the number of trainable parameters while still maintaining high performance.
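
To make the parameter savings concrete: for a frozen pre-trained weight matrix W of shape d × k, LoRA learns an update ΔW = B·A, where B has shape d × r, A has shape r × k, and the rank r is much smaller than d and k. The adapted layer computes h = W·x + (α/r)·B·A·x, so only r·(d + k) values are trained instead of d·k, while W itself stays unchanged; after fine-tuning, B·A can be merged back into W.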

The primary purpose of this laboratory work is to:

  • Understand the difference between inference and fine-tuning in LLMs.

  • Learn how to fine-tune LLMs efficiently using LoRA within the PEFT framework.

  • Apply fine-tuned models to various NLP tasks such as Generation, Summarization, Classification, and Machine Translation.

  • Compare the performance of a pre-trained model versus a fine-tuned model on specific tasks.

By the end of this laboratory work, you will gain practical experience in applying LoRA-based fine-tuning to adapt LLMs for specific tasks while optimizing for efficiency.

Stage 1. Infer one sample from dataset and demonstrate the result

Important

Stages 1 - 4.3 from Laboratory work №7 (Large Language Models) are required to get the mark 4.

Stage 2. Run model inference and demonstrate the result

Important

Stages 4.4 - 5.2 from Laboratory work №7 (Large Language Models) are required to get the mark 6.

Stage 3. Tokenize one sample from the dataset

Before fine-tuning a model, it is important to properly prepare the data. Since the data is presented as text, it must be tokenized (i.e. converted into a numeric representation) before it can be passed to the model for fine-tuning.

Implement the lab_8_sft.main.tokenize_sample() function, which tokenizes the sample and truncates it to the maximum length.

Set the following parameters for tokenizer:

  • padding="max_length";

  • truncation=True;

  • max_length=120.

The function should return a dictionary with input_ids, attention_mask and labels for the current sample as keys. These return values provide the information needed to feed the model, ensuring a correct fine-tuning process. A sketch of such a function is given at the end of this stage.

Important

The returned dictionary must contain exactly these keys (input_ids, attention_mask and labels), since the transformers library has built-in data processing mechanisms that expect these names.

Important

For Seq2Seq models, it is necessary to tokenize not only the sample from the source column, but also the one from the target column.
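
To make the expected behaviour concrete, here is a minimal sketch of such a function for a classification task. The signature and the column names source and target are assumptions that may differ for your combination; for a Seq2Seq model, labels would instead hold the input_ids of the tokenized target text.

import pandas as pd
from transformers import AutoTokenizer


def tokenize_sample(sample: pd.Series, tokenizer: AutoTokenizer,
                    max_length: int) -> dict:
    # Sketch for a classification task; column names are assumptions
    tokenized = tokenizer(
        sample["source"],
        padding="max_length",
        truncation=True,
        max_length=max_length,
    )
    return {
        "input_ids": tokenized["input_ids"],
        "attention_mask": tokenized["attention_mask"],
        # for classification the label is the target class id;
        # for Seq2Seq it would be the input_ids of the tokenized target
        "labels": int(sample["target"]),
    }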

Stage 4. Introduce dataset abstraction: TokenizedTaskDataset

As in the previous laboratory work, we will use the PyTorch Dataset abstraction to interact with the model. We convert a pd.DataFrame to a Dataset and override some of its methods, because in the next step we will use the Transformers Trainer abstraction, which internally uses a PyTorch DataLoader to efficiently load data into the model's memory, process it in batches and pass it to the model.

Implement the lab_8_sft.main.TokenizedTaskDataset abstraction, which prepares data for fine-tuning.

This class inherits from the torch.utils.data.Dataset abstraction and has one internal attribute:

  • self._data - pd.DataFrame with preprocessed data.

Fill the self._data attribute with tokenized samples from the data, using the lab_8_sft.main.tokenize_sample() function.

Thus, this class combines pd.DataFrame and the PyTorch Dataset, tokenizes text in the format required by the model, ensures efficient data loading during fine-tuning and allows the Trainer to load data in batches for tuning.

Important

When instantiating the TokenizedTaskDataset abstraction in the start.py module, limit the full pd.DataFrame obtained from RawDataPreprocessor to the number of samples needed for training, calculated as the batch size multiplied by the number of training steps. Take the samples that follow the ones used for inference, i.e. starting with sample 10.

See the intended instantiation:

# num_samples matches the number of samples already used for inference
num_samples = 10
# batch and fine_tuning_steps come from the SFT parameters
fine_tune_samples = batch * fine_tuning_steps
# take the fine-tuning samples right after the inference ones
dataset = TokenizedTaskDataset(preprocessor.data.loc[
        num_samples : num_samples + fine_tune_samples
    ])

where preprocessor.data is the property of the RawDataPreprocessor class.

Stage 4.1. Get the dataset length

In the next two steps, we will override some methods that will allow us to further tune the model.

Implement the lab_8_sft.main.TokenizedTaskDataset.__len__() method, which returns the number of items in the dataset. The PyTorch DataLoader uses this method to determine the total number of batches.

Stage 4.2. Retrieve an item from the dataset

Implement the lab_8_sft.main.TokenizedTaskDataset.__getitem__() method, which retrieves an item from the dataset by index.

The PyTorch DataLoader calls this method to retrieve data for each batch. Implementing it lets you define how the data is retrieved from the dataset and how it is structured. It should return a dictionary containing the result of tokenizing one sample from the dataset by the given index (see the sketch after the note below).

Note

For example, if the data at index 0 contains the sample i feel bitchy but not defeated yet, then the lab_8_sft.main.TokenizedTaskDataset.__getitem__() method will output the following value: {'input_ids': tensor([...]), 'attention_mask': tensor([...]), 'labels': 3}
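
Below is a minimal sketch of the whole abstraction, assuming the constructor also receives the tokenizer and the maximum length (check the actual signature in the course materials); tokenize_sample() is the function from Stage 3.

import pandas as pd
from torch.utils.data import Dataset


class TokenizedTaskDataset(Dataset):
    # Sketch: stores pre-tokenized samples for the Trainer

    def __init__(self, data: pd.DataFrame, tokenizer, max_length: int) -> None:
        # Tokenize every row up front; tokenizer and max_length are
        # assumed constructor arguments here
        self._data = pd.DataFrame(list(
            data.apply(
                lambda sample: tokenize_sample(sample, tokenizer, max_length),
                axis=1,
            )
        ))

    def __len__(self) -> int:
        # The DataLoader uses this to compute the total number of batches
        return len(self._data)

    def __getitem__(self, index: int) -> dict:
        # The DataLoader calls this to fetch one tokenized sample
        return dict(self._data.iloc[index])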

Stage 5. Introduce SFT Pipeline: SFTPipeline

To fine-tune the selected model, you need to implement the lab_8_sft.main.SFTPipeline abstraction.

This class inherits from core_utils.llm.sft_pipeline.AbstractSFTPipeline, which provides a structure for initializing a model and performing fine-tuning.

The class has the following internal attributes:

  • self._lora_config – configuration for LoRA;

  • self._model – a pre-trained model.

Note

When configuring LoraConfig, set the following parameters: r=4, lora_alpha=8, lora_dropout=0.1 and target_modules from the SFT parameters. A sketch of a constructor that sets up these attributes is given after the instantiation example below.

See the intended instantiation:

pipeline = SFTPipeline(settings.parameters.model, dataset, sft_params)

where:

  • settings.parameters.model is the name of the pre-trained model;

  • dataset is an instance of the TokenizedTaskDataset abstraction;

  • sft_params contains the fine-tuning parameters.
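
A possible sketch of the constructor is shown below; the arguments passed to super().__init__(), the target_modules field of the SFT parameters and the use of the classification model class are assumptions to be checked against your own combination.

from peft import LoraConfig
from transformers import AutoModelForSequenceClassification

from core_utils.llm.sft_pipeline import AbstractSFTPipeline


class SFTPipeline(AbstractSFTPipeline):
    # Sketch: sets up the LoRA configuration and the base model

    def __init__(self, model_name: str, dataset, sft_params) -> None:
        super().__init__(model_name, dataset)  # assumed parent signature
        # LoRA configuration with the parameters required above
        self._lora_config = LoraConfig(
            r=4,
            lora_alpha=8,
            lora_dropout=0.1,
            target_modules=sft_params.target_modules,  # assumed field name
        )
        # The pre-trained model to be adapted (classification variant shown)
        self._model = AutoModelForSequenceClassification.from_pretrained(
            model_name
        )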

Stage 5.1. Model fine-tuning

Implement the lab_8_sft.main.SFTPipeline.run() method, which fine-tunes a pre-trained model using the LoRA method.

Before starting fine-tuning, set up the training parameters using TrainingArguments. Define parameters such as max_steps, per_device_train_batch_size, learning_rate, save_strategy, use_cpu and load_best_model_at_end to control the training and optimization process. A sketch of the whole method is given at the end of this stage.

To train the model, use Trainer, which takes the model, training arguments, and dataset as input.

Note

Initialize the model with LoRA adapters using get_peft_model() from PEFT.

Important

After the fine-tuning process, merge the LoRA-adapted weights into the base model and save the fine-tuned model to the specified output directory; the path to this directory can be obtained from the core_utils.llm.sft_pipeline.AbstractSFTPipeline class.
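
A possible sketch of run() under these requirements; the attribute names self._dataset and self._finetuned_model_path are assumed to come from AbstractSFTPipeline (check the actual names in core_utils), and the parameter values follow Stage 5.2.

from peft import get_peft_model
from transformers import Trainer, TrainingArguments


# inside SFTPipeline
def run(self) -> None:
    # Wrap the base model with LoRA adapters
    model = get_peft_model(self._model, self._lora_config)

    training_args = TrainingArguments(
        output_dir=str(self._finetuned_model_path),  # assumed attribute name
        max_steps=50,
        per_device_train_batch_size=3,
        learning_rate=1e-3,
        save_strategy="no",
        use_cpu=True,
        load_best_model_at_end=False,
    )
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=self._dataset,  # assumed attribute name
    )
    trainer.train()

    # Merge the LoRA weights into the base model and save the result
    merged_model = trainer.model.merge_and_unload()
    merged_model.save_pretrained(self._finetuned_model_path)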

Stage 5.2. Demonstrate the result in start.py

Important

Stages 3 - 5.2 are required to get the mark 8.

Demonstrate the fine-tuning process and the evaluation of the fine-tuned model's performance in the main() function of the start.py module. A condensed sketch of the whole demonstration is given at the end of this stage.

So, the pipeline should include the following stages:

  1. preparation of the dataset for fine-tuning;

  2. fine-tuning of the model;

  3. analysis of the fine-tuned model;

  4. inference of the fine-tuned model;

  5. evaluation of the quality of the fine-tuned model.

Set the following parameters:

  • Inference parameters: num_samples=10, max_length=120 and batch_size=64.

  • SFT parameters: batch_size=3, max_length=120, max_fine_tuning_steps=50 and learning_rate=1e-3.

Important

You can find all the required parameter values for your combination of model and dataset by choosing the appropriate task.

Important

To infer the fine-tuned model, you need to save it to config.lab_settings.SFTParams.finetuned_model_path.

Note

After model inference, you have to save your predictions to the dist/predictions.csv file in start.py.
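
A condensed sketch of these stages in main() might look as follows. LLMPipeline and TaskEvaluator refer to the abstractions from Laboratory work №7; the setup of settings, preprocessor, sft_params and the inference dataset is omitted, and every signature here is an assumption to be checked against your own code.

from pathlib import Path


def main() -> None:
    # 1. Dataset preparation: samples after the ones used for inference
    num_samples = 10
    fine_tune_samples = sft_params.batch_size * sft_params.max_fine_tuning_steps
    dataset = TokenizedTaskDataset(
        preprocessor.data.loc[num_samples : num_samples + fine_tune_samples]
    )

    # 2. Fine-tuning: run() saves the merged model to finetuned_model_path
    pipeline = SFTPipeline(settings.parameters.model, dataset, sft_params)
    pipeline.run()

    # 3-4. Analysis and inference of the fine-tuned model
    llm_pipeline = LLMPipeline(
        str(sft_params.finetuned_model_path),
        inference_dataset,
        max_length=120,
        batch_size=64,
        device="cpu",
    )
    llm_pipeline.analyze_model()
    predictions = llm_pipeline.infer_dataset()
    predictions.to_csv("dist/predictions.csv", index=False)

    # 5. Quality evaluation with the task metric
    result = TaskEvaluator(Path("dist/predictions.csv"),
                           settings.parameters.metrics)
    print(result.run())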

Stage 6. Implement model as a service and demonstrate the result

Important

Stage 6 from Laboratory work №7 (Large Language Models) is required to get the mark 10.

An example of the start page might look like this:

[Image: example start page (site1.png)]

Important

You need to add a checkbox that controls which model's result is returned as the answer. If the Use base model option is enabled, use the pre-trained model; otherwise, use the fine-tuned one.
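
A minimal sketch of how the server might dispatch between the two models, assuming the request body carries an is_base_model flag bound to the checkbox and that both pipelines are created at application start-up; all names here are illustrative.

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class Query(BaseModel):
    # field names are illustrative
    question: str
    is_base_model: bool  # state of the "Use base model" checkbox


@app.post("/infer")
async def infer(query: Query) -> dict:
    # Choose the pre-trained or the fine-tuned pipeline depending on
    # the checkbox state; both are assumed to exist at start-up
    pipeline = pre_trained_pipeline if query.is_base_model else fine_tuned_pipeline
    return {"infer": pipeline.infer_sample(query.question)}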