core_utils package

Submodules

Module with description of abstract LLM pipeline.

class core_utils.llm.llm_pipeline.AbstractLLMPipeline(model_name: str, dataset: Dataset, max_length: int, batch_size: int, device: str = 'cpu')

Bases: ABC

Abstract LLM Pipeline.

__init__(model_name: str, dataset: Dataset, max_length: int, batch_size: int, device: str = 'cpu') → None

Initialize an instance of AbstractLLMPipeline.

Parameters:

model_name (str) – The name of the pre-trained model.
dataset (torch.utils.data.dataset.Dataset) – The dataset to run inference on.
max_length (int) – The maximum length of the generated sequence.
batch_size (int) – The size of the batch inside DataLoader.
device (str) – The device for inference.

_abc_impl = <_abc._abc_data object>

_model: HFModelLike | None

Model.

abstract analyze_model() → dict

Analyze the model and compute its properties.

Returns: Properties of the model
Return type: dict

abstract infer_dataset() → DataFrame

Run model inference on the whole dataset.

Returns: Data with predictions
Return type: pandas.DataFrame

abstract infer_sample(sample: tuple[str, ...]) → str | None

Run model inference on a single sample.

Parameters: sample (tuple[str, ...]) – The sample to run inference on
Returns: A prediction
Return type: str | None
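
A concrete pipeline has to implement all three abstract methods before it can be instantiated. The sketch below only illustrates that contract. It defines a reduced stand-in base class (PipelineSketch) so that it runs without core_utils, torch, or pandas. EchoPipeline, the list-based dataset, and the list-of-dicts return value are assumptions made for the example, not part of the package.

```python
from abc import ABC, abstractmethod
from typing import Optional


class PipelineSketch(ABC):
    """Reduced stand-in that mirrors the documented AbstractLLMPipeline interface."""

    def __init__(self, model_name: str, dataset: list, max_length: int,
                 batch_size: int, device: str = "cpu") -> None:
        self._model_name = model_name
        self._dataset = dataset  # assumption: a plain list replaces torch's Dataset
        self._max_length = max_length
        self._batch_size = batch_size
        self._device = device

    @abstractmethod
    def analyze_model(self) -> dict:
        """Return properties of the model."""

    @abstractmethod
    def infer_dataset(self) -> list:
        """Return predictions for every sample (the real method returns a DataFrame)."""

    @abstractmethod
    def infer_sample(self, sample: tuple) -> Optional[str]:
        """Return a prediction for one sample."""


class EchoPipeline(PipelineSketch):
    """Hypothetical pipeline whose 'model' echoes the input, cut to max_length characters."""

    def analyze_model(self) -> dict:
        return {"model_name": self._model_name, "max_length": self._max_length}

    def infer_sample(self, sample: tuple) -> Optional[str]:
        if not sample:
            return None
        return " ".join(sample)[: self._max_length]

    def infer_dataset(self) -> list:
        rows = []
        # Walk the dataset in chunks of batch_size, roughly as a DataLoader would.
        for start in range(0, len(self._dataset), self._batch_size):
            for sample in self._dataset[start:start + self._batch_size]:
                rows.append({"predictions": self.infer_sample(sample)})
        return rows


pipeline = EchoPipeline("toy-model", [("a b c",), ("hello world",)],
                        max_length=5, batch_size=1)
predictions = pipeline.infer_dataset()
print(predictions)
```

Instantiating PipelineSketch directly raises TypeError, which is the behaviour any ABC with unimplemented abstract methods has.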

class core_utils.llm.llm_pipeline.HFModelLike(*args, **kwargs)

Bases: Protocol

Protocol definition of HF models.

__init__(*args, **kwargs)

_abc_impl = <_abc._abc_data object>

_is_protocol = True

to(device: str) → Any

Placeholder declaring that HF models can be moved to a specific device.

Parameters: device (str) – Device name
Returns: Custom value
Return type: Any
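
Because HFModelLike is a Protocol, a class satisfies it structurally: it only needs a compatible to(device) method and does not have to inherit from anything. The sketch below re-declares the protocol locally so it runs standalone. DummyModel and move_to_device are hypothetical names used only for illustration.

```python
from typing import Any, Protocol


class HFModelLike(Protocol):
    """Local copy of the documented protocol, so the sketch runs standalone."""

    def to(self, device: str) -> Any:
        ...


class DummyModel:
    """Hypothetical model; note that it does not inherit from HFModelLike."""

    def __init__(self) -> None:
        self.device = "cpu"

    def to(self, device: str) -> "DummyModel":
        # Only records the device name; no real hardware is touched.
        self.device = device
        return self


def move_to_device(model: HFModelLike, device: str) -> Any:
    """Accept any object that provides a compatible to(device) method."""
    return model.to(device)


moved = move_to_device(DummyModel(), "cuda:0")
print(moved.device)
```

A static type checker would accept DummyModel wherever HFModelLike is expected, which is presumably why the pipeline can type its _model attribute without importing a specific model class.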

Metrics enum.

class core_utils.llm.metrics.Metrics(value)

Bases: Enum

Metrics enum.

ACCURACY = 'accuracy'

BLEU = 'bleu'

F1 = 'f1'

PRECISION = 'precision'

RECALL = 'recall'

ROUGE = 'rouge'

SQUAD = 'squad'
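
Since every member's value is a lowercase string, a member can be looked up from plain text such as a settings entry. The sketch below copies the documented members locally so it runs without core_utils. parse_metrics is a hypothetical helper, not part of the package.

```python
from enum import Enum


class Metrics(Enum):
    """Local copy of the documented members."""

    ACCURACY = "accuracy"
    BLEU = "bleu"
    F1 = "f1"
    PRECISION = "precision"
    RECALL = "recall"
    ROUGE = "rouge"
    SQUAD = "squad"


def parse_metrics(names: list) -> list:
    """Convert metric names to Metrics members; unknown names raise ValueError."""
    return [Metrics(name) for name in names]


selected = parse_metrics(["bleu", "rouge"])
print([metric.name for metric in selected])
```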

Module with description of abstract data importer.

class core_utils.llm.raw_data_importer.AbstractRawDataImporter(hf_name: str | None)

Bases: ABC

Abstract Raw Data Importer.

__init__(hf_name: str | None) → None

Initialize an instance of AbstractRawDataImporter.

Parameters: hf_name (str | None) – Name of the HuggingFace dataset

_abc_impl = <_abc._abc_data object>

_local_path: Path | None

A path to the dataset.

_raw_data: DataFrame | None

A dataset in a table format.

abstract obtain() → None

Download a dataset.

property raw_data: DataFrame | None

Property for original dataset in a table format.

Returns: A dataset in a table format
Return type: pandas.DataFrame | None
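
The importer contract is small: obtain() fills the private _raw_data attribute, and the read-only raw_data property exposes it (None until obtain() has run). The following sketch mirrors that contract with a stand-in base class so it runs with the standard library only. RawDataImporterSketch, CsvTextImporter, the inline CSV text, and the use of a list of dicts in place of a pandas DataFrame are all assumptions made for the example.

```python
import csv
import io
from abc import ABC, abstractmethod
from typing import Optional


class RawDataImporterSketch(ABC):
    """Reduced stand-in that mirrors the documented AbstractRawDataImporter."""

    def __init__(self, hf_name: Optional[str]) -> None:
        self._hf_name = hf_name
        self._raw_data = None  # the real attribute is DataFrame | None

    @abstractmethod
    def obtain(self) -> None:
        """Download (here: load) a dataset."""

    @property
    def raw_data(self):
        return self._raw_data


class CsvTextImporter(RawDataImporterSketch):
    """Hypothetical importer that parses inline CSV text instead of downloading."""

    CSV_TEXT = "premise,hypothesis\nA cat sleeps.,An animal rests.\n"

    def obtain(self) -> None:
        # Assumption: a list of dicts stands in for the table.
        self._raw_data = list(csv.DictReader(io.StringIO(self.CSV_TEXT)))


importer = CsvTextImporter(hf_name=None)
before = importer.raw_data  # nothing loaded yet
importer.obtain()
after = importer.raw_data
print(before, after)
```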

Module with description of abstract raw data preprocessor.

class core_utils.llm.raw_data_preprocessor.AbstractRawDataPreprocessor(raw_data: DataFrame)

Bases: ABC

Abstract Raw Data Preprocessor.

__init__(raw_data: DataFrame) → None

Initialize an instance of AbstractRawDataPreprocessor.

Parameters: raw_data (pandas.DataFrame) – Original dataset in a table format

_abc_impl = <_abc._abc_data object>

_data: DataFrame | None

Preprocessed dataset in a table format.

abstract analyze() → dict

Analyze a dataset.

Returns: Dataset key properties
Return type: dict

property data: DataFrame | None

Property for preprocessed dataset.

Returns: Preprocessed dataset in a table format
Return type: pandas.DataFrame | None

abstract transform() → None

Apply preprocessing transformations to the raw dataset.
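
A preprocessor subclass typically reports on the raw table in analyze() and stores the cleaned table in _data from transform(). The sketch below shows one possible shape of that flow using the standard library only. PreprocessorSketch, SimplePreprocessor, the raw column names ("text", "label"), the keys returned by analyze(), and the list-of-dicts table are assumptions, not the package's actual behaviour.

```python
from abc import ABC, abstractmethod


class PreprocessorSketch(ABC):
    """Reduced stand-in that mirrors the documented AbstractRawDataPreprocessor."""

    def __init__(self, raw_data: list) -> None:
        self._raw_data = raw_data  # assumption: list of dicts instead of a DataFrame
        self._data = None

    @abstractmethod
    def analyze(self) -> dict:
        """Return key properties of the raw dataset."""

    @abstractmethod
    def transform(self) -> None:
        """Fill self._data with the preprocessed dataset."""

    @property
    def data(self):
        return self._data


class SimplePreprocessor(PreprocessorSketch):
    """Hypothetical preprocessor: drops duplicates and rows with empty values."""

    def analyze(self) -> dict:
        rows = self._raw_data
        unique = {tuple(sorted(row.items())) for row in rows}
        return {
            "samples": len(rows),
            "duplicates": len(rows) - len(unique),
            "rows_with_empty_values": sum(1 for row in rows if not all(row.values())),
        }

    def transform(self) -> None:
        seen = set()
        cleaned = []
        for row in self._raw_data:
            key = tuple(sorted(row.items()))
            if key in seen or not all(row.values()):
                continue
            seen.add(key)
            # Rename raw columns to the 'source' / 'target' names.
            cleaned.append({"source": row["text"], "target": row["label"]})
        self._data = cleaned


raw = [
    {"text": "good film", "label": "1"},
    {"text": "good film", "label": "1"},
    {"text": "", "label": "0"},
]
preprocessor = SimplePreprocessor(raw)
report = preprocessor.analyze()
preprocessor.transform()
print(report, preprocessor.data)
```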

class core_utils.llm.raw_data_preprocessor.ColumnNames(value)

Bases: Enum

Column names for preprocessed DataFrame.

CONTEXT = 'context'

HYPOTHESIS = 'hypothesis'

PREDICTION = 'predictions'

PREMISE = 'premise'

QUESTION = 'question'

SOURCE = 'source'

TARGET = 'target'
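
The enum values are the actual column labels, so code that renames or reads columns should use .value rather than the member name. Note that the PREDICTION member maps to the plural string 'predictions'. The sketch below copies the documented members locally. rename_columns and the raw column names "article" and "summary" are hypothetical.

```python
from enum import Enum


class ColumnNames(Enum):
    """Local copy of the documented members."""

    CONTEXT = "context"
    HYPOTHESIS = "hypothesis"
    PREDICTION = "predictions"
    PREMISE = "premise"
    QUESTION = "question"
    SOURCE = "source"
    TARGET = "target"


def rename_columns(row: dict, mapping: dict) -> dict:
    """Rename keys of one row; keys missing from the mapping are kept unchanged."""
    return {
        (mapping[key].value if key in mapping else key): value
        for key, value in row.items()
    }


mapping = {"article": ColumnNames.SOURCE, "summary": ColumnNames.TARGET}
renamed = rename_columns({"article": "Long text", "summary": "Short", "id": 7}, mapping)
print(renamed)
```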

Module with description of abstract task evaluator.

class core_utils.llm.task_evaluator.AbstractTaskEvaluator(metrics: Iterable[Metrics])

Bases: ABC

Abstract Task Evaluator.

__init__(metrics: Iterable[Metrics]) → None

Initialize an instance of AbstractTaskEvaluator.

Parameters: metrics (Iterable[Metrics]) – The metrics to compute

_abc_impl = <_abc._abc_data object>

abstract run() → dict | None

Evaluate the predictions against the references using the specified metrics.

Returns: A dictionary containing information about the calculated metrics
Return type: dict | None
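
The documentation does not say where run() gets its predictions and references, so the sketch below simply passes them to the constructor; that choice is an assumption. TaskEvaluatorSketch is a stand-in for the abstract base, the Metrics enum is reduced to two members, and ExactMatchEvaluator with its exact-match accuracy is a hypothetical example, not the package's scoring logic.

```python
from abc import ABC, abstractmethod
from enum import Enum
from typing import Iterable, Optional


class Metrics(Enum):
    """Reduced local copy; the documented enum has more members."""

    ACCURACY = "accuracy"
    BLEU = "bleu"


class TaskEvaluatorSketch(ABC):
    """Reduced stand-in that mirrors the documented AbstractTaskEvaluator."""

    def __init__(self, metrics: Iterable[Metrics]) -> None:
        # Stored as a tuple so a one-shot iterable can be read more than once.
        self._metrics = tuple(metrics)

    @abstractmethod
    def run(self) -> Optional[dict]:
        """Evaluate predictions against references."""


class ExactMatchEvaluator(TaskEvaluatorSketch):
    """Hypothetical evaluator that only knows how to compute accuracy."""

    def __init__(self, metrics: Iterable[Metrics],
                 predictions: list, references: list) -> None:
        super().__init__(metrics)
        self._predictions = list(predictions)
        self._references = list(references)

    def run(self) -> Optional[dict]:
        if not self._references:
            return None
        matches = sum(p == r for p, r in zip(self._predictions, self._references))
        accuracy = matches / len(self._references)
        # Metrics this sketch cannot compute are left out of the result.
        return {m.value: accuracy for m in self._metrics if m is Metrics.ACCURACY}


evaluator = ExactMatchEvaluator(
    [Metrics.ACCURACY],
    predictions=["a", "b", "c", "d"],
    references=["a", "b", "x", "d"],
)
scores = evaluator.run()
print(scores)
```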