core_utils package

Submodules

Module with description of abstract LLM pipeline.

class core_utils.llm.llm_pipeline.AbstractLLMPipeline(model_name: str, dataset: Dataset, max_length: int, batch_size: int, device: str = 'cpu')

Bases: ABC

Abstract LLM Pipeline.

__init__(model_name: str, dataset: Dataset, max_length: int, batch_size: int, device: str = 'cpu') → None

Initialize an instance of AbstractLLMPipeline.

Parameters:
  • model_name (str) – The name of the pre-trained model.

  • dataset (torch.utils.data.dataset.Dataset) – The dataset to run inference on.

  • max_length (int) – The maximum length of the generated sequence.

  • batch_size (int) – The batch size used by the DataLoader.

  • device (str) – The device for inference.

_model: HFModelLike | None

Model

abstract analyze_model() → dict

Analyze the model and compute its properties.

Returns:

Properties of the model

Return type:

dict

abstract infer_dataset() → DataFrame

Run model inference on the whole dataset.

Returns:

Data with predictions.

Return type:

pandas.DataFrame

abstract infer_sample(sample: tuple[str, ...]) → str | None

Run model inference on a single sample.

Parameters:

sample (tuple[str, ...]) – The sample to run inference on

Returns:

A prediction

Return type:

str | None

class core_utils.llm.llm_pipeline.HFModelLike(*args, **kwargs)

Bases: Protocol

Protocol definition of HF models.

__init__(*args, **kwargs)
to(device: str) → Any

Placeholder declaring that HF models can be moved to a specific device.

Parameters:

device (str) – Device name

Returns:

Custom value

Return type:

Any
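
The protocol above relies on structural typing: any object with a compatible to(device) method satisfies it without inheriting from it. The following is a minimal sketch under stated assumptions. HFModelLike is re-declared locally so the example is self-contained, DummyModel and move_to_device are hypothetical names, and the runtime_checkable decorator is added for illustration only (the source does not say whether the real protocol uses it).

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable  # assumption: added only so isinstance() works in this sketch
class HFModelLike(Protocol):
    """Local stand-in mirroring the documented protocol."""

    def to(self, device: str) -> Any:
        """Move the model to the given device."""
        ...


class DummyModel:
    """Hypothetical model; note that it does not inherit from HFModelLike."""

    def __init__(self) -> None:
        self.device = "cpu"

    def to(self, device: str) -> "DummyModel":
        # Only record the target device; return self so calls can be chained.
        self.device = device
        return self


def move_to_device(model: HFModelLike, device: str) -> Any:
    """Accept anything that structurally matches HFModelLike."""
    return model.to(device)


model = move_to_device(DummyModel(), "cuda")
```

Because the check is structural, a static type checker would accept DummyModel wherever HFModelLike is expected, which is presumably why the pipeline can annotate _model without depending on a concrete model class.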

Metrics enum.

class core_utils.llm.metrics.Metrics(value)

Bases: Enum

Metrics enum.

ACCURACY = 'accuracy'
BLEU = 'bleu'
F1 = 'f1'
PRECISION = 'precision'
RECALL = 'recall'
ROUGE = 'rouge'
SQUAD = 'squad'
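
Since each member carries a string value, metric names read from a config file or the command line can be converted into members by value lookup. The sketch below assumes a local copy of the enum with the members listed above; parse_metrics is a hypothetical helper, not part of the package.

```python
from enum import Enum


class Metrics(Enum):
    """Local copy mirroring the documented members."""

    ACCURACY = "accuracy"
    BLEU = "bleu"
    F1 = "f1"
    PRECISION = "precision"
    RECALL = "recall"
    ROUGE = "rouge"
    SQUAD = "squad"


def parse_metrics(names: list[str]) -> list[Metrics]:
    """Convert raw strings (hypothetical config values) into Metrics members."""
    # Metrics("bleu") looks a member up by value; unknown names raise ValueError.
    return [Metrics(name) for name in names]


selected = parse_metrics(["bleu", "rouge"])
metric_names = [metric.value for metric in selected]
```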

Module with description of abstract data importer.

class core_utils.llm.raw_data_importer.AbstractRawDataImporter(hf_name: str | None)

Bases: ABC

Abstract Raw Data Importer.

__init__(hf_name: str | None) → None

Initialize an instance of AbstractRawDataImporter.

Parameters:

hf_name (str | None) – Name of the HuggingFace dataset

_local_path: Path | None

A path to the dataset

_raw_data: DataFrame | None

A dataset in a table format

abstract obtain() → None

Download a dataset.

property raw_data: DataFrame | None

Property for original dataset in a table format.

Returns:

A dataset in a table format

Return type:

pandas.DataFrame | None

Module with description of abstract raw data preprocessor.

class core_utils.llm.raw_data_preprocessor.AbstractRawDataPreprocessor(raw_data: DataFrame)

Bases: ABC

Abstract Raw Data Preprocessor.

__init__(raw_data: DataFrame) → None

Initialize an instance of AbstractRawDataPreprocessor.

Parameters:

raw_data (pandas.DataFrame) – Original dataset in a table format

_data: DataFrame | None

Preprocessed dataset in a table format

abstract analyze() → dict

Analyze a dataset.

Returns:

Dataset key properties

Return type:

dict

property data: DataFrame | None

Property for preprocessed dataset.

Returns:

Preprocessed dataset in a table format

Return type:

pandas.DataFrame | None

abstract transform() → None

Apply preprocessing transformations to the raw dataset.

class core_utils.llm.raw_data_preprocessor.ColumnNames(value)

Bases: Enum

Column names for preprocessed DataFrame.

CONTEXT = 'context'
HYPOTHESIS = 'hypothesis'
PREDICTION = 'predictions'
PREMISE = 'premise'
QUESTION = 'question'
SOURCE = 'source'
TARGET = 'target'
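
Referring to columns through the enum rather than through string literals keeps column names consistent across preprocessing, inference, and evaluation. A minimal sketch, assuming a local copy of the enum with the members listed above; rename_columns, the raw keys, and the mapping are hypothetical.

```python
from enum import Enum


class ColumnNames(Enum):
    """Local copy mirroring the documented members."""

    CONTEXT = "context"
    HYPOTHESIS = "hypothesis"
    PREDICTION = "predictions"
    PREMISE = "premise"
    QUESTION = "question"
    SOURCE = "source"
    TARGET = "target"


def rename_columns(row: dict[str, str], mapping: dict[str, ColumnNames]) -> dict[str, str]:
    """Rename raw keys to canonical column names; unmapped keys are kept as-is."""
    return {
        (mapping[key].value if key in mapping else key): value
        for key, value in row.items()
    }


# Hypothetical raw record and mapping, for illustration only.
raw_row = {"article": "Some long text.", "summary": "Short text.", "id": "1"}
mapping = {"article": ColumnNames.SOURCE, "summary": ColumnNames.TARGET}
renamed = rename_columns(raw_row, mapping)
```

Note that the PREDICTION member maps to the plural string 'predictions', so code should rely on ColumnNames.PREDICTION.value instead of spelling the column name by hand.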

Module with description of abstract task evaluator.

class core_utils.llm.task_evaluator.AbstractTaskEvaluator(metrics: Iterable[Metrics])

Bases: ABC

Abstract Task Evaluator.

__init__(metrics: Iterable[Metrics]) → None

Initialize an instance of AbstractTaskEvaluator.

Parameters:

metrics (Iterable[Metrics]) – Metrics to compute

abstract run() → dict | None

Evaluate the predictions against the references using the specified metrics.

Returns:

A dictionary containing information about the calculated metrics

Return type:

dict | None