artefactual.calibration.outputs_entropy

Generate a dataset with entropy scores for model outputs.

This script generates responses from a language model for a given dataset and computes entropy-based uncertainty metrics for each generation.

Uses the vLLM API to generate outputs and to process their log probabilities.
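
This page does not define the entropy scores. The sketch below is a minimal illustration that assumes each score is the Shannon entropy of a token's top-k log probabilities, averaged over the generated tokens. The top-k values are renormalised first because vLLM returns only the top candidates at each position. `token_entropy` and `sequence_entropy` are hypothetical helpers and are not part of `artefactual`.

```python
import math
from typing import Sequence


def token_entropy(top_logprobs: Sequence[float]) -> float:
    """Shannon entropy (nats) of one position's top-k distribution.

    Assumption: the top-k log probabilities are renormalised to sum to 1,
    because the probability mass outside the top k is not returned.
    """
    # Log-sum-exp gives a numerically stable log normaliser.
    peak = max(top_logprobs)
    log_z = peak + math.log(sum(math.exp(lp - peak) for lp in top_logprobs))
    entropy = 0.0
    for lp in top_logprobs:
        log_p = lp - log_z  # renormalised log probability
        entropy -= math.exp(log_p) * log_p
    return entropy


def sequence_entropy(per_token_logprobs: Sequence[Sequence[float]]) -> float:
    """Hypothetical aggregate: mean token entropy over one generation."""
    if not per_token_logprobs:
        return 0.0
    return sum(token_entropy(t) for t in per_token_logprobs) / len(per_token_logprobs)


# Four equally likely candidates give log(4) nats; a probability-1 token gives 0.
uniform = token_entropy([math.log(0.25)] * 4)
certain = token_entropy([0.0])
mean = sequence_entropy([[0.0], [math.log(0.5), math.log(0.5)]])
```

Renormalising over the top k keeps each score a proper entropy. The trade-off is that it ignores any probability mass lying outside the returned candidates.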

Functions

generate_entropy_dataset(input_path, ...)

Generate a dataset with entropy scores for model outputs.

Classes

GenerationConfig(**data)

Configuration for entropy dataset generation.

class artefactual.calibration.outputs_entropy.GenerationConfig(**data)

Bases: BaseModel

Configuration for entropy dataset generation.

iterations: int
log_to_file: bool
model_config: ClassVar[ConfigDict] = {}

Pydantic configuration for this model class (not for the language model). It must be a dictionary conforming to pydantic.config.ConfigDict.

model_path: str
n_queries: int
number_logprobs: int
temperature: float
top_k_sampling: int
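
The attribute list above gives only names and types, and the meaning of each field is not documented here. The sketch below mirrors those fields in a stdlib dataclass, `GenerationConfigSketch`, to show the shape of the `**data` mapping. This is a hypothetical stand-in, chosen so the example runs without `artefactual` or Pydantic installed. All values are illustrative placeholders. With the package installed, the same mapping would be passed as `GenerationConfig(**data)`, and the Pydantic model would also validate the types.

```python
from dataclasses import dataclass, fields


@dataclass
class GenerationConfigSketch:
    """Hypothetical stdlib mirror of the documented GenerationConfig fields.

    Unlike the real Pydantic model, a dataclass performs no type validation.
    """

    iterations: int
    log_to_file: bool
    model_path: str
    n_queries: int
    number_logprobs: int
    temperature: float
    top_k_sampling: int


# Illustrative values only; "path/to/model" is a placeholder.
data = {
    "iterations": 5,
    "log_to_file": False,
    "model_path": "path/to/model",
    "n_queries": 100,
    "number_logprobs": 20,
    "temperature": 0.7,
    "top_k_sampling": 50,
}

config = GenerationConfigSketch(**data)

# Fields declared on the mirror but absent from the mapping (none here).
missing = {f.name for f in fields(GenerationConfigSketch)} - set(data)
```

If a key is missing from `data`, this dataclass raises a `TypeError` at construction time.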

artefactual.calibration.outputs_entropy.generate_entropy_dataset(input_path, output_path, config)

Generate a dataset with entropy scores for model outputs.

Return type:

Path

Args:

input_path (str | Path): Path to the input dataset.
output_path (str | Path): Path to save the output dataset.
config (GenerationConfig): Configuration parameters.

Returns:

Path: The path to the generated output file.
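
The input and output formats are not documented on this page. The sketch below shows the overall flow under several assumptions:

- The input is a JSON Lines file whose records carry a `prompt` field.
- The vLLM call is replaced by an injectable `generate` callable that returns per-token top-k log probabilities.
- Each prompt is sampled `n_samples` times.

The function reads each record, scores each sample with a mean token entropy, writes the augmented records, and returns the output path. `generate_entropy_dataset_sketch`, `mean_token_entropy`, `fake_generate`, `n_samples`, `prompt`, and `entropies` are all hypothetical names, not the module's actual API or schema.

```python
from __future__ import annotations

import json
import math
import tempfile
from pathlib import Path
from typing import Callable, Sequence

# Hypothetical stand-in for the vLLM call: prompt -> per-token top-k logprobs.
GenerateFn = Callable[[str], Sequence[Sequence[float]]]


def mean_token_entropy(per_token: Sequence[Sequence[float]]) -> float:
    """Mean Shannon entropy (nats) of renormalised top-k distributions."""
    total = 0.0
    for logprobs in per_token:
        peak = max(logprobs)
        log_z = peak + math.log(sum(math.exp(lp - peak) for lp in logprobs))
        total -= sum(math.exp(lp - log_z) * (lp - log_z) for lp in logprobs)
    return total / len(per_token) if per_token else 0.0


def generate_entropy_dataset_sketch(
    input_path: str | Path,
    output_path: str | Path,
    generate: GenerateFn,
    n_samples: int = 1,
) -> Path:
    """Score each JSONL record and write it back with an "entropies" list."""
    input_path, output_path = Path(input_path), Path(output_path)
    with input_path.open() as src, output_path.open("w") as dst:
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            # Hypothetical schema: the prompt lives under "prompt".
            record["entropies"] = [
                mean_token_entropy(generate(record["prompt"]))
                for _ in range(n_samples)
            ]
            dst.write(json.dumps(record) + "\n")
    return output_path


def fake_generate(prompt: str) -> list[list[float]]:
    """Stub generator: two tokens, each with two equally likely candidates."""
    return [[math.log(0.5), math.log(0.5)]] * 2


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "input.jsonl"
    src.write_text('{"prompt": "What is 2 + 2?"}\n{"prompt": "Name a prime."}\n')
    out = generate_entropy_dataset_sketch(
        src, Path(tmp) / "output.jsonl", fake_generate, n_samples=3
    )
    rows = [json.loads(line) for line in out.read_text().splitlines()]
```

Keeping the generator injectable lets the flow be exercised without a GPU. In the documented module, vLLM plays that role.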