artefactual.scoring#
Scoring module for artefactual library.
This module re-exports commonly used scoring classes and helpers so they can be imported from artefactual.scoring rather than from deep submodules.
- class artefactual.scoring.EPR(pretrained_model_name_or_path=None, k=15)[source]#
Bases: UncertaintyDetector
Computes Entropy Production Rate (EPR) from model log probabilities. EPR quantifies uncertainty based on the entropy of the model’s predicted token distributions: it calculates the entropy contributions of the top-K predicted tokens at each position and averages these contributions over the sequence to produce a sequence-level uncertainty score. You can parse raw model outputs using the parse_model_outputs method from artefactual.preprocessing.
- __init__(pretrained_model_name_or_path=None, k=15)[source]#
Initialize the EPR scorer.
- Args:
- pretrained_model_name_or_path: Model name or path to load calibration coefficients.
If not provided, the scorer returns raw uncalibrated scores and issues a warning.
k: Number of top log probabilities to consider (default: 15).
- Raises:
ValueError: If calibration coefficients cannot be loaded from the provided path.
- compute(parsed_logprobs)[source]#
Compute EPR-based uncertainty scores from parsed log probabilities. You can parse raw model outputs using the parse_model_outputs method from artefactual.preprocessing.
- Args:
parsed_logprobs: Parsed log probabilities.
- Returns:
List of sequence-level EPR scores.
- compute_token_scores(parsed_logprobs)[source]#
Compute token-level EPR scores from parsed logprobs. You can parse raw model outputs using the parse_model_outputs method from artefactual.preprocessing.
- Args:
parsed_logprobs: Parsed log probabilities.
- Returns:
List of token-level EPR scores (numpy arrays).
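The sequence-level EPR computation described above can be sketched in plain NumPy. This is a minimal illustration of the idea (average of top-k entropy contributions per token), not the library's calibrated implementation; the function name and the assumed input shape (num_tokens, num_logprobs) are this example's own.

```python
import numpy as np

def epr_score(logprobs: np.ndarray, k: int = 15) -> float:
    """Sketch of a sequence-level EPR score.

    logprobs: (num_tokens, num_logprobs) array of token log probabilities.
    """
    # Keep the k largest log probabilities at each position.
    top_k = np.sort(logprobs, axis=1)[:, -k:]
    p = np.exp(top_k)
    # Entropy contribution of each retained candidate: -p * log(p).
    contributions = -p * top_k
    # Sum contributions per token, then average over the sequence.
    return float(contributions.sum(axis=1).mean())
```

For a uniform distribution over k candidates, each token's summed contribution equals log(k), so the sequence score is log(k).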
- class artefactual.scoring.UncertaintyDetector(k=15)[source]#
Bases: ABC
A base class for uncertainty detection methods.
- __init__(k=15)[source]#
Initialize the uncertainty detector.
- Args:
- k: Number of top log probabilities to consider per token.
Must be positive. Default is 15.
- Raises:
ValueError: If k is not positive.
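The base-class contract above (a positive k validated at construction, with scoring left to subclasses) can be sketched as an abstract base class. This is an illustrative skeleton, not the library source; the class and method names mirror the documented API but the body is assumed.

```python
from abc import ABC, abstractmethod

class UncertaintyDetectorSketch(ABC):
    """Illustrative skeleton of an uncertainty-detector base class."""

    def __init__(self, k: int = 15):
        # Reject non-positive k up front, matching the documented ValueError.
        if k <= 0:
            raise ValueError("k must be positive")
        self.k = k

    @abstractmethod
    def compute(self, parsed_logprobs):
        """Return sequence-level uncertainty scores."""
```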
- class artefactual.scoring.WEPR(pretrained_model_name_or_path)[source]#
Bases: UncertaintyDetector
Computes Weighted Entropy Production Rate (WEPR) from model log probabilities. WEPR extends EPR by applying learned weights to the entropy contributions based on their ranks. It computes both mean-weighted and max-weighted contributions to produce a sequence-level uncertainty score; token-level WEPR scores are also provided. You can parse raw model outputs using the parse_model_outputs method from artefactual.preprocessing.
- __init__(pretrained_model_name_or_path)[source]#
Initialize the WEPR scorer with weights loaded from the specified source.
- Args:
pretrained_model_name_or_path: Either a built-in model name or a local file path to load weights from.
- compute(parsed_logprobs)[source]#
Compute WEPR-based uncertainty scores from parsed log probabilities. You can parse raw model outputs using the parse_model_outputs method from artefactual.preprocessing.
- Args:
parsed_logprobs: Parsed log probabilities.
- Returns:
List of sequence-level WEPR scores.
- compute_token_scores(parsed_logprobs)[source]#
Compute token-level WEPR scores from parsed logprobs. You can parse raw model outputs using the parse_model_outputs method from artefactual.preprocessing.
- Args:
parsed_logprobs: Parsed log probabilities.
- Returns:
List of token-level WEPR scores (numpy arrays).
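The rank-weighting idea behind WEPR can be sketched in NumPy. This is a hedged illustration only: the learned weights, their application, and the way mean- and max-weighted contributions are combined (an even 50/50 blend here) are all assumptions of this sketch, not the library's actual coefficients or formula.

```python
import numpy as np

def wepr_score(logprobs: np.ndarray, weights: np.ndarray) -> float:
    """Sketch of a rank-weighted sequence-level WEPR score.

    logprobs: (num_tokens, num_logprobs) token log probabilities.
    weights:  (k,) per-rank weights (learned in the real library).
    """
    k = weights.shape[0]
    # Top-k log probabilities per token, sorted in descending rank order.
    top_k = np.sort(logprobs, axis=1)[:, ::-1][:, :k]
    # Entropy contribution per rank: s_kj = -p * log(p).
    contributions = -np.exp(top_k) * top_k
    # Weight each contribution by its rank, then sum per token.
    token_scores = (contributions * weights).sum(axis=1)
    # Blend mean- and max-weighted contributions (assumed combination).
    return float(0.5 * (token_scores.mean() + token_scores.max()))
```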
- artefactual.scoring.compute_entropy_contributions(logprobs, k)[source]#
Compute entropy contributions s_kj = -p_k log(p_k) for top-K logprobs using vectorized operations.
- Args:
- logprobs: A 2D array of shape (num_tokens, num_logprobs) containing log probabilities.
k: Number of top log probabilities to consider per token.
- Returns:
A 2D array of shape (num_tokens, K) containing entropy contributions.
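A vectorized NumPy equivalent of the formula above can be written in a few lines. This sketch follows the documented signature and shapes but is not the library source; the function name is local to the example.

```python
import numpy as np

def entropy_contributions(logprobs: np.ndarray, k: int) -> np.ndarray:
    """Return s_kj = -p_k * log(p_k) for the top-k logprobs of each token.

    logprobs: (num_tokens, num_logprobs) array; result: (num_tokens, k).
    """
    # Select the k largest log probabilities per row, in descending order.
    top_k = np.sort(logprobs, axis=1)[:, ::-1][:, :k]
    # Convert back to probabilities and form -p * log(p) elementwise.
    return -np.exp(top_k) * top_k
```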
Modules
Entropy methods package exports.