artefactual.preprocessing.parser#

Module for parsing model outputs from various sources to extract log probabilities. Each supported format is handled by a dedicated parser function, each defined in its own module.

Functions

parse_sampled_token_logprobs(outputs)

A wrapper function to parse token probabilities from various output formats.

parse_top_logprobs(outputs)

Parse different output formats to extract logprobs.

artefactual.preprocessing.parser.parse_sampled_token_logprobs(outputs)[source]#

A wrapper function to parse sampled-token log probabilities from various output formats. The input is checked first for the vLLM format, then for an OpenAI ChatCompletion, and finally for an OpenAI Responses API object.

Return type:

list[NDArray]

Args:

outputs: Model outputs in various formats.

Returns:

list[NDArray]: A list of 1D numpy arrays, each containing the log probabilities of the sampled tokens for one sequence.
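To illustrate the shape of the result, here is a minimal sketch of what this kind of parsing does for one of the supported inputs. This is not the library's implementation; the payload below assumes the OpenAI Chat Completions logprobs structure (`choices[i].logprobs.content[j].logprob`), and the extraction logic is hypothetical.

```python
import numpy as np

# Assumed OpenAI ChatCompletion-style dict with logprobs enabled.
completion = {
    "choices": [
        {
            "logprobs": {
                "content": [
                    {"token": "Hello", "logprob": -0.12},
                    {"token": "!", "logprob": -1.50},
                ]
            }
        }
    ]
}

# Sketch of the extraction: one 1D array of sampled-token logprobs
# per choice (sequence), matching the documented return shape.
sampled = [
    np.array([tok["logprob"] for tok in choice["logprobs"]["content"]])
    for choice in completion["choices"]
]
```

With this input, `sampled` is a one-element list holding a length-2 float array, i.e. `list[NDArray]` with one array per sequence.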

artefactual.preprocessing.parser.parse_top_logprobs(outputs)[source]#

Parse different output formats to extract logprobs.

Return type:

list[dict[int, list[float]]]

Args:
outputs: Model outputs. Can be:
  • List of vLLM RequestOutput objects.

  • OpenAI ChatCompletion object (or dict).

  • OpenAI Responses object (or dict).

Returns:

List of dictionaries mapping token indices to lists of log probs.

Raises:

TypeError: If the output format is not supported.
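To make the documented return type `list[dict[int, list[float]]]` concrete, here is a hedged sketch of building that structure from an OpenAI-style `top_logprobs` payload. The payload shape follows the Chat Completions logprobs format; the mapping logic is illustrative, not the library's source.

```python
# Assumed per-token entries from choices[0].logprobs.content, with
# top_logprobs alternatives requested for each sampled position.
content = [
    {"token": "Hi", "logprob": -0.1,
     "top_logprobs": [{"token": "Hi", "logprob": -0.1},
                      {"token": "Hey", "logprob": -2.3}]},
    {"token": "!", "logprob": -0.8,
     "top_logprobs": [{"token": "!", "logprob": -0.8},
                      {"token": ".", "logprob": -1.1}]},
]

# Map each token position index to the list of top-k logprobs at
# that position, giving one dict per sequence.
top = {
    i: [alt["logprob"] for alt in pos["top_logprobs"]]
    for i, pos in enumerate(content)
}
result = [top]  # list[dict[int, list[float]]], one dict per sequence
```

Each key is a token index within the sequence, and each value holds the log probabilities of the top-ranked candidate tokens at that position.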