artefactual.preprocessing.parser#

Module for parsing model outputs from various sources to extract log probabilities. Each supported format is handled by a dedicated parser function, each defined in its own module.

Functions

parse_sampled_token_logprobs(outputs)

A wrapper function to parse token probabilities from various output formats.

parse_top_logprobs(outputs)

Parse different output formats to extract logprobs.

artefactual.preprocessing.parser.parse_sampled_token_logprobs(outputs)[source]#

A wrapper function to parse sampled-token log probabilities from various output formats. The input is checked first for the vLLM format, then for an OpenAI ChatCompletion, and finally for an OpenAI Responses API object.

Return type:

list[NDArray]

Args:

outputs: Model outputs in various formats.

Returns:

list[NDArray]: A list of 1D numpy arrays, each containing the log probabilities of the sampled tokens for one sequence.
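To illustrate the shape of the result, here is a minimal sketch of what this kind of parsing does for one of the supported inputs. This is not the library's implementation; the payload below assumes the OpenAI Chat Completions logprobs structure (`choices[i].logprobs.content[j].logprob`), and the extraction logic is hypothetical.

```python
import numpy as np

# Assumed OpenAI ChatCompletion-style dict with logprobs enabled.
completion = {
    "choices": [
        {
            "logprobs": {
                "content": [
                    {"token": "Hello", "logprob": -0.12},
                    {"token": "!", "logprob": -1.50},
                ]
            }
        }
    ]
}

# Sketch of the extraction: one 1D array of sampled-token logprobs
# per choice (sequence), matching the documented return shape.
sampled = [
    np.array([tok["logprob"] for tok in choice["logprobs"]["content"]])
    for choice in completion["choices"]
]
```

With this input, `sampled` is a one-element list holding a length-2 float array, i.e. `list[NDArray]` with one array per sequence.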

artefactual.preprocessing.parser.parse_top_logprobs(outputs)[source]#

Parse different output formats to extract logprobs.

Return type:

list[dict[int, list[float]]]

Args:
outputs: Model outputs. Can be:
  • List of vLLM RequestOutput objects.

  • OpenAI ChatCompletion object (or dict).

  • OpenAI Responses object (or dict).

Returns:

List of dictionaries mapping token indices to lists of log probs.

Raises:

TypeError: If the output format is not supported.
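To make the documented return type `list[dict[int, list[float]]]` concrete, here is a hedged sketch of building that structure from an OpenAI-style `top_logprobs` payload. The payload shape follows the Chat Completions logprobs format; the mapping logic is illustrative, not the library's source.

```python
# Assumed per-token entries from choices[0].logprobs.content, with
# top_logprobs alternatives requested for each sampled position.
content = [
    {"token": "Hi", "logprob": -0.1,
     "top_logprobs": [{"token": "Hi", "logprob": -0.1},
                      {"token": "Hey", "logprob": -2.3}]},
    {"token": "!", "logprob": -0.8,
     "top_logprobs": [{"token": "!", "logprob": -0.8},
                      {"token": ".", "logprob": -1.1}]},
]

# Map each token position index to the list of top-k logprobs at
# that position, giving one dict per sequence.
top = {
    i: [alt["logprob"] for alt in pos["top_logprobs"]]
    for i, pos in enumerate(content)
}
result = [top]  # list[dict[int, list[float]]], one dict per sequence
```

Each key is a token index within the sequence, and each value holds the log probabilities of the top-ranked candidate tokens at that position.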