artefactual.preprocessing.parser#
Module for parsing model outputs from various sources to extract log probabilities. Each output format is handled by a dedicated parser function.
Functions
parse_sampled_token_logprobs(outputs) — A wrapper function to parse token probabilities from various output formats.
parse_top_logprobs(outputs) — Parse different output formats to extract top log probabilities per token.
- artefactual.preprocessing.parser.parse_sampled_token_logprobs(outputs)[source]#
A wrapper function to parse token probabilities from various output formats. First checks for vLLM format, then OpenAI ChatCompletion, and finally OpenAI Responses API.
- Return type:
list[numpy.ndarray]
- Args:
outputs: Model outputs in various formats.
- Returns:
- list[NDArray]: A list of 1D numpy arrays, each containing the log probabilities
of the sampled tokens for one sequence.
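The ChatCompletion branch of this parser can be illustrated with a minimal sketch. The helper name `parse_chat_completion_logprobs` and the mock payload below are hypothetical, not part of the library; the dict layout follows the OpenAI ChatCompletion logprobs schema (`choices[].logprobs.content[].logprob`):

```python
import numpy as np

def parse_chat_completion_logprobs(completion: dict) -> list[np.ndarray]:
    """Hypothetical sketch: extract sampled-token log probs from an
    OpenAI ChatCompletion-style dict, one 1D array per choice."""
    arrays = []
    for choice in completion["choices"]:
        content = choice["logprobs"]["content"]
        arrays.append(np.array([tok["logprob"] for tok in content]))
    return arrays

# Mock ChatCompletion payload with two sampled tokens:
mock = {
    "choices": [
        {"logprobs": {"content": [
            {"token": "Hello", "logprob": -0.1},
            {"token": "!", "logprob": -0.5},
        ]}}
    ]
}
result = parse_chat_completion_logprobs(mock)
```

Each sequence yields one array, so a single-choice completion produces a list with a single 1D array of length equal to the number of sampled tokens.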
- artefactual.preprocessing.parser.parse_top_logprobs(outputs)[source]#
Parse different output formats to extract the top log probabilities for each sampled token position.
- Args:
- outputs: Model outputs. Can be:
List of vLLM RequestOutput objects.
OpenAI ChatCompletion object (or dict).
OpenAI Responses object (or dict).
- Returns:
List of dictionaries mapping token indices to lists of log probs.
- Raises:
TypeError: If the output format is not supported.
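The returned structure — a list of dictionaries mapping token indices to lists of log probs — can be sketched for the ChatCompletion case. The helper `parse_chat_top_logprobs` and the mock data are assumptions for illustration; the per-token alternatives follow the OpenAI `top_logprobs` layout:

```python
def parse_chat_top_logprobs(completion: dict) -> list[dict[int, list[float]]]:
    """Hypothetical sketch: for each choice, map every token position to
    the log probs of its top-k candidate tokens."""
    results = []
    for choice in completion["choices"]:
        per_token = {}
        for i, tok in enumerate(choice["logprobs"]["content"]):
            per_token[i] = [alt["logprob"] for alt in tok["top_logprobs"]]
        results.append(per_token)
    return results

# Mock payload: one token with two top-logprob candidates.
mock = {
    "choices": [
        {"logprobs": {"content": [
            {"token": "Hi", "logprob": -0.2,
             "top_logprobs": [{"token": "Hi", "logprob": -0.2},
                              {"token": "Hello", "logprob": -1.7}]},
        ]}}
    ]
}
top = parse_chat_top_logprobs(mock)
```

An unsupported input type would fall through all format checks, which is where the documented `TypeError` is raised in the library.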