artefactual.preprocessing

artefactual.preprocessing.is_openai_responses_api(outputs)

Detects whether the output follows the signature of the OpenAI Responses API.

Return type: bool

Args:

outputs (Any): The output object or dictionary to inspect.

Returns:

bool: True if the output matches the OpenAI Responses API signature, False otherwise.
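
As a rough illustration of this kind of check, the sketch below tests for a list under `output` and the absence of `choices`. The helper name `looks_like_responses_api` and the detection rule are assumptions made for illustration. They are not taken from the library's source, and the real function may inspect other fields.

```python
from typing import Any


def looks_like_responses_api(outputs: Any) -> bool:
    """Hypothetical stand-in for is_openai_responses_api; not the library's code."""
    # Accept both plain dicts and attribute-style response objects.
    if isinstance(outputs, dict):
        output = outputs.get("output")
        has_choices = "choices" in outputs
    else:
        output = getattr(outputs, "output", None)
        has_choices = hasattr(outputs, "choices")
    # Assumption: a Responses payload carries a list under `output`
    # and does not carry the classic `choices` field.
    return isinstance(output, list) and not has_choices


responses_like = {"output": [{"type": "message", "content": []}]}
chat_like = {"choices": [{"message": {"content": "hi"}}]}
```

With these sample payloads, `looks_like_responses_api(responses_like)` evaluates to `True` and `looks_like_responses_api(chat_like)` to `False`.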

artefactual.preprocessing.parse_sampled_token_logprobs(outputs)

A wrapper function that parses the log probabilities of sampled tokens from several output formats. It checks for the vLLM format first, then OpenAI ChatCompletion, and finally the OpenAI Responses API.

Return type: list[ndarray[tuple[Any, ...], dtype[TypeVar(_ScalarT, bound=generic)]]]

Args:

outputs: Model outputs in various formats.

Returns:

list[NDArray]: A list of 1D NumPy arrays, each containing the log probabilities of the sampled tokens for one sequence.
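
To make the returned shape concrete, here is a minimal sketch for the ChatCompletion case only, assuming a dict-shaped response in which each choice carries `logprobs.content` entries with a `logprob` field. The helper `sampled_logprobs_from_chat` is hypothetical, and it returns plain Python lists rather than NumPy arrays to stay dependency-free.

```python
from typing import Any


def sampled_logprobs_from_chat(response: dict[str, Any]) -> list[list[float]]:
    """Hypothetical helper: one list of sampled-token logprobs per choice."""
    sequences = []
    for choice in response["choices"]:
        # Each entry in logprobs.content describes one generated token.
        tokens = choice["logprobs"]["content"]
        sequences.append([tok["logprob"] for tok in tokens])
    return sequences


# Hand-written sample payload, not real API output.
response = {
    "choices": [
        {
            "logprobs": {
                "content": [
                    {"token": "Hi", "logprob": -0.1},
                    {"token": "!", "logprob": -0.5},
                ]
            }
        }
    ]
}
sampled = sampled_logprobs_from_chat(response)
```

Each inner list could then be wrapped with `numpy.asarray` to match the documented return type.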

artefactual.preprocessing.parse_top_logprobs(outputs)

Parses the supported output formats to extract top log probabilities.

Return type: list[dict[int, list[float]]]

Args:

outputs: Model outputs. Can be:

List of vLLM RequestOutput objects.
OpenAI ChatCompletion object (or dict).
OpenAI Responses object (or dict).

Returns:

List of dictionaries mapping token indices to lists of log probabilities.

Raises:

TypeError: If the output format is not supported.
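
The dispatch over the three supported formats, and the `TypeError` for anything else, might look roughly like the sketch below. The function `classify_output_format`, the helper `_field`, and the specific checks are assumptions for illustration; the library's actual detection logic is not shown in this page.

```python
from typing import Any


def _field(obj: Any, name: str) -> Any:
    """Read `name` from a dict key or an object attribute; None if absent."""
    if isinstance(obj, dict):
        return obj.get(name)
    return getattr(obj, name, None)


def classify_output_format(outputs: Any) -> str:
    """Hypothetical classifier; the real detection logic may differ."""
    # Assumption: vLLM results arrive as a list of RequestOutput objects.
    if isinstance(outputs, list):
        return "vllm"
    # Assumption: ChatCompletion payloads expose `choices`.
    if _field(outputs, "choices") is not None:
        return "chat_completion"
    # Assumption: Responses payloads expose `output`.
    if _field(outputs, "output") is not None:
        return "responses"
    raise TypeError(f"Unsupported output format: {type(outputs).__name__}")
```

A caller would typically branch on the returned label and hand the payload to the matching format-specific parser.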

artefactual.preprocessing.process_openai_chat_completion(response, iterations)

Processes top log probabilities from OpenAI Chat Completion (classic ‘choices’ format).

Return type: list[dict[int, list[float]]]

Args:

response (Any): The response object or dictionary from the OpenAI API (ChatCompletion).
iterations (int): The number of iterations (choices) to process.

Returns:

list[dict[int, list[float]]]: A list of dictionaries, one per sequence, each mapping token indices to lists of log probabilities.
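
A minimal sketch of this mapping, assuming a dict-shaped ChatCompletion in which each token entry under `logprobs.content` lists its alternatives in `top_logprobs`. The helper `top_logprobs_from_chat` is hypothetical and only handles dicts, whereas the documented function also accepts response objects.

```python
from typing import Any


def top_logprobs_from_chat(
    response: dict[str, Any], iterations: int
) -> list[dict[int, list[float]]]:
    """Hypothetical helper: token position -> top logprobs, one dict per choice."""
    results = []
    # Assumption: only the first `iterations` choices are processed.
    for choice in response["choices"][:iterations]:
        per_token = {}
        for idx, tok in enumerate(choice["logprobs"]["content"]):
            per_token[idx] = [alt["logprob"] for alt in tok["top_logprobs"]]
        results.append(per_token)
    return results


# Hand-written sample payload, not real API output.
chat_response = {
    "choices": [
        {
            "logprobs": {
                "content": [
                    {"token": "Hi", "logprob": -0.1, "top_logprobs": [
                        {"token": "Hi", "logprob": -0.1},
                        {"token": "Hey", "logprob": -2.3},
                    ]},
                    {"token": "!", "logprob": -0.5, "top_logprobs": [
                        {"token": "!", "logprob": -0.5},
                        {"token": ".", "logprob": -1.0},
                    ]},
                ]
            }
        }
    ]
}
chat_top = top_logprobs_from_chat(chat_response, iterations=1)
```

Here `chat_top` holds one dictionary for the single choice, keyed by token position 0 and 1.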

artefactual.preprocessing.process_openai_responses_api(response)

Parses the response from the ‘client.responses.create’ API to extract the top log probabilities.

Structure expected: response.output -> [item] -> item.content -> [part] -> part.logprobs

Return type: list[dict[int, list[float]]]

Args:

response (Any): The response object from the OpenAI Responses API.

Returns:

list[dict[int, list[float]]]: A list of dictionaries, one per sequence, each mapping token indices to lists of log probabilities.
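
The traversal response.output -> item.content -> part.logprobs can be sketched as follows for a dict-shaped payload. The helper `top_logprobs_from_responses` is hypothetical, and two choices in it are assumptions rather than documented behavior: items or parts without log probabilities are skipped, and each part that carries log probabilities yields one dictionary.

```python
from typing import Any


def top_logprobs_from_responses(
    response: dict[str, Any],
) -> list[dict[int, list[float]]]:
    """Hypothetical helper walking output -> content -> logprobs."""
    results = []
    for item in response.get("output", []):
        # Assumption: some items (for example non-message items) have no content.
        for part in item.get("content") or []:
            token_entries = part.get("logprobs")
            if not token_entries:
                continue
            results.append({
                idx: [alt["logprob"] for alt in entry.get("top_logprobs", [])]
                for idx, entry in enumerate(token_entries)
            })
    return results


# Hand-written sample payload, not real API output.
responses_payload = {
    "output": [
        {"type": "reasoning"},
        {"type": "message", "content": [
            {"type": "output_text", "text": "Hi", "logprobs": [
                {"token": "Hi", "logprob": -0.2, "top_logprobs": [
                    {"token": "Hi", "logprob": -0.2},
                    {"token": "Yo", "logprob": -1.8},
                ]},
            ]},
        ]},
    ]
}
responses_top = top_logprobs_from_responses(responses_payload)
```

The item without `content` contributes nothing, so `responses_top` contains a single dictionary for the one text part.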

artefactual.preprocessing.process_vllm_top_logprobs(outputs, iterations)

Processes log probabilities from the outputs of vLLM's LLM.generate (or LLM.chat) for a given number of iterations.

Return type: list[dict[int, list[float]]]

Args:

outputs (list[RequestOutput]): A list containing model output objects, each with log probability data.
iterations (int): The number of iterations to process, corresponding to the number of output sequences.

Returns:

list[dict[int, list[float]]]: A list of dictionaries mapping token indices to lists of log probabilities for each token in the sequence.
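
The sketch below mimics this processing without importing vLLM. The `Fake*` dataclasses are stand-ins that copy only the fields used here (`RequestOutput.outputs`, `CompletionOutput.logprobs`, `Logprob.logprob`), and the helper `top_logprobs_from_vllm` is hypothetical. It assumes the sequences of interest are the first `iterations` completions of the first request, which may not match how the library interprets `iterations`.

```python
from dataclasses import dataclass, field


@dataclass
class FakeLogprob:  # stand-in for vLLM's Logprob
    logprob: float


@dataclass
class FakeCompletion:  # stand-in for vLLM's CompletionOutput
    # One dict per generated token, keyed by candidate token id.
    logprobs: list[dict[int, FakeLogprob]] = field(default_factory=list)


@dataclass
class FakeRequestOutput:  # stand-in for vLLM's RequestOutput
    outputs: list[FakeCompletion] = field(default_factory=list)


def top_logprobs_from_vllm(
    outputs: list[FakeRequestOutput], iterations: int
) -> list[dict[int, list[float]]]:
    """Hypothetical helper: token position -> candidate logprobs per sequence."""
    results = []
    # Assumption: read the first `iterations` completions of the first request.
    for completion in outputs[0].outputs[:iterations]:
        results.append({
            pos: [lp.logprob for lp in candidates.values()]
            for pos, candidates in enumerate(completion.logprobs)
        })
    return results


request = FakeRequestOutput(outputs=[
    FakeCompletion(logprobs=[
        {11: FakeLogprob(-0.1), 42: FakeLogprob(-2.0)},
        {7: FakeLogprob(-0.3)},
    ])
])
vllm_top = top_logprobs_from_vllm([request], iterations=1)
```

Note that the result is keyed by token position in the sequence, not by the vocabulary token ids that key each vLLM candidate dictionary.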

Modules

`openai_parser`
`parser`	Module for parsing model outputs from various sources to extract log probabilities.
`vllm_parser`
