artefactual.preprocessing

artefactual.preprocessing.is_openai_responses_api(outputs)

Detects whether the output follows the signature of the OpenAI Responses API.

Return type:

bool

Args:

outputs (Any): The output object or dictionary to inspect.

Returns:

bool: True if the output matches the OpenAI Responses API signature, False otherwise.
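A minimal sketch of how such a check could work. It assumes, without confirmation from the library, that a Responses API payload is recognised by a list-valued `output` field (the structure documented under process_openai_responses_api) and by the absence of the `choices` field used by Chat Completions. The helper names `_get` and `looks_like_responses_api` are hypothetical.

```python
from typing import Any


def _get(obj: Any, name: str, default: Any = None) -> Any:
    """Read a field from either a plain dict or an attribute-style SDK object."""
    if isinstance(obj, dict):
        return obj.get(name, default)
    return getattr(obj, name, default)


def looks_like_responses_api(outputs: Any) -> bool:
    """Hypothetical heuristic, not the library's implementation."""
    # A `choices` field indicates a classic ChatCompletion, not a Responses payload.
    if _get(outputs, "choices") is not None:
        return False
    # Responses API payloads carry their generated items in a list called `output`.
    return isinstance(_get(outputs, "output"), list)


responses_payload = {"object": "response", "output": [{"type": "message", "content": []}]}
chat_payload = {"object": "chat.completion", "choices": [{"index": 0}]}

print(looks_like_responses_api(responses_payload))  # True
print(looks_like_responses_api(chat_payload))       # False
```

Because `_get` falls back to `getattr`, the same check accepts both dictionaries and SDK response objects, which matches the "object or dictionary" wording of the Args section.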

artefactual.preprocessing.parse_model_outputs(outputs)

Parses model outputs in any of the supported formats and extracts their log probabilities.

Return type:

list[dict[int, list[float]]]

Args:
outputs: Model outputs, in one of the following formats:
  • List of vLLM RequestOutput objects.

  • OpenAI ChatCompletion object (or dict).

  • OpenAI Responses object (or dict).

Returns:

list[dict[int, list[float]]]: A list of dictionaries, one per sequence, mapping token indices to lists of log probabilities.

Raises:

TypeError: If the output format is not supported.
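The dispatch described above can be sketched as a format classifier. This is an illustration under assumptions, not the library's code: it assumes a list means vLLM RequestOutput objects, a `choices` field means a ChatCompletion, and a list-valued `output` field means a Responses object. The function name `detect_format` is hypothetical.

```python
from typing import Any


def _get(obj: Any, name: str, default: Any = None) -> Any:
    """Read a field from either a plain dict or an attribute-style SDK object."""
    if isinstance(obj, dict):
        return obj.get(name, default)
    return getattr(obj, name, default)


def detect_format(outputs: Any) -> str:
    """Hypothetical classifier mirroring the three documented input formats."""
    if isinstance(outputs, list):
        # Assumption: a list is treated as vLLM RequestOutput objects.
        return "vllm"
    if _get(outputs, "choices") is not None:
        return "chat_completion"
    if isinstance(_get(outputs, "output"), list):
        return "responses"
    # Anything else is rejected, as the Raises section documents.
    raise TypeError(f"Unsupported output format: {type(outputs).__name__}")


print(detect_format([]))                # vllm
print(detect_format({"choices": []}))   # chat_completion
print(detect_format({"output": []}))    # responses
```

Checking for a list first keeps the vLLM branch cheap; the two OpenAI formats are then distinguished by their top-level field names.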

artefactual.preprocessing.process_openai_chat_completion(response, iterations)

Extracts log probabilities from an OpenAI Chat Completion response (the classic ‘choices’ format).

Return type:

list[dict[int, list[float]]]

Args:

response (Any): The response object or dictionary from the OpenAI API (ChatCompletion).

iterations (int): The number of iterations (choices) to process.

Returns:

list[dict[int, list[float]]]: A list of dictionaries, where each dictionary maps token indices to lists of log probabilities for a sequence.
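A minimal sketch of this conversion for dictionary input. It relies on the Chat Completions logprobs layout (`choices[i].logprobs.content[j].top_logprobs[k].logprob`) and assumes, without confirmation from the library, that each token index maps to the values of its `top_logprobs` list. The function name `chat_completion_logprobs` is hypothetical.

```python
from typing import Any


def _get(obj: Any, name: str, default: Any = None) -> Any:
    """Read a field from either a plain dict or an attribute-style SDK object."""
    if isinstance(obj, dict):
        return obj.get(name, default)
    return getattr(obj, name, default)


def chat_completion_logprobs(response: Any, iterations: int) -> list[dict[int, list[float]]]:
    """Hypothetical sketch: one dict per choice, token index -> top-k log probs."""
    choices = _get(response, "choices") or []
    result: list[dict[int, list[float]]] = []
    for choice in choices[:iterations]:
        # `logprobs.content` holds one entry per generated token.
        content = _get(_get(choice, "logprobs"), "content") or []
        seq: dict[int, list[float]] = {}
        for idx, token_info in enumerate(content):
            top = _get(token_info, "top_logprobs") or []
            seq[idx] = [float(_get(t, "logprob")) for t in top]
        result.append(seq)
    return result


response = {"choices": [{"index": 0, "logprobs": {"content": [
    {"token": "Hi", "logprob": -0.1,
     "top_logprobs": [{"token": "Hi", "logprob": -0.1}, {"token": "Hello", "logprob": -2.5}]},
    {"token": "!", "logprob": -0.3,
     "top_logprobs": [{"token": "!", "logprob": -0.3}, {"token": ".", "logprob": -1.4}]},
]}}]}

print(chat_completion_logprobs(response, iterations=1))
# [{0: [-0.1, -2.5], 1: [-0.3, -1.4]}]
```

Top-k values are only present when the request asks for them (`logprobs=True` together with `top_logprobs`); without them, this sketch yields empty lists.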

artefactual.preprocessing.process_openai_responses_api(response)

Parses a response from the ‘client.responses.create’ API and extracts its log probabilities.

Expected structure: response.output -> [item] -> item.content -> [part] -> part.logprobs

Return type:

list[dict[int, list[float]]]

Args:

response (Any): The response object from the OpenAI Responses API.

Returns:

list[dict[int, list[float]]]: A list of dictionaries, where each dictionary maps token indices to lists of log probabilities for a sequence.
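A minimal sketch that walks the expected structure above on dictionary input. It makes two assumptions that the library does not confirm. First, every content part carrying a non-empty `logprobs` list is treated as one sequence. Second, each token index maps to the values in that entry's `top_logprobs`. The function name `responses_api_logprobs` is hypothetical.

```python
from typing import Any


def _get(obj: Any, name: str, default: Any = None) -> Any:
    """Read a field from either a plain dict or an attribute-style SDK object."""
    if isinstance(obj, dict):
        return obj.get(name, default)
    return getattr(obj, name, default)


def responses_api_logprobs(response: Any) -> list[dict[int, list[float]]]:
    """Hypothetical sketch: response.output -> item.content -> part.logprobs."""
    result: list[dict[int, list[float]]] = []
    for item in _get(response, "output") or []:
        # Items without `content` (e.g. non-message items) are skipped.
        for part in _get(item, "content") or []:
            logprobs = _get(part, "logprobs")
            if not logprobs:
                continue
            seq: dict[int, list[float]] = {}
            for idx, entry in enumerate(logprobs):
                top = _get(entry, "top_logprobs") or []
                seq[idx] = [float(_get(t, "logprob")) for t in top]
            result.append(seq)
    return result


response = {"output": [{"type": "message", "content": [{
    "type": "output_text", "text": "Yes",
    "logprobs": [{"token": "Yes", "logprob": -0.05,
                  "top_logprobs": [{"token": "Yes", "logprob": -0.05},
                                   {"token": "No", "logprob": -3.0}]}],
}]}]}

print(responses_api_logprobs(response))  # [{0: [-0.05, -3.0]}]
```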

artefactual.preprocessing.process_vllm_logprobs(outputs, iterations)

Extracts log probabilities from vllm.chat outputs for a given number of iterations.

Return type:

list[dict[int, list[float]]]

Args:

outputs (list[RequestOutput]): A list containing model output objects, each with log probability data.

iterations (int): The number of iterations to process, corresponding to the number of output sequences.

Returns:

list[dict[int, list[float]]]: A list of dictionaries, one per sequence, each mapping token indices to the log probabilities of that token position.
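A minimal sketch of the vLLM case. To stay runnable without vLLM installed, it uses small stand-in dataclasses that mimic only the attributes it reads: `RequestOutput.outputs`, `CompletionOutput.logprobs` (one dict per position, keyed by token ID), and `Logprob.logprob`. It also assumes, without confirmation from the library, that the first `iterations` completions across all request outputs are the sequences to process. The function name `vllm_logprobs` is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Logprob:  # stand-in for vLLM's Logprob (only `.logprob` is used)
    logprob: float


@dataclass
class CompletionOutput:  # stand-in for vLLM's CompletionOutput
    logprobs: Optional[list[dict[int, Logprob]]] = None


@dataclass
class RequestOutput:  # stand-in for vLLM's RequestOutput
    outputs: list[CompletionOutput] = field(default_factory=list)


def vllm_logprobs(outputs: list[RequestOutput], iterations: int) -> list[dict[int, list[float]]]:
    """Hypothetical sketch: one dict per sequence, token index -> candidate log probs."""
    # Assumption: flatten completions across requests and keep the first `iterations`.
    sequences = [c for request in outputs for c in request.outputs][:iterations]
    result: list[dict[int, list[float]]] = []
    for completion in sequences:
        seq: dict[int, list[float]] = {}
        for idx, position in enumerate(completion.logprobs or []):
            # Each position maps candidate token IDs to Logprob objects (kept in dict order).
            seq[idx] = [lp.logprob for lp in position.values()]
        result.append(seq)
    return result


outputs = [RequestOutput(outputs=[
    CompletionOutput(logprobs=[{42: Logprob(-0.2), 7: Logprob(-1.9)}, {13: Logprob(-0.05)}]),
    CompletionOutput(logprobs=[{42: Logprob(-0.4)}]),
])]

print(vllm_logprobs(outputs, iterations=2))
# [{0: [-0.2, -1.9], 1: [-0.05]}, {0: [-0.4]}]
```

With real vLLM objects, per-token log probabilities are only populated when sampling is configured to return them (the `logprobs` field of `SamplingParams`).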

Modules

openai_parser

parser: Module for parsing model outputs from various sources to extract log probabilities.

vllm_parser