artefactual.data

Data loading and processing utilities for artefactual.

class artefactual.data.Completion(**data)

Bases: BaseModel

Represents a single generated completion as a sequence of token logprobs.

Attributes:

token_logprobs: Mapping from token position to top-K logprobs.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

token_logprobs: dict[int, list[float]]
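
The models in this package represent data stored in JSON files, and JSON object keys are always strings, so a raw payload needs its position keys converted to integers before it matches the documented dict[int, list[float]] type. The following is a minimal standard-library sketch of that conversion. The payload layout and its values are assumptions made for illustration and are not taken from artefactual's generated files.

```python
import json

# Hypothetical raw JSON: position keys arrive as strings ("0", "1").
raw = '{"token_logprobs": {"0": [-0.2, -2.5], "1": [-0.7, -1.2]}}'
payload = json.loads(raw)

# Convert keys to int and values to float to match dict[int, list[float]].
token_logprobs: dict[int, list[float]] = {
    int(position): [float(lp) for lp in logprobs]
    for position, logprobs in payload["token_logprobs"].items()
}
```

Given the Completion(**data) signature, the resulting dictionary would presumably be passed as the token_logprobs keyword argument.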

class artefactual.data.Result(**data)

Bases: BaseModel

Represents the full data for a single query.

Attributes:

query_id: The unique identifier for the query.
query: The query text.
expected_answers: List of expected correct answers.
generated_answers: List of generated answers with metadata.
token_logprobs: Nested sequence of token log probabilities.

expected_answers: Sequence[str]
generated_answers: Sequence[dict[str, str]]
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

query: str
query_id: str
token_logprobs: Sequence[Sequence[Sequence[float]]]
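
To make the field types above concrete, here is a hedged sketch of a record with the documented shape, together with a small hand-written check of that shape. Everything in it is illustrative: the values, the keys inside generated_answers ("answer", "model"), and the reading of the three nesting levels of token_logprobs as completion, token position, and candidate logprobs are assumptions, and has_documented_shape is a hypothetical helper, not part of artefactual.

```python
record = {
    "query_id": "q-001",
    "query": "What is the capital of France?",
    "expected_answers": ["Paris"],
    # Sequence[dict[str, str]]: both keys and values are strings.
    "generated_answers": [
        {"answer": "Paris", "model": "example-model"},  # hypothetical keys
        {"answer": "Lyon", "model": "example-model"},
    ],
    # Assumed nesting: completion -> token position -> candidate logprobs.
    "token_logprobs": [
        [[-0.2, -2.5], [-0.7, -1.2]],
        [[-1.9, -2.1]],
    ],
}


def has_documented_shape(rec: dict) -> bool:
    """Check a plain dict against the field types listed for Result."""
    answers_ok = all(
        isinstance(key, str) and isinstance(value, str)
        for answer in rec["generated_answers"]
        for key, value in answer.items()
    )
    logprobs_ok = all(
        isinstance(lp, float)
        for completion in rec["token_logprobs"]
        for position in completion
        for lp in position
    )
    return (
        isinstance(rec["query_id"], str)
        and isinstance(rec["query"], str)
        and all(isinstance(a, str) for a in rec["expected_answers"])
        and answers_ok
        and logprobs_ok
    )
```

Since Result derives from BaseModel, validation of this kind would normally be left to the model itself; the helper only spells out what the annotations require.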

class artefactual.data.TokenLogprob(**data)

Bases: BaseModel

Represents a single token’s log probability.

Attributes:

token: The token string.
logprob: The log probability of the token.
rank: The rank of the token in the probability distribution.

logprob: float
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

rank: int
token: str
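
As a small illustration of how the three fields relate, the sketch below uses a standard-library stand-in with the same field names and recovers a probability from the stored log probability via exp(logprob). TokenLogprobSketch and its probability property are hypothetical and not part of artefactual, and the rank value is only an example, since the reference does not say whether ranks start at 0 or 1.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenLogprobSketch:
    """Stand-in with the three documented fields; not the real class."""

    token: str
    logprob: float
    rank: int

    @property
    def probability(self) -> float:
        # A log probability converts back to a probability with exp().
        return math.exp(self.logprob)


# Hypothetical entry: a token whose probability is 0.5.
entry = TokenLogprobSketch(token="Paris", logprob=math.log(0.5), rank=1)
```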

Modules

data_model

Pydantic models for representing the data in the generated JSON files.