artefactual.data.data_model

Pydantic models for representing the data in the generated JSON files.

Classes

`Completion`(**data)	Represents a single generated completion as a sequence of token logprobs.
`Dataset`(**data)	Represents the entire dataset with metadata and results.
`Result`(**data)	Represents the full data for a single query.
`TokenLogprob`(**data)	Represents a single token's log probability.

class artefactual.data.data_model.Completion(**data)

Bases: BaseModel

Represents a single generated completion as a sequence of token logprobs.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model. It should be a dictionary conforming to `pydantic.config.ConfigDict`.

class artefactual.data.data_model.Dataset(**data)

Bases: BaseModel

Represents the entire dataset with metadata and results.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model. It should be a dictionary conforming to `pydantic.config.ConfigDict`.

class artefactual.data.data_model.Result(**data)

Bases: BaseModel

Represents the full data for a single query.

Attributes:

- `query_id`: The unique identifier for the query.
- `query`: The query text.
- `expected_answers`: List of expected correct answers.
- `generated_answers`: List of generated answers with metadata.
- `token_logprobs`: Nested sequence of token log probabilities.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model. It should be a dictionary conforming to `pydantic.config.ConfigDict`.
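To make the `Result` attributes more concrete, the sketch below parses one JSON record with the documented field names and sums the log probabilities of each generated sequence. The nesting of `token_logprobs` (one inner list of floats per generated answer), the `text` key inside `generated_answers`, and the helper `sequence_logprob` are assumptions made for illustration; this page does not specify them, and the real files may be laid out differently.

```python
import json
import math

# A hypothetical record using the documented Result field names.
# The value shapes (especially token_logprobs) are assumed, not confirmed.
raw = """
{
  "query_id": "q-001",
  "query": "What is the capital of France?",
  "expected_answers": ["Paris"],
  "generated_answers": [{"text": "Paris"}, {"text": "Lyon"}],
  "token_logprobs": [[-0.25, -0.5], [-1.5, -0.5, -0.25]]
}
"""
result = json.loads(raw)


def sequence_logprob(token_logprobs):
    """Sum per-token log probabilities to get the log probability of a sequence."""
    return math.fsum(token_logprobs)


# One score per inner sequence, in the same order as the input.
scores = [sequence_logprob(seq) for seq in result["token_logprobs"]]

for answer, score in zip(result["generated_answers"], scores):
    print(answer["text"], score)
```

Summing log probabilities is equivalent to multiplying the per-token probabilities, but it avoids numerical underflow on long sequences.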

class artefactual.data.data_model.TokenLogprob(**data)

Bases: BaseModel

Represents a single token’s log probability.

Attributes:

- `token`: The token string.
- `logprob`: The log probability of the token.
- `rank`: The rank of the token in the probability distribution.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model. It should be a dictionary conforming to `pydantic.config.ConfigDict`.
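As a rough illustration of how a model with the three `TokenLogprob` attributes behaves, the sketch below declares a stand-in Pydantic v2 class and validates a dictionary against it. `TokenLogprobSketch` is a hypothetical re-declaration, not the library's class, and the field types (`str`, `float`, `int`) are assumptions, since this page lists the attribute names but not their annotations.

```python
import math

from pydantic import BaseModel


class TokenLogprobSketch(BaseModel):
    """Hypothetical stand-in mirroring the documented TokenLogprob attributes."""

    token: str      # assumed type: the token string
    logprob: float  # assumed type: natural-log probability of the token
    rank: int       # assumed type: position in the probability distribution


# Validate a plain dictionary, as it might appear after json.load().
record = {"token": "Paris", "logprob": -0.25, "rank": 1}
entry = TokenLogprobSketch.model_validate(record)

# Convert the log probability back to a probability in [0, 1].
probability = math.exp(entry.logprob)
print(entry.token, round(probability, 4))

# Round-trip back to a dictionary for serialization.
assert entry.model_dump() == record
```

With the real class, the same `model_validate` and `model_dump` calls should apply, because they are inherited from `BaseModel`.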