artefactual.calibration.rates_answers#

Functions

rate_answers(config)

Rate answers generated by a model using a judge LLM.

Classes

RatingConfig(**data)

Configuration for answer rating.

ResultItem(**data)

Model for a single result item in the input JSON.

class artefactual.calibration.rates_answers.RatingConfig(**data)[source]#

Bases: BaseModel

Configuration for answer rating.

input_file: str | Path#
judge_model_path: str#
max_tokens: int#
model_config: ClassVar[ConfigDict] = {}#

Configuration for the model; should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

seed: int#
temperature: float#
class artefactual.calibration.rates_answers.ResultItem(**data)[source]#

Bases: BaseModel

Model for a single result item in the input JSON.

expected_answer: list[str] | str | None#
expected_answers: list[str] | str | None#
generated_answers: list[dict[str, Any]]#
get_expected_answers()[source]#

Retrieve expected answers as a list, handling aliases and types.

Return type:

list[str]

model_config: ClassVar[ConfigDict] = {}#

Configuration for the model; should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

query: str#
query_id: str#
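The alias handling that get_expected_answers() performs can be sketched as a plain function over a raw record dict. This is an assumption about the normalization logic (prefer either alias, coerce a bare string to a one-element list, map missing values to an empty list), not the library's implementation.

```python
from typing import Any


def get_expected_answers_sketch(item: dict[str, Any]) -> list[str]:
    """Normalize `expected_answers` / `expected_answer` to a list of strings.

    Handles both field aliases and both value types (str or list[str]),
    mirroring what ResultItem.get_expected_answers() is documented to do.
    """
    value = item.get("expected_answers")
    if value is None:
        value = item.get("expected_answer")
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return list(value)
```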
artefactual.calibration.rates_answers.rate_answers(config)[source]#

Rate answers generated by a model using a judge LLM.

Args:

config (RatingConfig): Configuration for the rating process.

Returns:

pd.DataFrame: DataFrame containing uncertainty scores and judgments, indexed by query_id.

Return type:

DataFrame
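The overall rating flow can be sketched end to end with a stubbed judge. Everything here is hypothetical scaffolding: `judge` stands in for the judge-LLM call, the `"answer"` key inside `generated_answers` is an assumed field name, and the real function returns a pandas DataFrame rather than the plain dict used below.

```python
def judge(query: str, expected: list[str], answer: str) -> float:
    # Stub for the judge LLM: score 1.0 on an exact match with any
    # expected answer, else 0.0. The real judgment is model-based.
    return 1.0 if answer in expected else 0.0


def rate_answers_sketch(items: list[dict]) -> dict[str, list[float]]:
    """Return per-generation judgment scores keyed by query_id.

    `items` follows the ResultItem shape documented above; the "answer"
    key inside each generated_answers entry is an assumption.
    """
    scores: dict[str, list[float]] = {}
    for item in items:
        expected = item.get("expected_answers") or item.get("expected_answer") or []
        if isinstance(expected, str):
            expected = [expected]
        scores[item["query_id"]] = [
            judge(item["query"], expected, gen["answer"])
            for gen in item["generated_answers"]
        ]
    return scores
```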