v1/eval
Launch an eval
Launch an evaluation. This is the API equivalent of the Eval function built into the Braintrust SDK. In the Eval API, you provide pointers to a dataset, a task function, and scoring functions. The API then runs the evaluation, creates an experiment, and returns the results along with a link to the experiment. To learn more about evals, see the Evals guide.
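The request can also be made directly over HTTP. The sketch below assumes the standard https://api.braintrust.dev base URL and an API key stored in a BRAINTRUST_API_KEY environment variable; all of the IDs are hypothetical placeholders, not real values.

```ts
// Hedged sketch of launching an eval over HTTP. Assumes the standard
// https://api.braintrust.dev base URL; the IDs below are placeholders.
const response = await fetch("https://api.braintrust.dev/v1/eval", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.BRAINTRUST_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    project_id: "<project id>",                        // project to run the eval in
    data: { dataset_id: "<dataset id>" },              // pointer to an existing dataset
    task: { function_id: "<task function id>" },       // the function to evaluate
    scores: [{ function_id: "<scorer function id>" }], // the scoring functions
  }),
});
const result = await response.json(); // summary plus a link to the created experiment
```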
Authorization
Authorization Required Bearer <token>
Most Braintrust endpoints are authenticated by providing your API key in an Authorization: Bearer [api_key] header on your HTTP request. You can create an API key in the Braintrust organization settings page.
In: header
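As a minimal sketch, the header can be built like this (assuming the key is stored in a BRAINTRUST_API_KEY environment variable):

```ts
// Hedged sketch: BRAINTRUST_API_KEY is assumed to hold an API key created in
// the Braintrust organization settings page.
const headers = { Authorization: `Bearer ${process.env.BRAINTRUST_API_KEY}` };
```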
Request Body
application/json Required Eval launch parameters. A full example request body follows the field list below.
project_id Required string
Unique identifier for the project to run the eval in
data Required Any properties in dataset_id, project_dataset_name, dataset_rows
The dataset to use
task Required Any properties in function_id, project_slug, global_function, prompt_session_id, inline_code, inline_prompt
The function to evaluate
scores Required array<Any properties in function_id, project_slug, global_function, prompt_session_id, inline_code, inline_prompt>
The functions to score the eval on
experiment_name string
An optional name for the experiment created by this eval. If it conflicts with an existing experiment, it will be suffixed with a unique identifier.
metadata object
Optional experiment-level metadata to store about the evaluation. You can later use this to slice & dice across experiments.
parent Any properties in span_parent_struct, string
Options for tracing the evaluation
stream boolean
Whether to stream the results of the eval. If true, the request will return two events: one to indicate the experiment has started, and another upon completion. If false, the request will return the evaluation's summary upon completion.
trial_count number | null
The number of times to run the evaluator per input. This is useful for evaluating applications that have non-deterministic behavior and gives you both a stronger aggregate measure and a sense of the variance in the results.
is_public boolean | null
Whether the experiment should be public. Defaults to false.
timeout number | null
The maximum duration, in milliseconds, to run the evaluation. Defaults to undefined, in which case there is no timeout.
max_concurrency number | null
The maximum number of tasks/scorers that will be run concurrently. Defaults to undefined, in which case there is no max concurrency.
base_experiment_name string | null
An optional experiment name to use as a base. If specified, the new experiment will be summarized and compared to this experiment.
base_experiment_id string | null
An optional experiment id to use as a base. If specified, the new experiment will be summarized and compared to this experiment.
git_metadata_settings object | null
Optional settings for collecting git metadata. By default, will collect all git metadata fields allowed in org-level settings.
repo_info object | null
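To illustrate how the fields above fit together, here is a hedged sketch of a fuller request body. The union-typed fields (data, task, scores) each use one of the property names listed in this reference; the inline row shape under dataset_rows and the specific values are assumptions for illustration, not the documented schema.

```ts
// Hedged example request body; all IDs and row contents are placeholders.
const body = {
  project_id: "<project id>",
  data: {
    // Inline rows instead of a dataset pointer; the row shape is an assumption.
    dataset_rows: [{ input: "What is 2 + 2?", expected: "4" }],
  },
  task: { function_id: "<task function id>" },
  scores: [{ function_id: "<scorer function id>" }],
  experiment_name: "api-launch-demo", // suffixed automatically on conflict
  metadata: { model: "gpt-4o" },      // experiment-level metadata for later slicing
  stream: false,                      // return the summary only upon completion
  trial_count: 3,                     // run the evaluator three times per input
  max_concurrency: 5,                 // at most five concurrent tasks/scorers
  timeout: 600_000,                   // stop the evaluation after ten minutes
};
```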
Response
Eval launch response