v1/eval
Launch an eval
Launch an evaluation. This is the API equivalent of the Eval function built into the Braintrust SDK. In the Eval API, you provide pointers to a dataset, a task function, and scoring functions. The API then runs the evaluation, creates an experiment, and returns the results along with a link to the experiment. To learn more about evals, see the Evals guide.
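The request can also be made directly over HTTP. The sketch below assumes the standard https://api.braintrust.dev base URL and an API key stored in a BRAINTRUST_API_KEY environment variable; all of the IDs are hypothetical placeholders, not real values.

```ts
// Hedged sketch of launching an eval over HTTP. Assumes the standard
// https://api.braintrust.dev base URL; the IDs below are placeholders.
const response = await fetch("https://api.braintrust.dev/v1/eval", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.BRAINTRUST_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    project_id: "<project id>",                        // project to run the eval in
    data: { dataset_id: "<dataset id>" },              // pointer to an existing dataset
    task: { function_id: "<task function id>" },       // the function to evaluate
    scores: [{ function_id: "<scorer function id>" }], // the scoring functions
  }),
});
const result = await response.json(); // summary plus a link to the created experiment
```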
Authorization
Authorization Required Bearer <token>
Most Braintrust endpoints are authenticated by providing your API key in an Authorization: Bearer [api_key] header on your HTTP request. You can create an API key in the Braintrust organization settings page.
In: header
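As a minimal sketch, the header can be built like this (assuming the key is stored in a BRAINTRUST_API_KEY environment variable):

```ts
// Hedged sketch: BRAINTRUST_API_KEY is assumed to hold an API key created in
// the Braintrust organization settings page.
const headers = { Authorization: `Bearer ${process.env.BRAINTRUST_API_KEY}` };
```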
Request Body
application/json Required Eval launch parameters. A full example request body follows the field list below.
project_id Required string
Unique identifier for the project to run the eval in
data Required Any properties in dataset_id, project_dataset_name, dataset_rows
The dataset to use
task Required Any properties in function_id, project_slug, global_function, prompt_session_id, inline_code, inline_prompt
The function to evaluate
scores Required array<Any properties in function_id, project_slug, global_function, prompt_session_id, inline_code, inline_prompt>
The functions to score the eval on
experiment_name string
An optional name for the experiment created by this eval. If it conflicts with an existing experiment, it will be suffixed with a unique identifier.
metadata object
Optional experiment-level metadata to store about the evaluation. You can later use this to slice & dice across experiments.
parent Any properties in span_parent_struct, string
Options for tracing the evaluation
stream boolean
Whether to stream the results of the eval. If true, the request will return two events: one to indicate the experiment has started, and another upon completion. If false, the request will return the evaluation's summary upon completion.
trial_count number | null
The number of times to run the evaluator per input. This is useful for evaluating applications that have non-deterministic behavior and gives you both a stronger aggregate measure and a sense of the variance in the results.
is_public boolean | null
Whether the experiment should be public. Defaults to false.
timeout number | null
The maximum duration, in milliseconds, to run the evaluation. Defaults to undefined, in which case there is no timeout.
max_concurrency number | null
The maximum number of tasks/scorers that will be run concurrently. Defaults to undefined, in which case there is no max concurrency.
base_experiment_name string | null
An optional experiment name to use as a base. If specified, the new experiment will be summarized and compared to this experiment.
base_experiment_id string | null
An optional experiment id to use as a base. If specified, the new experiment will be summarized and compared to this experiment.
git_metadata_settings object | null
Optional settings for collecting git metadata. By default, will collect all git metadata fields allowed in org-level settings.
repo_info object | null
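To illustrate how the fields above fit together, here is a hedged sketch of a fuller request body. The union-typed fields (data, task, scores) each use one of the property names listed in this reference; the inline row shape under dataset_rows and the specific values are assumptions for illustration, not the documented schema.

```ts
// Hedged example request body; all IDs and row contents are placeholders.
const body = {
  project_id: "<project id>",
  data: {
    // Inline rows instead of a dataset pointer; the row shape is an assumption.
    dataset_rows: [{ input: "What is 2 + 2?", expected: "4" }],
  },
  task: { function_id: "<task function id>" },
  scores: [{ function_id: "<scorer function id>" }],
  experiment_name: "api-launch-demo", // suffixed automatically on conflict
  metadata: { model: "gpt-4o" },      // experiment-level metadata for later slicing
  stream: false,                      // return the summary only upon completion
  trial_count: 3,                     // run the evaluator three times per input
  max_concurrency: 5,                 // at most five concurrent tasks/scorers
  timeout: 600_000,                   // stop the evaluation after ten minutes
};
```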
Response
Eval launch response