Testsets
A testset is a versioned bag of testcases: an artifact you commit rows into, where each commit produces an immutable revision. Evaluations pin a specific testset revision, so the same run keeps replaying the same inputs even after you add or change rows.
Testsets are versioned. For commit semantics, include_archived, and how revision IDs stay stable, see Versioning. The rest of this page covers what is specific to testsets.
Body
A testset revision commits an ordered list of testcase IDs plus commit metadata. The testcases themselves are immutable blobs scoped to the testset.
A testcase payload:
```json
{
  "id": "e44bf373-44ec-52ce-aa5c-a3bd982e783b",
  "testset_id": "019d9ca2-0000-0000-0000-000000000000",
  "data": {
    "country": "Germany",
    "capital": "Berlin",
    "testcase_dedup_id": "gr-001"
  }
}
```
The schema of a testcase is implicit in the keys of data. There is no separate column list. Adding a column means committing rows that have the new key.
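Because the schema lives in the row data, a revision's effective column set is the union of `data` keys across its testcases. Below is a minimal client-side sketch. It assumes testcases are dicts shaped like the payload above. `effective_columns` is a hypothetical helper and is not part of the Agenta API or SDK.

```python
from typing import Any


def effective_columns(testcases: list[dict[str, Any]]) -> list[str]:
    """Return the union of data keys across testcases, in first-seen order."""
    columns: dict[str, None] = {}  # dicts preserve insertion order
    for testcase in testcases:
        for key in testcase.get("data", {}):
            columns.setdefault(key, None)
    return list(columns)


rows = [
    {"data": {"country": "France", "capital": "Paris"}},
    {"data": {"country": "Japan", "capital": "Tokyo", "continent": "Asia"}},
]
print(effective_columns(rows))  # ['country', 'capital', 'continent']
```

Committing the second row is all it takes to "add" the continent column. The first row has no value for that column.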
A testcase can carry a caller-supplied testcase_dedup_id (inside data) that the server uses to identify the same row across re-imports and edits. If you don't provide one, the server generates one from the content hash.
CSV and older JSON uploads also accept a top-level __dedup_id__ field (deprecated). The server normalises it into data.testcase_dedup_id on its way in.
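The dedup-ID rules above can be sketched client-side as follows. This illustrates the documented behaviour under stated assumptions and is not the server's code. Two things are assumptions: the fallback hash (SHA-256 over canonical JSON of `data`) and the precedence when both fields are present. The helper name `normalise_dedup_id` is hypothetical.

```python
import hashlib
import json
from typing import Any


def normalise_dedup_id(row: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of row whose data always carries testcase_dedup_id.

    Hypothetical sketch of the documented rules, not the server's code.
    """
    data = dict(row.get("data", {}))
    legacy = row.get("__dedup_id__")  # deprecated top-level field
    # Assumption: an explicit data.testcase_dedup_id wins over the legacy field.
    if "testcase_dedup_id" not in data and legacy is not None:
        data["testcase_dedup_id"] = legacy
    if "testcase_dedup_id" not in data:
        # Assumption: SHA-256 over canonical JSON stands in for the server's content hash.
        canonical = json.dumps(data, sort_keys=True, separators=(",", ":"))
        data["testcase_dedup_id"] = hashlib.sha256(canonical.encode()).hexdigest()
    rest = {k: v for k, v in row.items() if k not in ("__dedup_id__", "data")}
    return {**rest, "data": data}


legacy_row = {"__dedup_id__": "gr-001", "data": {"country": "Germany", "capital": "Berlin"}}
print(normalise_dedup_id(legacy_row))
# {'data': {'country': 'Germany', 'capital': 'Berlin', 'testcase_dedup_id': 'gr-001'}}
```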
Testcase immutability
A testcase is content-addressed. Its id is a hash of data plus the testset_id (plus an optional salt), so two testcases with the same content in the same testset collapse to the same row.
Because a testcase is a blob, editing a row does not mutate the existing testcase. The commit writes a new blob with a new ID, then creates a new revision whose testcase list points at the new ID. The old blob is still reachable by its original ID. That is why revision history is reproducible: a revision is just a list of pointers to blobs that never change.
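The edit path described above can be shown with a toy in-memory model. Blobs are written once and never overwritten. An edit produces a new revision whose pointer list swaps in the new blob ID. For brevity the model ignores testset_id and the salt. `put_blob`, `edit_row`, and `Revision` are illustrative names, not Agenta APIs.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any

blobs: dict[str, dict[str, Any]] = {}  # testcase ID -> immutable content


def put_blob(data: dict[str, Any]) -> str:
    """Store content under a hash-derived ID; existing blobs are never overwritten."""
    key = hashlib.sha256(json.dumps(data, sort_keys=True).encode()).hexdigest()[:12]
    blobs.setdefault(key, dict(data))
    return key


@dataclass(frozen=True)
class Revision:
    testcase_ids: tuple[str, ...]  # ordered pointers to blobs
    message: str


def edit_row(rev: Revision, index: int, new_data: dict[str, Any], message: str) -> Revision:
    """Commit an edit: write a new blob, then a new revision pointing at it."""
    new_id = put_blob(new_data)
    ids = list(rev.testcase_ids)
    ids[index] = new_id
    return Revision(tuple(ids), message)


r1 = Revision((put_blob({"country": "Brazil", "capital": "Rio"}),), "seed")
r2 = edit_row(r1, 0, {"country": "Brazil", "capital": "Brasilia"}, "fix capital")
print(blobs[r1.testcase_ids[0]]["capital"])  # Rio (the old revision still resolves)
print(blobs[r2.testcase_ids[0]]["capital"])  # Brasilia
```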
Import and export
Testcases move in and out of a testset as CSV files or JSON arrays.
| Endpoint | Action |
|---|---|
| POST /testsets/revisions/{id}/upload | Commit a new revision from a CSV or JSON file. |
| POST /testsets/revisions/{id}/download | Download the revision's testcases as CSV or JSON. |
| POST /simple/testsets/upload | Create a new testset and seed its first revision in one call. |
| POST /simple/testsets/{id}/upload | Commit a new revision on an existing simple testset. |
Besides its data columns, each row may carry the reserved columns __id__ (to preserve an existing testcase ID), __flags__, __tags__, and __meta__. Empty metadata columns are dropped from exports.
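The following minimal sketch uses only the standard library. It shows how a client might separate the reserved columns from data columns when preparing CSV rows for upload. `split_reserved` is hypothetical. It leaves __flags__, __tags__, and __meta__ as raw strings, because this page does not specify how those cells are encoded. Skipping empty reserved cells is a choice made here that loosely mirrors the export behaviour.

```python
import csv
import io
from typing import Any

# Reserved column names taken from this page; everything else is row data.
RESERVED = {"__id__", "__flags__", "__tags__", "__meta__"}


def split_reserved(csv_text: str) -> list[dict[str, Any]]:
    """Split CSV rows into a data dict plus any non-empty reserved columns."""
    rows: list[dict[str, Any]] = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        data = {k: v for k, v in record.items() if k not in RESERVED}
        # Skip empty reserved cells instead of sending empty strings.
        reserved = {k: v for k, v in record.items() if k in RESERVED and v}
        rows.append({"data": data, **reserved})
    return rows


sample = (
    "__id__,country,capital\n"
    ",France,Paris\n"
    "e44bf373-44ec-52ce-aa5c-a3bd982e783b,Germany,Berlin\n"
)
for row in split_reserved(sample):
    print(row)
# {'data': {'country': 'France', 'capital': 'Paris'}}
# {'data': {'country': 'Germany', 'capital': 'Berlin'}, '__id__': 'e44bf373-44ec-52ce-aa5c-a3bd982e783b'}
```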
Retrieve semantics
POST /testsets/revisions/retrieve resolves a testset, variant, or revision reference to a single revision. Two options control what comes back:
- include_testcases defaults to true. Full testcase objects are returned unless you opt out.
- include_testcase_ids defaults to true. The ordered ID list is returned unless you opt out.
For instance, to retrieve a revision with only the ordered ID list and no testcase bodies:
```bash
curl -X POST "$AGENTA_HOST/api/testsets/revisions/retrieve" \
  -H "Content-Type: application/json" \
  -H "Authorization: ApiKey $AGENTA_API_KEY" \
  -d '{
    "testset_ref": {"id": "019d9ca1-0000-0000-0000-000000000000"},
    "include_testcases": false
  }'
```
This is the opposite of the default for POST /queries/revisions/retrieve (see Query Pattern), where include_traces defaults to false because materialising traces is expensive. Keep this asymmetry in mind when you build clients that consume both endpoints.
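One way to neutralise the asymmetry is to have the client always spell out every include flag, so that it never depends on an endpoint's default. The sketch below is a minimal version of that idea. The defaults table restates what this page and Query Pattern document. `explicit_body` and `DEFAULTS` are hypothetical client-side names.

```python
from typing import Any

# Documented server-side defaults, restated so the client can make them explicit.
DEFAULTS: dict[str, dict[str, bool]] = {
    "/testsets/revisions/retrieve": {"include_testcases": True, "include_testcase_ids": True},
    "/queries/revisions/retrieve": {"include_traces": False},
}


def explicit_body(path: str, body: dict[str, Any], **overrides: bool) -> dict[str, Any]:
    """Return body with every known include_* flag set explicitly."""
    flags = dict(DEFAULTS[path])
    unknown = set(overrides) - set(flags)
    if unknown:
        # Catch flags copied from the other endpoint by mistake.
        raise ValueError(f"unknown flags for {path}: {sorted(unknown)}")
    flags.update(overrides)
    return {**body, **flags}


body = explicit_body(
    "/testsets/revisions/retrieve",
    {"testset_ref": {"id": "019d9ca1-0000-0000-0000-000000000000"}},
    include_testcases=False,
)
print(body)
# {'testset_ref': {'id': '019d9ca1-0000-0000-0000-000000000000'}, 'include_testcases': False, 'include_testcase_ids': True}
```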
Simple endpoints
The /simple/testsets/ surface collapses the artifact, variant, and latest revision into one flat record with its testcases merged in:
```bash
curl "$AGENTA_HOST/api/simple/testsets/019d9ca1-0000-0000-0000-000000000000" \
  -H "Authorization: ApiKey $AGENTA_API_KEY"
```
Use it when you want the "current testset" without tracking lineage. For commits, forks, or specific-revision retrieval, use /testsets/, /testsets/variants/, and /testsets/revisions/. See Simple Endpoints for the general pattern.
Relationship to evaluations
Evaluations are run against a testset. Each evaluation pins a specific testset_revision_id when the run is configured. Committing a new revision on the testset does not retroactively change a pinned run.
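The following toy model shows why pinning protects a run. The evaluation records a revision ID once and always reads that revision's list, never the testset's latest. `ToyTestset` and `ToyEvaluation` are illustrative stand-ins, not Agenta types.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTestset:
    """In-memory stand-in: revision ID -> ordered testcase IDs."""
    revisions: dict[str, tuple[str, ...]] = field(default_factory=dict)
    latest_id: str = ""

    def commit(self, revision_id: str, testcase_ids: tuple[str, ...]) -> None:
        self.revisions[revision_id] = testcase_ids
        self.latest_id = revision_id


@dataclass(frozen=True)
class ToyEvaluation:
    testset_revision_id: str  # pinned once, when the run is configured

    def inputs(self, testset: ToyTestset) -> tuple[str, ...]:
        # Always read the pinned revision, never testset.latest_id.
        return testset.revisions[self.testset_revision_id]


ts = ToyTestset()
ts.commit("rev-1", ("france", "japan"))
run = ToyEvaluation(testset_revision_id=ts.latest_id)
ts.commit("rev-2", ("france", "japan", "brazil"))
print(run.inputs(ts))  # ('france', 'japan')
```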
Example
Create a testset with two rows, add a row, and retrieve the latest revision.
```bash
# 1. Create a simple testset with seed rows.
curl -X POST "$AGENTA_HOST/api/simple/testsets/" \
  -H "Content-Type: application/json" \
  -H "Authorization: ApiKey $AGENTA_API_KEY" \
  -d '{
    "testset": {
      "slug": "country-capitals",
      "name": "country-capitals",
      "data": {
        "testcases": [
          {"data": {"country": "France", "capital": "Paris"}},
          {"data": {"country": "Japan", "capital": "Tokyo"}}
        ]
      }
    }
  }'
```
The response includes id, revision_id, and the testcase IDs that were created.
```bash
# 2. Add a row as a new revision using the structured commit endpoint.
curl -X POST "$AGENTA_HOST/api/testsets/revisions/commit" \
  -H "Content-Type: application/json" \
  -H "Authorization: ApiKey $AGENTA_API_KEY" \
  -d '{
    "testset_revision_commit": {
      "testset_id": "019d9ca1-0000-0000-0000-000000000000",
      "testset_revision_id": "019d9ca1-0000-0000-0000-000000000001",
      "message": "Add Brazil",
      "delta": {
        "rows": {
          "add": [{"data": {"country": "Brazil", "capital": "Brasilia"}}]
        }
      }
    }
  }'
```
```bash
# 3. Retrieve the latest revision of the testset.
curl -X POST "$AGENTA_HOST/api/testsets/revisions/retrieve" \
  -H "Content-Type: application/json" \
  -H "Authorization: ApiKey $AGENTA_API_KEY" \
  -d '{"testset_ref": {"id": "019d9ca1-0000-0000-0000-000000000000"}}'
```
An evaluation that pinned revision 019d9ca1-0000-0000-0000-000000000001 keeps replaying the original two rows even though the testset now has three.
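To make the step-2 commit concrete, here is a toy model of what a rows.add delta does to the ordered testcase list. It makes two assumptions: added rows are appended at the end, and re-adding identical content collapses into the existing row, following the content addressing described under Testcase immutability. This page specifies neither behaviour for the commit endpoint, and `row_id` and `apply_rows_add` are hypothetical. The base list is copied, so the pinned revision's list is left untouched.

```python
import hashlib
import json
from typing import Any


def row_id(data: dict[str, Any]) -> str:
    # Stand-in for the server's content-addressed ID (hypothetical scheme).
    return hashlib.sha256(json.dumps(data, sort_keys=True).encode()).hexdigest()[:8]


def apply_rows_add(base_ids: list[str], delta: dict[str, Any]) -> list[str]:
    """Return the next revision's ordered IDs after a rows.add delta."""
    ids = list(base_ids)  # copy: the base revision never changes
    for row in delta.get("rows", {}).get("add", []):
        new_id = row_id(row["data"])
        if new_id not in ids:  # assumption: identical content collapses
            ids.append(new_id)
    return ids


seed = [
    row_id({"country": "France", "capital": "Paris"}),
    row_id({"country": "Japan", "capital": "Tokyo"}),
]
delta = {"rows": {"add": [{"data": {"country": "Brazil", "capital": "Brasilia"}}]}}
rev2 = apply_rows_add(seed, delta)
print(len(seed), len(rev2))  # 2 3
```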
Lifecycle
Testsets, variants, and revisions are soft-deleted. POST /testsets/{id}/archive sets deleted_at and POST /testsets/{id}/unarchive clears it; the same pattern applies at the variant and revision level. Archiving a testset hides it from /query responses unless the request sets include_archived: true. See Versioning.
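Below is an in-memory sketch of this soft-delete pattern. Archiving sets deleted_at and unarchiving clears it. A listing filter hides archived records unless include_archived is true. The class and function names are illustrative and do not describe Agenta's implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class SoftDeletable:
    id: str
    deleted_at: Optional[datetime] = None


def archive(record: SoftDeletable) -> None:
    # Soft delete: mark the record instead of removing it.
    record.deleted_at = datetime.now(timezone.utc)


def unarchive(record: SoftDeletable) -> None:
    record.deleted_at = None


def query(records: list[SoftDeletable], include_archived: bool = False) -> list[SoftDeletable]:
    # Archived records are hidden unless the caller opts in.
    return [r for r in records if include_archived or r.deleted_at is None]


testsets = [SoftDeletable("country-capitals"), SoftDeletable("legacy-set")]
archive(testsets[1])
print([t.id for t in query(testsets)])  # ['country-capitals']
print([t.id for t in query(testsets, include_archived=True)])  # ['country-capitals', 'legacy-set']
```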