
feat(llmobs): add SyncExperiment.rerun_evaluators() and make task/dataset optional#17292

Open
mehulsonowal wants to merge 3 commits into mehul.sonowal/experiment-to-dataframe from mehul.sonowal/re-run-evals

Conversation

@mehulsonowal
Contributor

Summary

  • Add SyncExperiment.rerun_evaluators() to re-score stored task outputs with updated evaluators without re-running the task function
  • Make task and dataset optional in LLMObs.experiment() / Experiment.__init__ / SyncExperiment.__init__ to support the pull() + rerun_evaluators() workflow
  • Guard Experiment.run() to raise ValueError when called without task and dataset
  • Add missing_task_strategy parameter ("raise" / "skip" / "retry") to control error-row handling during re-runs
  • Add tests covering: rerun skips task execution, error strategies, result object replacement, init without task/dataset, run() guard, and rerun without task/dataset
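The flow in the bullets above can be sketched as a standalone, simplified model. This is a hypothetical illustration of the described behavior, not the actual ddtrace implementation; the class and field names mirror the PR description but the internals are invented for clarity:

```python
# Simplified sketch of the SyncExperiment rerun flow described in this PR.
# Hypothetical code for illustration only -- not the ddtrace implementation.
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class ExperimentResult:
    rows: List[Dict[str, Any]] = field(default_factory=list)


class SyncExperiment:
    def __init__(self, name, task=None, dataset=None, evaluators=None):
        # task and dataset are optional to support pull() + rerun_evaluators()
        self.name = name
        self._task = task
        self._dataset = dataset
        self._evaluators = evaluators or []
        self.result: Optional[ExperimentResult] = None

    def run(self) -> ExperimentResult:
        if self._task is None or self._dataset is None:
            raise ValueError("run() requires both a task and a dataset")
        rows = []
        for record in self._dataset:
            try:
                output, error = self._task(record), None
            except Exception as exc:
                output, error = None, str(exc)
            evals = (
                {e.__name__: e(output) for e in self._evaluators}
                if error is None else {}
            )
            rows.append({"input": record, "output": output,
                         "error": error, "evaluations": evals})
        self.result = ExperimentResult(rows)
        return self.result

    def rerun_evaluators(self, missing_task_strategy: str = "raise") -> ExperimentResult:
        if missing_task_strategy not in ("raise", "skip", "retry"):
            raise ValueError(f"invalid missing_task_strategy: {missing_task_strategy!r}")
        if self.result is None:
            raise ValueError("rerun_evaluators() requires a prior result from run() or pull()")
        new_rows = []
        for row in self.result.rows:
            if row["error"] is not None:
                if missing_task_strategy == "raise":
                    raise ValueError(f"row has a prior task error: {row['error']}")
                if missing_task_strategy == "skip":
                    continue
                # "retry": re-execute the task for error rows only
                row = dict(row, output=self._task(row["input"]), error=None)
            # Re-score stored outputs with the current evaluators
            row = dict(row, evaluations={e.__name__: e(row["output"])
                                         for e in self._evaluators})
            new_rows.append(row)
        # Replace the stored result with fresh evaluations
        self.result = ExperimentResult(new_rows)
        return self.result
```

Note that `rerun_evaluators()` never touches `self._task` on the happy path, which is what makes the no-task, pre-loaded-result workflow possible.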

Behavior

```python
# Standard workflow (unchanged)
exp = LLMObs.experiment("my-exp", task, dataset, evaluators=[my_eval], project_name="p")
result = exp.run()

# Re-run evaluators only — no task re-execution
new_result = exp.rerun_evaluators()

# Future: pull prior results from backend then re-evaluate
exp = LLMObs.experiment("my-exp", evaluators=[updated_eval], project_name="p")
exp.pull()  # sets exp.result (pending backend endpoint)
new_result = exp.rerun_evaluators()
```
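The three `missing_task_strategy` modes can be illustrated with a standalone dispatcher. This is a hypothetical helper that mirrors the behavior described in this PR ("raise" / "skip" / "retry"), not ddtrace code:

```python
# Hypothetical helper illustrating the missing_task_strategy modes from this PR.
def resolve_error_row(row, strategy, task):
    """Decide what to do with a stored row whose task previously errored.

    Returns the (possibly repaired) row, or None to signal the caller
    to drop it from the re-evaluated result.
    """
    if strategy not in ("raise", "skip", "retry"):
        raise ValueError(f"invalid missing_task_strategy: {strategy!r}")
    if row.get("error") is None:
        return row  # clean row: nothing to do
    if strategy == "raise":
        raise ValueError(f"row has a prior task error: {row['error']}")
    if strategy == "skip":
        return None  # caller drops this row
    # "retry": re-execute the task for this row only
    return {"input": row["input"], "output": task(row["input"]), "error": None}
```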

Test plan

  • test_experiment_rerun_evaluators_skips_task — verifies _process_record is not called again
  • test_experiment_rerun_raises_on_task_errors — default "raise" strategy
  • test_experiment_rerun_skip_strategy — "skip" strategy skips error rows
  • test_experiment_rerun_invalid_strategy — invalid strategy raises ValueError
  • test_experiment_rerun_no_prior_result — guards against calling before run()
  • test_experiment_rerun_preserves_experiment_id — no new experiment created
  • test_experiment_init_without_task_and_dataset — no error at init (passes locally)
  • test_experiment_run_without_task_raises — run() guard (passes locally)
  • test_experiment_rerun_without_task_succeeds — rerun works with pre-loaded result
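The first test's idea can be shown with a self-contained sketch: wrap the task in a call counter and assert that `rerun_evaluators()` never invokes it again. The stub class below is a hypothetical stand-in for `SyncExperiment`, not the actual ddtrace test suite:

```python
# Hypothetical stub illustrating the "rerun skips task execution" test idea.
class StubExperiment:
    def __init__(self, task, dataset, evaluators):
        self.task, self.dataset, self.evaluators = task, dataset, evaluators
        self.result = None

    def run(self):
        self.result = [{"output": self.task(r)} for r in self.dataset]
        return self.result

    def rerun_evaluators(self):
        # Re-score stored outputs; the task is never called here
        self.result = [
            dict(row, evaluations=[e(row["output"]) for e in self.evaluators])
            for row in self.result
        ]
        return self.result


def test_rerun_skips_task():
    calls = {"n": 0}

    def task(record):
        calls["n"] += 1
        return record + 1

    exp = StubExperiment(task, dataset=[1, 2, 3], evaluators=[lambda o: o > 0])
    exp.run()
    assert calls["n"] == 3  # one task call per dataset record
    exp.rerun_evaluators()
    assert calls["n"] == 3  # unchanged: task was not re-executed
```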

🤖 Generated with Claude Code

mehulsonowal and others added 3 commits April 2, 2026 12:12
…n experiments

- Add `SyncExperiment.rerun_evaluators()` to re-score stored task outputs
  without re-executing the task function; reads from `self.result` and
  replaces it with a new `ExperimentResult` containing fresh evaluations
- Add `missing_task_strategy` parameter ('raise'/'skip'/'retry') to control
  behavior when prior rows have errors
- Make `task` and `dataset` optional in `LLMObs.experiment()`,
  `Experiment.__init__`, and `SyncExperiment.__init__` to support the
  pull() + rerun_evaluators() workflow where no task or dataset is needed
- Guard `Experiment.run()` to raise ValueError when called without task/dataset
- Add tests covering: rerun skips task, error strategies, result preservation,
  init without task/dataset, run() guard, and rerun without task/dataset

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
@mehulsonowal mehulsonowal requested review from a team as code owners April 2, 2026 20:42
@mehulsonowal mehulsonowal requested review from brettlangdon and rachelyangdog and removed request for a team April 2, 2026 20:42
@cit-pr-commenter-54b7da

Codeowners resolved as

releasenotes/notes/llmobs-experiment-rerun-evaluators-b3c4d5e6f7a8b9c0.yaml  @DataDog/apm-python
