
feat(llmobs): add SyncExperiment.rerun_evaluators() and make task/dataset optional#17292

Open
mehulsonowal wants to merge 3 commits into mehul.sonowal/experiment-to-dataframe from mehul.sonowal/re-run-evals

Conversation

@mehulsonowal
Contributor

Summary

  • Add SyncExperiment.rerun_evaluators() to re-score stored task outputs with updated evaluators without re-running the task function
  • Make task and dataset optional in LLMObs.experiment() / Experiment.__init__ / SyncExperiment.__init__ to support the pull() + rerun_evaluators() workflow
  • Guard Experiment.run() to raise ValueError when called without task and dataset
  • Add missing_task_strategy parameter ("raise" / "skip" / "retry") to control error-row handling during re-runs
  • Add tests covering: rerun skips task execution, error strategies, result object replacement, init without task/dataset, run() guard, and rerun without task/dataset
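The flow in the bullets above can be sketched as a standalone, simplified model. This is a hypothetical illustration of the described behavior, not the actual ddtrace implementation; the class and field names mirror the PR description but the internals are invented for clarity:

```python
# Simplified sketch of the SyncExperiment rerun flow described in this PR.
# Hypothetical code for illustration only -- not the ddtrace implementation.
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class ExperimentResult:
    rows: List[Dict[str, Any]] = field(default_factory=list)


class SyncExperiment:
    def __init__(self, name, task=None, dataset=None, evaluators=None):
        # task and dataset are optional to support pull() + rerun_evaluators()
        self.name = name
        self._task = task
        self._dataset = dataset
        self._evaluators = evaluators or []
        self.result: Optional[ExperimentResult] = None

    def run(self) -> ExperimentResult:
        if self._task is None or self._dataset is None:
            raise ValueError("run() requires both a task and a dataset")
        rows = []
        for record in self._dataset:
            try:
                output, error = self._task(record), None
            except Exception as exc:
                output, error = None, str(exc)
            evals = (
                {e.__name__: e(output) for e in self._evaluators}
                if error is None else {}
            )
            rows.append({"input": record, "output": output,
                         "error": error, "evaluations": evals})
        self.result = ExperimentResult(rows)
        return self.result

    def rerun_evaluators(self, missing_task_strategy: str = "raise") -> ExperimentResult:
        if missing_task_strategy not in ("raise", "skip", "retry"):
            raise ValueError(f"invalid missing_task_strategy: {missing_task_strategy!r}")
        if self.result is None:
            raise ValueError("rerun_evaluators() requires a prior result from run() or pull()")
        new_rows = []
        for row in self.result.rows:
            if row["error"] is not None:
                if missing_task_strategy == "raise":
                    raise ValueError(f"row has a prior task error: {row['error']}")
                if missing_task_strategy == "skip":
                    continue
                # "retry": re-execute the task for error rows only
                row = dict(row, output=self._task(row["input"]), error=None)
            # Re-score stored outputs with the current evaluators
            row = dict(row, evaluations={e.__name__: e(row["output"])
                                         for e in self._evaluators})
            new_rows.append(row)
        # Replace the stored result with fresh evaluations
        self.result = ExperimentResult(new_rows)
        return self.result
```

Note that `rerun_evaluators()` never touches `self._task` on the happy path, which is what makes the no-task, pre-loaded-result workflow possible.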

Behavior

```python
# Standard workflow (unchanged)
exp = LLMObs.experiment("my-exp", task, dataset, evaluators=[my_eval], project_name="p")
result = exp.run()

# Re-run evaluators only — no task re-execution
new_result = exp.rerun_evaluators()

# Future: pull prior results from backend then re-evaluate
exp = LLMObs.experiment("my-exp", evaluators=[updated_eval], project_name="p")
exp.pull()  # sets exp.result (pending backend endpoint)
new_result = exp.rerun_evaluators()
```
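The three `missing_task_strategy` modes can be illustrated with a standalone dispatcher. This is a hypothetical helper that mirrors the behavior described in this PR ("raise" / "skip" / "retry"), not ddtrace code:

```python
# Hypothetical helper illustrating the missing_task_strategy modes from this PR.
def resolve_error_row(row, strategy, task):
    """Decide what to do with a stored row whose task previously errored.

    Returns the (possibly repaired) row, or None to signal the caller
    to drop it from the re-evaluated result.
    """
    if strategy not in ("raise", "skip", "retry"):
        raise ValueError(f"invalid missing_task_strategy: {strategy!r}")
    if row.get("error") is None:
        return row  # clean row: nothing to do
    if strategy == "raise":
        raise ValueError(f"row has a prior task error: {row['error']}")
    if strategy == "skip":
        return None  # caller drops this row
    # "retry": re-execute the task for this row only
    return {"input": row["input"], "output": task(row["input"]), "error": None}
```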

Test plan

  • test_experiment_rerun_evaluators_skips_task — verifies _process_record is not called again
  • test_experiment_rerun_raises_on_task_errors — default "raise" strategy
  • test_experiment_rerun_skip_strategy — "skip" strategy skips error rows
  • test_experiment_rerun_invalid_strategy — invalid strategy raises ValueError
  • test_experiment_rerun_no_prior_result — guards against calling before run()
  • test_experiment_rerun_preserves_experiment_id — no new experiment created
  • test_experiment_init_without_task_and_dataset — no error at init (passes locally)
  • test_experiment_run_without_task_raises — run() guard (passes locally)
  • test_experiment_rerun_without_task_succeeds — rerun works with pre-loaded result
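The first test's idea can be shown with a self-contained sketch: wrap the task in a call counter and assert that `rerun_evaluators()` never invokes it again. The stub class below is a hypothetical stand-in for `SyncExperiment`, not the actual ddtrace test suite:

```python
# Hypothetical stub illustrating the "rerun skips task execution" test idea.
class StubExperiment:
    def __init__(self, task, dataset, evaluators):
        self.task, self.dataset, self.evaluators = task, dataset, evaluators
        self.result = None

    def run(self):
        self.result = [{"output": self.task(r)} for r in self.dataset]
        return self.result

    def rerun_evaluators(self):
        # Re-score stored outputs; the task is never called here
        self.result = [
            dict(row, evaluations=[e(row["output"]) for e in self.evaluators])
            for row in self.result
        ]
        return self.result


def test_rerun_skips_task():
    calls = {"n": 0}

    def task(record):
        calls["n"] += 1
        return record + 1

    exp = StubExperiment(task, dataset=[1, 2, 3], evaluators=[lambda o: o > 0])
    exp.run()
    assert calls["n"] == 3  # one task call per dataset record
    exp.rerun_evaluators()
    assert calls["n"] == 3  # unchanged: task was not re-executed
```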

🤖 Generated with Claude Code

mehulsonowal and others added 3 commits April 2, 2026 12:12
…n experiments

- Add `SyncExperiment.rerun_evaluators()` to re-score stored task outputs
  without re-executing the task function; reads from `self.result` and
  replaces it with a new `ExperimentResult` containing fresh evaluations
- Add `missing_task_strategy` parameter ('raise'/'skip'/'retry') to control
  behavior when prior rows have errors
- Make `task` and `dataset` optional in `LLMObs.experiment()`,
  `Experiment.__init__`, and `SyncExperiment.__init__` to support the
  pull() + rerun_evaluators() workflow where no task or dataset is needed
- Guard `Experiment.run()` to raise ValueError when called without task/dataset
- Add tests covering: rerun skips task, error strategies, result preservation,
  init without task/dataset, run() guard, and rerun without task/dataset

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
@mehulsonowal mehulsonowal requested review from a team as code owners April 2, 2026 20:42
@mehulsonowal mehulsonowal requested review from brettlangdon and rachelyangdog and removed request for a team April 2, 2026 20:42
@cit-pr-commenter-54b7da

Codeowners resolved as

releasenotes/notes/llmobs-experiment-rerun-evaluators-b3c4d5e6f7a8b9c0.yaml  @DataDog/apm-python
