Since the evaluation-of-OPE requires knowledge of the on-policy policy values, is OPS only relevant for synthetic data where the underlying behavior policy value is known? Or is it possible to estimate the on-policy policy value from real-world data, as well?
When I run the following code block from the basic_synthetic_continious_advanced.ipynb notebook on my real-world dataset:
ops = OffPolicySelection(
    ope=ope,
    cumulative_distribution_ope=cd_ope,
)
ops.obtain_true_selection_result(
    input_dict=input_dict,
    return_variance=True,
    return_lower_quartile=True,
    return_conditional_value_at_risk=True,
    return_by_dataframe=True,
)
I get the following error: ValueError: one of the candidate policies, cql, does not contain on-policy policy value in input_dict.
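For context on what the missing value is: the "on-policy policy value" of a candidate policy is simply its expected discounted return, which can only be computed directly when you can roll the policy out (e.g. in a simulator). The following is a minimal, generic sketch of that Monte Carlo estimate; it is not SCOPE-RL's API, and `env_step` and `policy` are hypothetical stand-ins for a simulator transition function and a candidate policy.

```python
# Hypothetical sketch (not SCOPE-RL's API): the on-policy policy value is
# the expected discounted return of the candidate policy, estimated here
# by Monte Carlo rollouts in a simulator.

def rollout_return(env_step, policy, init_state, gamma=0.99, horizon=100):
    """Discounted return of a single on-policy rollout.

    env_step(state, action) -> (next_state, reward, done) is a stand-in
    for a simulator; policy(state) -> action is the candidate policy.
    """
    state, ret, discount = init_state, 0.0, 1.0
    for _ in range(horizon):
        action = policy(state)
        state, reward, done = env_step(state, action)
        ret += discount * reward
        discount *= gamma
        if done:
            break
    return ret


def on_policy_value(env_step, policy, init_state, n_rollouts=1000, **kw):
    """Monte Carlo estimate of the on-policy policy value."""
    return sum(
        rollout_return(env_step, policy, init_state, **kw)
        for _ in range(n_rollouts)
    ) / n_rollouts
```

With real-world data there is no simulator to roll the candidate policy out in, which is exactly why this quantity cannot be filled in for `input_dict` there.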
Edit: After posting this issue, it occurred to me that "to estimate the on-policy policy value from real-world data" would just be equivalent to doing OPE, so evaluation-of-OPE would not be possible in that case. Please correct me if I am misunderstanding anything.