If different sensitive attribute values use different thresholds, the equalized odds intervention won't be synchronized across those values.
Therefore, use a modified version of sklearn's roc_curve that takes one set of global thresholds and generates (fpr, tpr) for each sensitive attribute value. The original sklearn implementation is at:
https://github.com/scikit-learn/scikit-learn/blob/7b136e92acf49d46251479b75c88cba632de1937/sklearn/metrics/ranking.py#L535
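A minimal sketch of the idea, assuming binary labels, real-valued scores, and that every group contains both positive and negative examples. The helper name `roc_curve_global_thresholds` is hypothetical and is not part of sklearn. It builds one descending threshold grid from all scores (plus +inf, so each curve starts at (0, 0)), then evaluates every group on that same grid, using the `score >= threshold` rule that sklearn's roc_curve also uses. Index i therefore refers to the same threshold in every group's (fpr, tpr) arrays, which is what keeps the equalized odds adjustment synchronized.

```python
import numpy as np


def roc_curve_global_thresholds(y_true, y_score, groups):
    """Hypothetical helper: per-group (fpr, tpr) on one shared threshold grid.

    Returns (thresholds, curves) where curves maps group -> (fpr, tpr),
    and fpr/tpr are aligned with `thresholds` element by element.
    """
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    groups = np.asarray(groups)

    # Global thresholds: all distinct scores across every group, descending,
    # preceded by +inf so that no sample is predicted positive at index 0.
    thresholds = np.r_[np.inf, np.unique(y_score)[::-1]]

    curves = {}
    for g in np.unique(groups):
        mask = groups == g
        yt = y_true[mask] == 1
        ys = y_score[mask]
        n_pos = int(yt.sum())
        n_neg = int((~yt).sum())
        if n_pos == 0 or n_neg == 0:
            # Assumption: rates are undefined without both classes.
            raise ValueError(f"group {g!r} needs both positive and negative labels")

        # pred[i, j] is True when sample j is predicted positive at thresholds[i].
        pred = ys[None, :] >= thresholds[:, None]
        tp = (pred & yt).sum(axis=1)
        fp = (pred & ~yt).sum(axis=1)
        curves[g] = (fp / n_neg, tp / n_pos)
    return thresholds, curves
```

Unlike calling roc_curve once per group, this version does not drop intermediate thresholds, so all groups yield arrays of identical length that can be compared point by point.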