
Revamp Data Quality Monitoring #5919

@franciscojavierarceo

Description


Is your feature request related to a problem? Please describe.

Feast defines and serves features via Feature Views, but it doesn’t provide a first-class way to monitor feature data quality and feature drift over time — especially for features produced via On Demand Transformations (ODT).

Today, users often need to compare the “training distribution” (from a one-off static training dataset) with the distribution of feature values observed in production over time. This typically requires:

  • custom online feature logging pipelines,
  • bespoke Spark/SQL jobs to compute metrics and drift,
  • and external dashboards/alerting systems.

This becomes even more complex when ODT is involved, since transformed feature values are not easily reproducible unless transformation provenance and feature versions are tracked.

Describe the solution you'd like

Introduce a new first-class concept in Feast: DQMJob (Data Quality Monitoring Job), which computes feature quality and drift metrics for one or more Feature Views over time and reports results in the Feast UI.

Proposed API:

```python
DQMJob(
    feature_view_names: List[str],
    data_source=None,  # should be the sink of the logged feature view values + metadata
    features_to_exclude: Optional[List[str]] = None,
    features_to_include: Optional[List[str]] = None,  # mutually exclusive with features_to_exclude
    time_interval: Literal["day", "week", "month"] = "day",
    feature_drift_metrics_config=None,  # schema-based defaults + per-feature overrides
    baseline_config=None,  # static training dataset distribution reference
    logging_config=None,   # simple: 100% or sampled logging
)
```
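To make the include/exclude constraint concrete, here is a minimal runnable stand-in for the proposed class. Everything below is hypothetical and mirrors the sketch above; none of it exists in Feast today.

```python
from dataclasses import dataclass
from typing import List, Literal, Optional

# Minimal stand-in for the proposed DQMJob (hypothetical; not in Feast today).
# A real implementation would compile into OfflineStore execution plans.
@dataclass
class DQMJob:
    feature_view_names: List[str]
    data_source: Optional[object] = None
    features_to_exclude: Optional[List[str]] = None
    features_to_include: Optional[List[str]] = None
    time_interval: Literal["day", "week", "month"] = "day"
    feature_drift_metrics_config: Optional[dict] = None
    baseline_config: Optional[dict] = None
    logging_config: Optional[dict] = None

    def __post_init__(self):
        # "One or the other": the include and exclude lists are mutually exclusive.
        if self.features_to_include and self.features_to_exclude:
            raise ValueError(
                "features_to_include and features_to_exclude are mutually exclusive"
            )

job = DQMJob(
    feature_view_names=["driver_hourly_stats"],
    features_to_include=["conv_rate", "acc_rate"],
    time_interval="day",
)
```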

Key components:

  1. Simple feature logging to an offline sink
  • During online feature retrieval, Feast can optionally emit feature logs to a configured offline sink (e.g., S3/GCS/ADLS or a warehouse table).

  • Logging supports either:

    • 100% of requests, or
    • sampled (e.g., p=0.01).
  • The sink is append-only and partitioned by time (e.g., dt=YYYY-MM-DD).

  • This offline dataset represents the “live” production feature values used by DQMJob.
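The per-request sampling decision and the time-partitioned sink layout described above could be sketched as follows; the sample rate and path scheme are illustrative, not Feast's actual implementation:

```python
import random
from datetime import datetime, timezone

# Per-request decision: log everything (sample_rate >= 1.0) or a Bernoulli
# sample with probability sample_rate (e.g., p = 0.01).
def should_log(sample_rate: float = 0.01) -> bool:
    return sample_rate >= 1.0 or random.random() < sample_rate

# Append-only sink partitioned by day, e.g. .../dt=2024-01-31/.
def partition_path(base_uri: str, ts: datetime) -> str:
    return f"{base_uri}/dt={ts.strftime('%Y-%m-%d')}/"

p = partition_path(
    "s3://feature-logs/driver_hourly_stats",
    datetime(2024, 1, 31, tzinfo=timezone.utc),
)
# p == "s3://feature-logs/driver_hourly_stats/dt=2024-01-31/"
```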

  2. ODT reproducibility via registry versioning
  • For On Demand Transformations, feature logs include:

    • input feature references + versions,
    • output feature references + versions,
    • transformation identifier (content hash),
    • registry revision (or equivalent version id).
  • Transformation inputs/outputs and transformation code/artifact references are versioned in the Feast registry.

  • A content-addressed hash defines transformation identity and derived feature versions, enabling deterministic reproduction.
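One way to realize the content-addressed transformation identity, sketched under the assumption that the hash covers the transformation source plus sorted input/output feature references (the proposal does not pin down the exact fields):

```python
import hashlib

# Hypothetical content-addressed version id for an ODT: hash the
# transformation source together with its input/output feature references,
# so identical code + schema always reproduce the same id.
def transformation_hash(transformation_source: str,
                        input_refs: list[str],
                        output_refs: list[str]) -> str:
    payload = "\n".join([transformation_source, *sorted(input_refs), *sorted(output_refs)])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]

h = transformation_hash(
    "def conv_plus_acc(conv_rate, acc_rate): return conv_rate + acc_rate",
    ["driver_hourly_stats:conv_rate", "driver_hourly_stats:acc_rate"],
    ["odt:conv_plus_acc"],
)
```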

  3. Baseline comparison against static training dataset
  • Baseline is the distribution from a one-off training dataset run (static).
  • Each time bucket (e.g., daily) is compared against this fixed baseline using drift metrics appropriate to feature type.
  • Baseline is referenced deterministically (e.g., baseline_dataset_uri or a registry-managed baseline_snapshot_id).
  4. Schema-aware metrics
  • Quality metrics (examples):

    • missing rate
    • min/max/range
    • mean/median
    • unique count / cardinality
  • Drift metrics selected by schema:

    • numeric: PSI, KS-statistic, etc.
    • categorical: PSI, JS-divergence, entropy shift, etc.
  • Defaults inferred from feature schema with per-feature overrides.
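As an illustration of one of the numeric drift metrics named above, a minimal PSI computation of a live time bucket against the fixed baseline; the bin count and smoothing epsilon are arbitrary choices here, not part of the proposal:

```python
import math

# PSI (Population Stability Index) sketch for a numeric feature: bin the
# baseline, apply the same bin edges to the live bucket, and sum
# (live% - base%) * ln(live% / base%). PSI is 0 for identical
# distributions and grows as the live bucket drifts from the baseline.
def psi(baseline: list[float], live: list[float],
        n_bins: int = 10, eps: float = 1e-4) -> float:
    lo, hi = min(baseline), max(baseline)
    edges = [lo + (hi - lo) * i / n_bins for i in range(1, n_bins)]

    def fractions(values: list[float]) -> list[float]:
        counts = [0] * n_bins
        for v in values:
            counts[sum(v > e for e in edges)] += 1  # bin index for v
        # clamp to eps so empty bins don't blow up the log term
        return [max(c / len(values), eps) for c in counts]

    base, liv = fractions(baseline), fractions(live)
    return sum((lf - bf) * math.log(lf / bf) for bf, lf in zip(base, liv))
```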

  5. Execution model
  • DQMJob compiles into OfflineStore-specific execution plans (Spark DataFrame operations, warehouse SQL, etc.).
  • Jobs are intended to be run as batch jobs via orchestrators (Airflow, KFP, CLI, scripts).
  • OfflineStore implementations expose extensions for computing metrics efficiently.
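As a sketch of the compilation step, here is how one quality metric (missing rate) might be lowered to warehouse SQL grouped by daily time bucket; the table/column names are placeholders, and a real plan would come from an OfflineStore extension:

```python
# Illustrative DQMJob compilation of a single quality metric into warehouse
# SQL; one row per daily time bucket. Names here are hypothetical.
def compile_missing_rate_sql(log_table: str, feature: str,
                             ts_col: str = "event_timestamp") -> str:
    return (
        f"SELECT DATE_TRUNC('day', {ts_col}) AS time_bucket, "
        f"AVG(CASE WHEN {feature} IS NULL THEN 1.0 ELSE 0.0 END) AS missing_rate "
        f"FROM {log_table} GROUP BY 1 ORDER BY 1"
    )

sql = compile_missing_rate_sql("feature_logs.driver_hourly_stats", "conv_rate")
```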
  6. Registry-managed metrics storage + Feast UI
  • Computed metrics are written to a registry-managed metrics table (or registry-owned storage location).

  • Each metric record includes:

    • feature_view_name
    • feature_name
    • feature_version / registry_revision
    • metric_name
    • metric_value
    • time_bucket
    • baseline_reference
  • Feast UI queries this table to display:

    • metric time series
    • drift summaries
    • per-feature quality views
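The metric record listed above could be modeled as a simple row type; the field types and example values are assumptions for illustration:

```python
from dataclasses import asdict, dataclass
from datetime import date

# One row of the registry-managed metrics table, mirroring the record
# fields listed above. Types are assumed, not specified by the proposal.
@dataclass(frozen=True)
class MetricRecord:
    feature_view_name: str
    feature_name: str
    feature_version: str  # or registry_revision
    metric_name: str
    metric_value: float
    time_bucket: date
    baseline_reference: str

rec = MetricRecord(
    feature_view_name="driver_hourly_stats",
    feature_name="conv_rate",
    feature_version="a1b2c3d4e5f6",
    metric_name="psi",
    metric_value=0.07,
    time_bucket=date(2024, 1, 31),
    baseline_reference="s3://bucket/training/baseline.parquet",
)
```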

Describe alternatives you've considered

  • Integrating Feast with third-party observability tools (Evidently, WhyLabs, Arize, etc.) using custom logging pipelines.
  • Building custom Spark/SQL jobs on offline training data and online logs.
  • Monitoring only offline data (which misses what is actually served online, especially with ODT).

While workable, these approaches require significant glue code and lack consistent offline/online symmetry within Feast.

Additional context

  • Logging should be configurable for privacy and cost (sampling vs 100%).
  • Feature logs should be stored in batch-friendly formats (e.g., Parquet/Delta) and partitioned by time.
  • The design should preserve offline/online symmetry and leverage Feast registry versioning for reproducibility.
  • The system should remain backend-agnostic while using OfflineStore extensions for computation.
