
Revamp Data Quality Monitoring #5919

@franciscojavierarceo

Description


Is your feature request related to a problem? Please describe.

Feast defines and serves features via Feature Views, but it doesn’t provide a first-class way to monitor feature data quality and feature drift over time — especially for features produced via On Demand Transformations (ODT).

Today, users often need to compare the “training distribution” (from a one-off static training dataset) with the distribution of feature values observed in production over time. This typically requires:

  • custom online feature logging pipelines,
  • bespoke Spark/SQL jobs to compute metrics and drift,
  • and external dashboards/alerting systems.

This becomes even more complex when ODT is involved, since transformed feature values are not easily reproducible unless transformation provenance and feature versions are tracked.

Describe the solution you'd like

Introduce a new first-class concept in Feast: DQMJob (Data Quality Monitoring Job), which computes feature quality and drift metrics for one or more Feature Views over time and reports results in the Feast UI.

Proposed API:

```python
DQMJob(
    feature_view_names: List[str],
    data_source=None,  # should be the sink of the logged feature view values + metadata
    features_to_exclude: Optional[List[str]] = None,
    features_to_include: Optional[List[str]] = None,  # mutually exclusive with features_to_exclude
    time_interval: Literal["day", "week", "month"] = "day",
    feature_drift_metrics_config=None,  # schema-based defaults + per-feature overrides
    baseline_config=None,  # static training dataset distribution reference
    logging_config=None,   # simple: 100% or sampled logging
)
```
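To make the include/exclude constraint concrete, here is a minimal runnable stand-in for the proposed class. Everything below is hypothetical and mirrors the sketch above; none of it exists in Feast today.

```python
from dataclasses import dataclass
from typing import List, Literal, Optional

# Minimal stand-in for the proposed DQMJob (hypothetical; not in Feast today).
# A real implementation would compile into OfflineStore execution plans.
@dataclass
class DQMJob:
    feature_view_names: List[str]
    data_source: Optional[object] = None
    features_to_exclude: Optional[List[str]] = None
    features_to_include: Optional[List[str]] = None
    time_interval: Literal["day", "week", "month"] = "day"
    feature_drift_metrics_config: Optional[dict] = None
    baseline_config: Optional[dict] = None
    logging_config: Optional[dict] = None

    def __post_init__(self):
        # "One or the other": the include and exclude lists are mutually exclusive.
        if self.features_to_include and self.features_to_exclude:
            raise ValueError(
                "features_to_include and features_to_exclude are mutually exclusive"
            )

job = DQMJob(
    feature_view_names=["driver_hourly_stats"],
    features_to_include=["conv_rate", "acc_rate"],
    time_interval="day",
)
```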

Key components:

  1. Simple feature logging to an offline sink
  • During online feature retrieval, Feast can optionally emit feature logs to a configured offline sink (e.g., S3/GCS/ADLS or a warehouse table).

  • Logging supports either:

    • 100% of requests, or
    • sampled (e.g., p=0.01).
  • The sink is append-only and partitioned by time (e.g., dt=YYYY-MM-DD).

  • This offline dataset represents the “live” production feature values used by DQMJob.
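The per-request sampling decision and the time-partitioned sink layout described above could be sketched as follows; the sample rate and path scheme are illustrative, not Feast's actual implementation:

```python
import random
from datetime import datetime, timezone

# Per-request decision: log everything (sample_rate >= 1.0) or a Bernoulli
# sample with probability sample_rate (e.g., p = 0.01).
def should_log(sample_rate: float = 0.01) -> bool:
    return sample_rate >= 1.0 or random.random() < sample_rate

# Append-only sink partitioned by day, e.g. .../dt=2024-01-31/.
def partition_path(base_uri: str, ts: datetime) -> str:
    return f"{base_uri}/dt={ts.strftime('%Y-%m-%d')}/"

p = partition_path(
    "s3://feature-logs/driver_hourly_stats",
    datetime(2024, 1, 31, tzinfo=timezone.utc),
)
# p == "s3://feature-logs/driver_hourly_stats/dt=2024-01-31/"
```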

  2. ODT reproducibility via registry versioning
  • For On Demand Transformations, feature logs include:

    • input feature references + versions,
    • output feature references + versions,
    • transformation identifier (content hash),
    • registry revision (or equivalent version id).
  • Transformation inputs/outputs and transformation code/artifact references are versioned in the Feast registry.

  • A content-addressed hash defines transformation identity and derived feature versions, enabling deterministic reproduction.
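One way to realize the content-addressed transformation identity, sketched under the assumption that the hash covers the transformation source plus sorted input/output feature references (the proposal does not pin down the exact fields):

```python
import hashlib

# Hypothetical content-addressed version id for an ODT: hash the
# transformation source together with its input/output feature references,
# so identical code + schema always reproduce the same id.
def transformation_hash(transformation_source: str,
                        input_refs: list[str],
                        output_refs: list[str]) -> str:
    payload = "\n".join([transformation_source, *sorted(input_refs), *sorted(output_refs)])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]

h = transformation_hash(
    "def conv_plus_acc(conv_rate, acc_rate): return conv_rate + acc_rate",
    ["driver_hourly_stats:conv_rate", "driver_hourly_stats:acc_rate"],
    ["odt:conv_plus_acc"],
)
```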

  3. Baseline comparison against static training dataset
  • Baseline is the distribution from a one-off training dataset run (static).
  • Each time bucket (e.g., daily) is compared against this fixed baseline using drift metrics appropriate to feature type.
  • Baseline is referenced deterministically (e.g., baseline_dataset_uri or a registry-managed baseline_snapshot_id).
  4. Schema-aware metrics
  • Quality metrics (examples):

    • missing rate
    • min/max/range
    • mean/median
    • unique count / cardinality
  • Drift metrics selected by schema:

    • numeric: PSI, KS-statistic, etc.
    • categorical: PSI, JS-divergence, entropy shift, etc.
  • Defaults inferred from feature schema with per-feature overrides.
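As an illustration of one of the numeric drift metrics named above, a minimal PSI computation of a live time bucket against the fixed baseline; the bin count and smoothing epsilon are arbitrary choices here, not part of the proposal:

```python
import math

# PSI (Population Stability Index) sketch for a numeric feature: bin the
# baseline, apply the same bin edges to the live bucket, and sum
# (live% - base%) * ln(live% / base%). PSI is 0 for identical
# distributions and grows as the live bucket drifts from the baseline.
def psi(baseline: list[float], live: list[float],
        n_bins: int = 10, eps: float = 1e-4) -> float:
    lo, hi = min(baseline), max(baseline)
    edges = [lo + (hi - lo) * i / n_bins for i in range(1, n_bins)]

    def fractions(values: list[float]) -> list[float]:
        counts = [0] * n_bins
        for v in values:
            counts[sum(v > e for e in edges)] += 1  # bin index for v
        # clamp to eps so empty bins don't blow up the log term
        return [max(c / len(values), eps) for c in counts]

    base, liv = fractions(baseline), fractions(live)
    return sum((lf - bf) * math.log(lf / bf) for bf, lf in zip(base, liv))
```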

  5. Execution model
  • DQMJob compiles into OfflineStore-specific execution plans (Spark DataFrame operations, warehouse SQL, etc.).
  • Jobs are intended to be run as batch jobs via orchestrators (Airflow, KFP, CLI, scripts).
  • OfflineStore implementations expose extensions for computing metrics efficiently.
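As a sketch of the compilation step, here is how one quality metric (missing rate) might be lowered to warehouse SQL grouped by daily time bucket; the table/column names are placeholders, and a real plan would come from an OfflineStore extension:

```python
# Illustrative DQMJob compilation of a single quality metric into warehouse
# SQL; one row per daily time bucket. Names here are hypothetical.
def compile_missing_rate_sql(log_table: str, feature: str,
                             ts_col: str = "event_timestamp") -> str:
    return (
        f"SELECT DATE_TRUNC('day', {ts_col}) AS time_bucket, "
        f"AVG(CASE WHEN {feature} IS NULL THEN 1.0 ELSE 0.0 END) AS missing_rate "
        f"FROM {log_table} GROUP BY 1 ORDER BY 1"
    )

sql = compile_missing_rate_sql("feature_logs.driver_hourly_stats", "conv_rate")
```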
  6. Registry-managed metrics storage + Feast UI
  • Computed metrics are written to a registry-managed metrics table (or registry-owned storage location).

  • Each metric record includes:

    • feature_view_name
    • feature_name
    • feature_version / registry_revision
    • metric_name
    • metric_value
    • time_bucket
    • baseline_reference
  • Feast UI queries this table to display:

    • metric time series
    • drift summaries
    • per-feature quality views
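The metric record listed above could be modeled as a simple row type; the field types and example values are assumptions for illustration:

```python
from dataclasses import asdict, dataclass
from datetime import date

# One row of the registry-managed metrics table, mirroring the record
# fields listed above. Types are assumed, not specified by the proposal.
@dataclass(frozen=True)
class MetricRecord:
    feature_view_name: str
    feature_name: str
    feature_version: str  # or registry_revision
    metric_name: str
    metric_value: float
    time_bucket: date
    baseline_reference: str

rec = MetricRecord(
    feature_view_name="driver_hourly_stats",
    feature_name="conv_rate",
    feature_version="a1b2c3d4e5f6",
    metric_name="psi",
    metric_value=0.07,
    time_bucket=date(2024, 1, 31),
    baseline_reference="s3://bucket/training/baseline.parquet",
)
```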

Describe alternatives you've considered

  • Integrating Feast with third-party observability tools (Evidently, WhyLabs, Arize, etc.) using custom logging pipelines.
  • Building custom Spark/SQL jobs on offline training data and online logs.
  • Monitoring only offline data (which misses what is actually served online, especially with ODT).

While workable, these approaches require significant glue code and lack consistent offline/online symmetry within Feast.

Additional context

  • Logging should be configurable for privacy and cost (sampling vs 100%).
  • Feature logs should be stored in batch-friendly formats (e.g., Parquet/Delta) and partitioned by time.
  • The design should preserve offline/online symmetry and leverage Feast registry versioning for reproducibility.
  • The system should remain backend-agnostic while using OfflineStore extensions for computation.
