
Commit 530cd6e

feat: Add RaySource, to_ray_dataset first-class method, docs, and tests
Introduces RaySource, a pure-metadata data source descriptor that lets Feast load any Ray Data-readable format (Parquet, CSV, JSON, HuggingFace datasets, MongoDB, binary files, images, TFRecords, WebDataset, SQL, and text) without requiring an intermediate Parquet file. Makes to_ray_dataset() a first-class method on RetrievalJob and FeatureStore. Wires up distributed materialization in the Ray compute engine. Adds reference documentation and unit tests. Signed-off-by: ntkathole <[email protected]>
1 parent 608b105 commit 530cd6e

15 files changed

Lines changed: 1472 additions & 47 deletions


README.md

Lines changed: 1 addition & 0 deletions
@@ -185,6 +185,7 @@ The list below contains the functionality that contributors are planning to develop
 * [x] [Athena (contrib plugin)](https://docs.feast.dev/reference/data-sources/athena)
 * [x] [Clickhouse (contrib plugin)](https://docs.feast.dev/reference/data-sources/clickhouse)
 * [x] [Oracle (contrib plugin)](https://docs.feast.dev/reference/data-sources/oracle)
+* [x] [Ray source (contrib plugin)](https://docs.feast.dev/reference/data-sources/ray)
 * [x] Kafka / Kinesis sources (via [push support into the online store](https://docs.feast.dev/reference/data-sources/push))
 * **Offline Stores**
 * [x] [Snowflake](https://docs.feast.dev/reference/offline-stores/snowflake)

docs/SUMMARY.md

Lines changed: 1 addition & 0 deletions
@@ -104,6 +104,7 @@
 * [Oracle (contrib)](reference/data-sources/oracle.md)
 * [Athena (contrib)](reference/data-sources/athena.md)
 * [Clickhouse (contrib)](reference/data-sources/clickhouse.md)
+* [Ray (contrib)](reference/data-sources/ray.md)
 * [Offline stores](reference/offline-stores/README.md)
 * [Overview](reference/offline-stores/overview.md)
 * [Dask](reference/offline-stores/dask.md)

docs/getting-started/concepts/feature-retrieval.md

Lines changed: 53 additions & 0 deletions
@@ -297,6 +297,59 @@ training_df = store.get_historical_features(
 ).to_df()
 ```
 
+### Step 3: Choosing an output format
+
+`get_historical_features()` returns a `RetrievalJob` object. You can convert it
+to the format that suits your downstream pipeline:
+
+**Data conversion methods**
+
+| Method | Returns | When to use |
+|---|---|---|
+| `.to_df()` | `pandas.DataFrame` | General-purpose; scikit-learn, XGBoost, statsmodels |
+| `.to_feast_df()` | `FeastDataFrame` | Feast-native wrapper with engine metadata; preferred for Feast-internal tooling |
+| `.to_arrow()` | `pyarrow.Table` | Arrow-native pipelines, Polars, DuckDB, zero-copy interchange |
+| `.to_tensor(kind="torch")` | `Dict[str, torch.Tensor]` | Direct PyTorch training loops; numeric columns become tensors |
+| `.to_ray_dataset()` | `ray.data.Dataset` | Ray Train, Ray Serve, distributed ML workloads |
+
+**Persistence methods**
+
+| Method | Effect | When to use |
+|---|---|---|
+| `.persist(storage)` | Writes result to offline storage | Save a training dataset for later reuse or auditing |
+| `.to_remote_storage()` | Exports result to S3/GCS as Parquet files | Hand off to external systems or data pipelines |
+
+#### Retrieving as a Ray Dataset
+
+`to_ray_dataset()` is a **first-class method** on every `RetrievalJob`. When
+the underlying offline store is a `RayOfflineStore`, the dataset is returned
+directly without a copy through Arrow. For all other offline stores, a
+zero-copy Arrow → Ray Dataset conversion is used as a fallback.
+
+```python
+from feast import FeatureStore
+
+store = FeatureStore(".")
+
+# to_ray_dataset() is a first-class method on the RetrievalJob — chain it
+# directly after get_historical_features().
+ray_ds = store.get_historical_features(
+    entity_df=entity_df,
+    features=["driver_hourly_stats:conv_rate", "driver_hourly_stats:acc_rate"],
+).to_ray_dataset()
+
+# Use with Ray Train
+from ray.train.torch import TorchTrainer
+trainer = TorchTrainer(
+    train_loop_per_worker=...,
+    datasets={"train": ray_ds},
+)
+```
+
+> **Note:** `to_ray_dataset()` requires `feast[ray]` to be installed.
+
+---
+
 ## Retrieving online features (for model inference)
 Feast will ensure the latest feature values for registered features are available. At retrieval time, you need to supply a list of **entities** and the corresponding **features** to be retrieved. Similar to `get_historical_features`, we recommend using feature services as a mechanism for grouping features in a model version.
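The dispatch this diff documents (Ray-native jobs hand back their dataset directly; every other job falls back to converting its Arrow table) can be sketched in pure Python. The class and attribute names below are invented for illustration and are not Feast's actual internals:

```python
# Illustrative sketch of the to_ray_dataset() dispatch described above.
# RayRetrievalJob / GenericRetrievalJob are hypothetical names.
class RayRetrievalJob:
    """A job backed by RayOfflineStore: it already holds a Ray Dataset."""

    def __init__(self, ray_dataset):
        self._ray_dataset = ray_dataset

    def to_ray_dataset(self):
        # Returned directly: no Arrow round-trip for Ray-native jobs.
        return self._ray_dataset


class GenericRetrievalJob:
    """Any other offline store's job: falls back to Arrow conversion."""

    def __init__(self, arrow_table):
        self._arrow_table = arrow_table

    def to_ray_dataset(self):
        import ray.data  # deferred: only needed when feast[ray] is installed

        # ray.data.from_arrow converts a pyarrow.Table, zero-copy where possible
        return ray.data.from_arrow(self._arrow_table)
```

Either way, callers see one uniform `to_ray_dataset()` method and never need to know which store produced the job.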

docs/reference/data-sources/README.md

Lines changed: 4 additions & 0 deletions
@@ -65,3 +65,7 @@ Please see [Data Source](../../getting-started/concepts/data-ingestion.md) for a
 {% content-ref url="oracle.md" %}
 [oracle.md](oracle.md)
 {% endcontent-ref %}
+
+{% content-ref url="ray.md" %}
+[ray.md](ray.md)
+{% endcontent-ref %}

docs/reference/data-sources/ray.md

Lines changed: 233 additions & 0 deletions
@@ -0,0 +1,233 @@
+# Ray Data Source (contrib)
+
+> **⚠️ Contrib Plugin:**
+> `RaySource` is a contributed plugin shipped alongside the [Ray offline store](../offline-stores/ray.md). It may not be as stable or fully supported as core data sources.
+
+`RaySource` is a pure-metadata descriptor that tells Feast **how** to load a
+[Ray Dataset](https://docs.ray.io/en/latest/data/api/dataset.html) from any
+source that Ray Data supports natively — Parquet, CSV, JSON, HuggingFace
+Datasets, MongoDB, binary files, images, TFRecords, and more.
+
+It is the recommended data source when using the
+[Ray offline store](../offline-stores/ray.md); it removes the need to stage
+non-Parquet or non-file-based data as intermediate Parquet files for `FileSource`.
+
+---
+
+## When to use RaySource vs FileSource
+
+| Scenario | Recommended source |
+|---|---|
+| Parquet files on disk / S3 / GCS (existing setup) | `FileSource` (backward compatible) |
+| Parquet via Ray reader (pipelines, remote auth) | `RaySource(reader_type="parquet")` |
+| CSV, JSON, text, images via Ray | `RaySource` |
+| HuggingFace `datasets` library | `RaySource(reader_type="huggingface")` |
+| MongoDB, SQL, TFRecords, WebDataset | `RaySource` |
+
+---
+
+## Installation
+
+`RaySource` is bundled with the Ray offline store contrib package:
+
+```bash
+pip install 'feast[ray]'
+```
+
+---
+
+## Supported `reader_type` values
+
+| `reader_type` | Underlying Ray API | Notes |
+|---|---|---|
+| `parquet` | `ray.data.read_parquet` | S3, GCS, HDFS, local |
+| `csv` | `ray.data.read_csv` | |
+| `json` | `ray.data.read_json` | |
+| `text` | `ray.data.read_text` | |
+| `images` | `ray.data.read_images` | |
+| `binary_files` | `ray.data.read_binary_files` | |
+| `tfrecords` | `ray.data.read_tfrecords` | |
+| `webdataset` | `ray.data.read_webdataset` | |
+| `huggingface` | `ray.data.from_huggingface` | Wraps `datasets.load_dataset` |
+| `mongo` | `ray.data.read_mongo` | |
+| `sql` | `ray.data.read_sql` | Pass `connection_url` in `reader_options` |
+
+---
+
+## Configuration
+
+### Parameters
+
+| Parameter | Type | Required | Description |
+|---|---|---|---|
+| `name` | `str` | Yes | Unique name for this data source |
+| `reader_type` | `str` | Yes | One of the supported reader types above |
+| `path` | `str` | No | File or directory path (required for file-based readers) |
+| `reader_options` | `dict` | No | Extra keyword arguments forwarded to the Ray reader |
+| `timestamp_field` | `str` | No | Column containing event timestamps |
+| `created_timestamp_column` | `str` | No | Column containing row creation timestamps |
+| `tags` | `dict` | No | Arbitrary key-value metadata |
+| `description` | `str` | No | Human-readable description |
+| `owner` | `str` | No | Owning team or contact |
+
+---
+
+## Usage examples
+
+### Parquet on S3
+
+```python
+from feast.infra.offline_stores.contrib.ray_offline_store.ray_source import RaySource
+
+driver_stats = RaySource(
+    name="driver_stats_parquet",
+    reader_type="parquet",
+    path="s3://my-bucket/driver_stats/",
+    timestamp_field="event_timestamp",
+)
+```
+
+### CSV
+
+```python
+sensor_readings = RaySource(
+    name="sensor_readings_csv",
+    reader_type="csv",
+    path="/data/sensors/",
+    timestamp_field="ts",
+)
+```
+
+### HuggingFace dataset
+
+Load a dataset from the [HuggingFace Hub](https://huggingface.co/datasets)
+directly into Feast.
+
+```python
+from feast.infra.offline_stores.contrib.ray_offline_store.ray_source import RaySource
+
+cheque_images = RaySource(
+    name="cheque_images_hf",
+    reader_type="huggingface",
+    reader_options={
+        "dataset_name": "cheques_sample_data",
+        "split": "train",
+    },
+    timestamp_field="event_timestamp",
+)
+```
+
+### MongoDB
+
+```python
+transaction_log = RaySource(
+    name="transactions_mongo",
+    reader_type="mongo",
+    reader_options={
+        "uri": "mongodb://localhost:27017",
+        "database": "featuredb",
+        "collection": "transactions",
+    },
+    timestamp_field="created_at",
+)
+```
+
+### SQL (via connection URL)
+
+```python
+user_features = RaySource(
+    name="user_features_sql",
+    reader_type="sql",
+    reader_options={
+        "connection_url": "postgresql+psycopg2://user:password@host:5432/db",  # pragma: allowlist secret
+        "query": "SELECT * FROM user_features",
+    },
+    timestamp_field="event_timestamp",
+)
+```
+
+---
+
+## Using RaySource in a BatchFeatureView
+
+```python
+from datetime import timedelta
+
+from feast import BatchFeatureView, Entity, Field
+from feast.types import Float32, Int64, String
+from feast.infra.offline_stores.contrib.ray_offline_store.ray_source import RaySource
+
+cheque = Entity(name="cheque_id", description="Unique cheque identifier")
+
+cheque_source = RaySource(
+    name="cheque_images_hf",
+    reader_type="huggingface",
+    reader_options={
+        "dataset_name": "cheques_sample_data",
+        "split": "train",
+    },
+    timestamp_field="event_timestamp",
+)
+
+cheque_ocr_fv = BatchFeatureView(
+    name="cheque_ocr_features",
+    entities=[cheque],
+    ttl=timedelta(days=365),
+    schema=[
+        Field(name="cheque_id", dtype=Int64),
+        Field(name="payee_name", dtype=String),
+        Field(name="amount", dtype=String),
+        Field(name="bank_name", dtype=String),
+        Field(name="raw_text", dtype=String),
+    ],
+    source=cheque_source,
+)
+```
+
+---
+
+## Retrieving data as a Ray Dataset
+
+Once the feature view is materialised, you can retrieve the offline features
+directly as a Ray Dataset using the first-class `to_ray_dataset()` method:
+
+```python
+from feast import FeatureStore
+
+store = FeatureStore(".")
+
+# Chain directly on the retrieval job — to_ray_dataset() is a first-class
+# method on every RetrievalJob.
+ds = store.get_historical_features(
+    features=["cheque_ocr_features:payee_name", "cheque_ocr_features:amount"],
+    entity_df=entity_df,
+).to_ray_dataset()
+
+# Use the dataset downstream in Ray or ML pipelines
+ds.show(3)
+```
+
+---
+
+## Proto serialisation
+
+`RaySource` is fully serialisable to Feast's protobuf registry format. The
+`reader_type`, `path`, and `reader_options` dict are all persisted and can be
+round-tripped via `to_proto()` / `from_proto()`.
+
+---
+
+## Limitations
+
+* The Ray offline store (and therefore `RaySource`) requires `feast[ray]`.
+* `reader_type="sql"` requires a serialisable `connection_url`; raw
+  `sqlalchemy.engine.Engine` objects cannot be pickled across Ray workers.
+* Streaming sources (Kafka, Kinesis) are not supported via `RaySource`; use
+  the dedicated [Kafka](kafka.md) or [Kinesis](kinesis.md) data sources.
+
+---
+
+## Related pages
+
+* [Ray Offline Store](../offline-stores/ray.md)
+* [Ray Compute Engine](../compute-engine/ray.md)
+* [Feature Retrieval](../../getting-started/concepts/feature-retrieval.md)
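The parameter rules in the new `ray.md` page above (`path` required for file-based readers, `connection_url` required for SQL) can be condensed into a small validation sketch. `validate_ray_source` and `FILE_BASED_READERS` are hypothetical names for illustration only, not Feast API:

```python
# Hypothetical helper mirroring the RaySource parameter rules documented
# above; not Feast's actual validation code.
FILE_BASED_READERS = {
    "parquet", "csv", "json", "text",
    "images", "binary_files", "tfrecords", "webdataset",
}


def validate_ray_source(reader_type, path=None, reader_options=None):
    opts = reader_options or {}
    if reader_type in FILE_BASED_READERS and not path:
        # File-based readers need somewhere to read from
        raise ValueError(f"reader_type={reader_type!r} requires a path")
    if reader_type == "sql" and "connection_url" not in opts:
        # Engine objects cannot be pickled across Ray workers; a URL string can
        raise ValueError("reader_type='sql' requires connection_url in reader_options")
```

Connection-based readers such as `mongo` pass validation without a `path`, since everything they need travels in `reader_options`.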

docs/reference/offline-stores/ray.md

Lines changed: 36 additions & 5 deletions
@@ -557,14 +557,45 @@ except Exception as e:
     print(f"Data source validation failed: {e}")
 ```
 
+## Data Sources
+
+[`RaySource`](../data-sources/ray.md) is the recommended data source for the
+Ray offline store. It is a pure-metadata descriptor that tells Feast how to
+load a Ray Dataset from any source Ray Data supports — Parquet, CSV, JSON,
+HuggingFace datasets, MongoDB, binary files, images, TFRecords, WebDataset,
+SQL, and more.
+
+```python
+from feast.infra.offline_stores.contrib.ray_offline_store.ray_source import RaySource
+
+# Load directly from the HuggingFace Hub
+cheque_source = RaySource(
+    name="cheque_images_hf",
+    reader_type="huggingface",
+    reader_options={
+        "dataset_name": "cheques_sample_data",
+        "split": "train",
+    },
+    timestamp_field="event_timestamp",
+)
+```
+
+See the [RaySource reference](../data-sources/ray.md) for a full list of
+`reader_type` values and configuration options.
+
+> **Note:** `FileSource` (Parquet) remains supported for backward compatibility
+> but `RaySource(reader_type="parquet")` is preferred for new projects.
+
 ## Limitations
 
-The Ray offline store has the following limitations:
+The Ray offline store has one known limitation:
 
-1. **File Sources Only**: Currently supports only `FileSource` data sources
-2. **No Direct SQL**: Does not support SQL query interfaces
-3. **No Online Writes**: Cannot write directly to online stores
-4. **No Complex Transformations**: The Ray offline store focuses on data I/O operations. For complex feature transformations (aggregations, joins, custom UDFs), use the [Ray Compute Engine](../compute-engine/ray.md) instead
+* **`online_write_batch` not implemented**: The `OfflineStore.online_write_batch()` interface
+  is not supported by the Ray offline store. This does **not** affect materialization —
+  `feast materialize` writes to the online store correctly via the
+  [Ray Compute Engine](../compute-engine/ray.md). The restriction only applies to callers
+  that invoke `online_write_batch` on the offline store object directly, which is an
+  uncommon pattern outside of custom tooling.
 
 ## Integration with Ray Compute Engine
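The `reader_type="sql"` limitation noted in the RaySource docs earlier (a picklable `connection_url` string rather than a live `Engine`) implies a connection-factory pattern like the sketch below. The helper name is invented, and `sqlite3` stands in for a real database driver:

```python
# Illustrative connection-factory pattern for reader_type="sql": ship a
# plain string to each Ray worker and open the connection there, instead
# of trying to pickle an Engine or connection object. sqlite3 is used
# purely for demonstration; any DB-API driver works the same way.
import sqlite3


def make_connection_factory(database_path):
    # The closure captures only a string, which serialises cleanly.
    def factory():
        return sqlite3.connect(database_path)

    return factory


# With Ray installed, a factory like this plugs into ray.data.read_sql, e.g.:
#   ray.data.read_sql("SELECT * FROM user_features",
#                     make_connection_factory("/data/features.db"))
```

Building the connection inside the worker also avoids sharing one connection across processes, which most drivers forbid.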

docs/roadmap.md

Lines changed: 1 addition & 0 deletions
@@ -20,6 +20,7 @@ The list below contains the functionality that contributors are planning to develop
 * [x] [Athena (contrib plugin)](https://docs.feast.dev/reference/data-sources/athena)
 * [x] [Clickhouse (contrib plugin)](https://docs.feast.dev/reference/data-sources/clickhouse)
 * [x] [Oracle (contrib plugin)](https://docs.feast.dev/reference/data-sources/oracle)
+* [x] [Ray source (contrib plugin)](https://docs.feast.dev/reference/data-sources/ray)
 * [x] Kafka / Kinesis sources (via [push support into the online store](https://docs.feast.dev/reference/data-sources/push))
 * **Offline Stores**
 * [x] [Snowflake](https://docs.feast.dev/reference/offline-stores/snowflake)
