
feat: Add RaySource, to_ray_dataset first-class method, docs, and tests #6343

Open
ntkathole wants to merge 1 commit into feast-dev:master from ntkathole:ray_source

Conversation

Member

ntkathole commented Apr 28, 2026

What this PR does / why we need it:

Introduces RaySource, a data source descriptor that lets Feast load any Ray Data-readable format (Parquet, CSV, JSON, HuggingFace datasets, MongoDB, binary files, images, TFRecords, WebDataset, SQL, and text) without requiring an intermediate Parquet file. Makes to_ray_dataset() a first-class method on RetrievalJob and FeatureStore. Wires up distributed materialization in the Ray compute engine. Adds reference documentation and unit tests.
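To make the "descriptor" idea concrete, here is a minimal sketch of a metadata-only source that records which Ray Data reader to call and with which arguments, deferring the actual read. It is not the PR's implementation: `RaySourceSketch`, `RAY_READERS`, `reader_type`, and `reader_args` are hypothetical names chosen for illustration, and the real `RaySource` signature may differ. The reader names on the right-hand side of the mapping are existing functions in the `ray.data` module.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

# Format name -> name of the matching function in the `ray.data` module.
# The function names exist in Ray Data; the keys are illustrative only.
RAY_READERS: Dict[str, str] = {
    "parquet": "read_parquet",
    "csv": "read_csv",
    "json": "read_json",
    "huggingface": "from_huggingface",
    "mongo": "read_mongo",
    "binary": "read_binary_files",
    "images": "read_images",
    "tfrecords": "read_tfrecords",
    "webdataset": "read_webdataset",
    "sql": "read_sql",
    "text": "read_text",
}


@dataclass(frozen=True)
class RaySourceSketch:
    """Hypothetical stand-in for a metadata-only source descriptor."""

    name: str
    reader_type: str
    reader_args: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Fail early on unknown formats; no data is touched here.
        if self.reader_type not in RAY_READERS:
            raise ValueError(f"unsupported reader_type: {self.reader_type!r}")

    def reader_name(self) -> str:
        """Return the name of the ray.data function this source maps to."""
        return RAY_READERS[self.reader_type]

    def load(self):
        """Import Ray lazily and call the matching reader with the stored args."""
        import ray.data  # only required when data is actually read

        return getattr(ray.data, self.reader_name())(**self.reader_args)


# Describing a source does not require Ray to be installed or running.
events = RaySourceSketch(
    name="events",
    reader_type="csv",
    reader_args={"paths": "data/events.csv"},  # hypothetical path
)
```

Calling `events.load()` would require Ray to be installed and would forward to `ray.data.read_csv(paths="data/events.csv")`. Keeping the descriptor free of any Ray import is one way to read "without requiring an intermediate Parquet file": the source is declared once and read directly in its native format when a job runs.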

Which issue(s) this PR fixes:

Fixes #5568



ntkathole self-assigned this Apr 28, 2026
ntkathole requested review from a team as code owners April 28, 2026 06:48
ntkathole requested review from HaoXuAI, shuchu and tokoko and removed request for a team April 28, 2026 06:48
ntkathole changed the title from "feat(ray): Add RaySource, to_ray_dataset first-class method, docs, and tests" to "feat: Add RaySource, to_ray_dataset first-class method, docs, and tests" Apr 28, 2026
devin-ai-integration[bot]

This comment was marked as resolved.

ntkathole force-pushed the ray_source branch 2 times, most recently from 530cd6e to a53d264 April 28, 2026 07:03
devin-ai-integration[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

ntkathole force-pushed the ray_source branch 3 times, most recently from f1ddc28 to 9c2079a April 28, 2026 09:57
devin-ai-integration[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

Introduces RaySource, a pure-metadata data source descriptor that lets Feast
load any Ray Data-readable format (Parquet, CSV, JSON, HuggingFace datasets,
MongoDB, binary files, images, TFRecords, WebDataset, SQL, and text) without
requiring an intermediate Parquet file. Makes to_ray_dataset() a first-class
method on RetrievalJob and FeatureStore. Wires up distributed materialization
in the Ray compute engine. Adds reference documentation and unit tests.

Signed-off-by: ntkathole <[email protected]>

Projects

None yet

Development

Successfully merging this pull request may close these issues.

to_ray_dataset as a first citizen method for retrieval_job

1 participant