PoC: New dataframe read source interface#1864

Draft
Jolanrensen wants to merge 14 commits into master from new-DataFrameReadSource

Jolanrensen (Collaborator) commented May 18, 2026

#450
@zaleslaw

WIP and proof-of-concept.

Drafting and exploring, together with Claude, what a new DataFrame.read() could be and do. It is named readSource() for now.

In its current state, you can give it any supported input and it figures out the rest (be that an ArrowReader, a URL, a String, or an Excel sheet). Extra options can be provided when needed.
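To illustrate the "give it anything" idea, here is a minimal sketch of type-based source detection. It does not show the PR's implementation: `SourceKind` and `detectSourceKind` are hypothetical names, and the String heuristics are assumptions made only for this example.

```kotlin
import java.io.File
import java.net.URL

// Hypothetical classification of inputs; not taken from the PR.
enum class SourceKind { URL, FILE, JSON_TEXT, CSV_TEXT, UNKNOWN }

// Decide how an arbitrary input could be read, based on its runtime type
// and, for Strings, on a cheap look at the content.
fun detectSourceKind(source: Any?): SourceKind = when (source) {
    is URL -> SourceKind.URL
    is File -> SourceKind.FILE
    is String -> {
        val text = source.trim()
        when {
            text.startsWith("http://") || text.startsWith("https://") -> SourceKind.URL
            text.startsWith("{") || text.startsWith("[") -> SourceKind.JSON_TEXT
            text.isNotEmpty() -> SourceKind.CSV_TEXT
            else -> SourceKind.UNKNOWN
        }
    }
    else -> SourceKind.UNKNOWN
}
```

A real dispatcher would presumably hand the input to a format-specific reader rather than return an enum; the enum only keeps the sketch self-contained.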

DataRow.readSource() also works.

It also comes with DataFrameSchema.readSource() for when you only need the types (overridden by JDBC), and possibly something like CodeString.read() for when you only need the generated interfaces (overridden by openapi-generator).

I'm also thinking about what a unified system like this could bring to the rest of DataFrame. It would be very easy, for instance, to hook it into our parsers or converters! Currently, the only format we can parse/convert is JSON Strings -> DataFrame, but this could open up any conversion to DataFrame.

I prototyped it in our convert operation, meaning you can now convert any supported type to DataRow, DataFrame, or DataFrameSchema :)

I also tried to implement it for parse, since JSON parsing was already there. This appears to be a bit trickier, though. There are a lot of edge cases where, for instance, "[a b c]" can successfully be parsed as CSV, causing all sorts of issues later on.
I did manage to get all tests passing so far, though, by making "parsing to dataframe read source" optional (false by default), enabling it only where needed, and adding some extra checks for String input of CSV and JSON.
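As a rough illustration of the opt-in flag and the stricter String checks, the sketch below rejects "[a b c]" as both JSON and CSV. All names (`looksLikeJson`, `looksLikeCsv`, `tryParseAsSource`, `parseToReadSource`) and the heuristics themselves are assumptions for this example, not the checks used in the PR.

```kotlin
// Hypothetical heuristic: the text must be wrapped like a JSON object or array
// AND its first inner token must be something JSON allows in that position.
fun looksLikeJson(text: String): Boolean {
    val t = text.trim()
    val isObject = t.startsWith("{") && t.endsWith("}")
    val isArray = t.startsWith("[") && t.endsWith("]")
    if (!isObject && !isArray) return false
    val inner = t.substring(1, t.length - 1).trim()
    return when {
        inner.isEmpty() -> true                  // "{}" or "[]"
        isObject -> inner.startsWith("\"")       // object keys are quoted
        inner.first() in "\"{[-" || inner.first().isDigit() -> true
        else -> listOf("true", "false", "null").any { inner.startsWith(it) }
    }
}

// Hypothetical heuristic: at least a header and one row, and every non-blank
// line contains the same, non-zero number of delimiters.
fun looksLikeCsv(text: String, delimiter: Char = ','): Boolean {
    val lines = text.lines().filter { it.isNotBlank() }
    if (lines.size < 2) return false
    val counts = lines.map { line -> line.count { it == delimiter } }
    return counts.first() > 0 && counts.all { it == counts.first() }
}

// Returns the detected format name, or null when the String should stay a String.
// Detection is off by default, mirroring the opt-in behavior described above.
fun tryParseAsSource(text: String, parseToReadSource: Boolean = false): String? {
    if (!parseToReadSource) return null
    return when {
        looksLikeJson(text) -> "json"
        looksLikeCsv(text) -> "csv"
        else -> null
    }
}
```

With these checks, `tryParseAsSource("[a b c]", parseToReadSource = true)` returns `null`, so the value stays a plain String, while `"[1, 2, 3]"` is detected as JSON and `"a,b\n1,2"` as CSV. The heuristics are deliberately conservative: a single-line CSV without a delimiter is also left alone.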

Jolanrensen force-pushed the new-DataFrameReadSource branch from e592bd7 to 2696eed on May 20, 2026 10:47
Jolanrensen force-pushed the new-DataFrameReadSource branch from 2696eed to aa2bd1b on May 20, 2026 10:53