Replies: 5 comments 10 replies
-
Hey, I'm pretty interested in the fine-grained plan execution (i.e., mostly in DuckDB, with the unsupported parts in Postgres). Does this mean Postgres's indexing features (e.g., GiST) can be combined with DuckDB execution? Are there any articles on its capabilities and technical details? One possibly related thing I found is MotherDuck's local & remote execution, which allows joins across remote and local datasets. Thanks!
-
We need logical replication, very, very much!
-
Does pg_mooncake have streaming aggregation and continuous aggregation functions, similar to Flink and TimescaleDB?
-
Will pg_mooncake support logically replicating and writing to the Paimon data lake in the future, followed by continuous real-time analysis? Paimon is a more efficient data lake format than Iceberg.
-
The Paimon data lake format is more performant, and it would be great if pg_mooncake could support CDC to write to Paimon in real time!
-
The main goal of v0.2.0 is to smooth out rough edges and make pg_mooncake best for specific workloads, particularly time-series workloads that are mostly append-only. Below are the main projects for v0.2.0.
Compatibility (#80)
Yes, we hear the pain. Currently, queries involving columnstore tables are executed entirely in DuckDB, which has feature gaps compared to Postgres that we will never be able to fully bridge.
To fundamentally address this, we need to allow executing part of the query within Postgres. As a starting point, Postgres needs to be able to directly read/write columnstore tables.
This is the architecture we envisioned from the start, but we prioritized executing everything in DuckDB in v0.1.0 for performance. Essentially, pg_mooncake builds a storage layer with data files stored in an object store and metadata managed in Postgres with transactional support. Both Postgres and DuckDB can read from and write to this shared storage layer. At the planner stage, we use DuckDB as a speedy execution engine and push down as much computation as possible to it. Operators that cannot be executed in DuckDB seamlessly fall back to Postgres, ensuring maximum compatibility.
For the initial version in v0.2.0, queries will be executed either entirely in DuckDB or entirely in Postgres. Fine-grained planning will be introduced in a future version.
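The coarse-grained routing described above can be sketched as follows. This is an illustrative model, not pg_mooncake's actual code: the operator names, the supported-operator set, and the `choose_engine` function are all assumptions made for the example.

```python
# Hypothetical sketch of coarse-grained query routing: in the initial
# v0.2.0 model, a query runs entirely in DuckDB only if every operator
# in its plan is supported there; otherwise the whole query falls back
# to Postgres.

# Operators we assume DuckDB can execute (illustrative, not the real list).
DUCKDB_SUPPORTED = {"seq_scan", "hash_join", "agg", "sort", "limit"}

def choose_engine(plan_operators: set[str]) -> str:
    """Return which engine runs the whole query under all-or-nothing routing."""
    if plan_operators <= DUCKDB_SUPPORTED:
        return "duckdb"    # fast path: everything pushed down
    return "postgres"      # any unsupported operator forces full fallback

print(choose_engine({"seq_scan", "agg"}))              # duckdb
print(choose_engine({"seq_scan", "gist_index_scan"}))  # postgres
```

Fine-grained planning would instead partition the plan, keeping the unsupported operators in Postgres while still pushing the rest down to DuckDB.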
Small INSERTs (#89)
Small INSERTs into columnstore tables are currently painfully inefficient: each query creates a new Parquet file, and each transaction generates a new Delta Lake log record.
To address this, a rowstore table (Postgres heap table) will be used to store small INSERTs. Small INSERTs will go into the rowstore table, while large INSERTs will be written directly to Parquet files. ColumnstoreScan will scan the union of the rowstore table and Parquet files. When a transaction ends, if the rowstore table is too large, it will be flushed. Lakehouse writes will also be decoupled from the transaction lifecycle, with new records/snapshots written only upon request.
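The write path above can be sketched roughly as follows. This is a simplified stand-in, not pg_mooncake's implementation: the class name, the thresholds, and the in-memory lists standing in for the heap table and Parquet files are all assumptions made for illustration.

```python
# Hypothetical sketch of the small-INSERT buffering scheme: small batches
# land in a rowstore buffer, large batches go straight to Parquet, scans
# read the union of both, and the buffer is flushed at commit once it
# grows past a threshold.

SMALL_INSERT_THRESHOLD = 1000  # rows; illustrative value
FLUSH_THRESHOLD = 5000         # rows; illustrative value

class ColumnstoreTable:
    def __init__(self):
        self.rowstore = []       # stand-in for the Postgres heap buffer
        self.parquet_files = []  # stand-in for Parquet files in object storage

    def insert(self, rows):
        if len(rows) < SMALL_INSERT_THRESHOLD:
            self.rowstore.extend(rows)             # small INSERT: buffer it
        else:
            self.parquet_files.append(list(rows))  # large INSERT: write Parquet

    def scan(self):
        # ColumnstoreScan reads the union of Parquet files and the rowstore.
        for f in self.parquet_files:
            yield from f
        yield from self.rowstore

    def on_commit(self):
        # At transaction end, flush the rowstore if it has grown too large.
        if len(self.rowstore) >= FLUSH_THRESHOLD:
            self.parquet_files.append(self.rowstore)
            self.rowstore = []
```

Decoupling lakehouse writes from the transaction lifecycle then means `on_commit` no longer has to emit a Delta Lake log record per transaction; new records/snapshots are written only on flush or on request.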
Logical replication (#90)
Getting pg_mooncake onto managed Postgres providers like AWS RDS, Google Cloud SQL, and Azure Database for PostgreSQL will take time.
In the meantime, enabling logical replication into columnstore tables will allow pg_mooncake to be deployed as a logical replica of the primary Postgres instance, keeping it in sync.
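The replica's job is essentially to apply decoded change streams from the primary to columnstore tables. A minimal sketch of that apply step, with message shapes and function names that are assumptions rather than pg_mooncake's actual protocol:

```python
# Hypothetical sketch of a logical replica applying decoded WAL changes
# from the primary into a columnstore table. The in-memory list stands in
# for the table; the dict message shape is an assumption for illustration.

def apply_changes(table_rows: list, changes: list) -> None:
    """Apply one transaction's decoded changes to an in-memory stand-in table."""
    for change in changes:
        if change["op"] == "insert":
            table_rows.append(change["row"])
        elif change["op"] == "delete":
            table_rows.remove(change["row"])
        elif change["op"] == "update":
            idx = table_rows.index(change["old"])
            table_rows[idx] = change["new"]

rows = []
apply_changes(rows, [
    {"op": "insert", "row": (1, "a")},
    {"op": "insert", "row": (2, "b")},
    {"op": "update", "old": (1, "a"), "new": (1, "aa")},
])
print(rows)  # [(1, 'aa'), (2, 'b')]
```

In practice the changes would be batched per transaction and funneled through the same small-INSERT write path described above, so the replica stays in sync without generating a Parquet file per change.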