Replies: 5 comments 10 replies
-
Hey, I'm pretty interested in the fine-grained plan execution (i.e., mostly in DuckDB, with the unsupported parts in Postgres). Does this mean Postgres's indexing features (e.g., GiST) can be combined with DuckDB execution? Are there any articles on its capabilities and technical details? One possibly related thing I found is MotherDuck's local & remote execution, which allows joins across remote and local datasets. Thanks!
-
We need logical replication, very, very much!
-
Does pg_mooncake have streaming aggregation and continuous aggregation functions, similar to Flink and TimescaleDB?
-
Will pg_mooncake support logically replicating and writing to the Paimon data lake in the future, followed by continuous real-time analysis? Paimon is a more efficient data lake format than Iceberg.
-
The Paimon data lake format is more performant, and it would be great if pg_mooncake could support CDC to write to Paimon in real time!
-
The main goal of v0.2.0 is to smooth out rough edges and make pg_mooncake best for specific workloads, particularly time-series workloads that are mostly append-only. Below are the main projects for v0.2.0.
Compatibility (#80)
Yes, we hear the pain. Currently, queries involving columnstore tables are executed entirely in DuckDB, which has feature gaps compared to Postgres that we will never be able to fully bridge.
To fundamentally address this, we need to allow executing part of the query within Postgres. As a starting point, Postgres needs to be able to directly read/write columnstore tables.
This is the architecture we envisioned from the start, but we prioritized executing everything in DuckDB in v0.1.0 for performance. Essentially, pg_mooncake builds a storage layer with data files stored in an object store and metadata managed in Postgres with transactional support. Both Postgres and DuckDB can read from and write to this shared storage layer. At the planner stage, we use DuckDB as a speedy execution engine and push down as much computation as possible to it. Operators that cannot be executed in DuckDB seamlessly fall back to Postgres, ensuring maximum compatibility.
For the initial version in v0.2.0, queries will be executed either entirely in DuckDB or entirely in Postgres. Fine-grained planning will be introduced in a future version.
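The coarse-grained routing described above can be sketched as follows. This is an illustrative model, not pg_mooncake's actual code: the operator names, the supported-operator set, and the `choose_engine` function are all assumptions made for the example.

```python
# Hypothetical sketch of coarse-grained query routing: in the initial
# v0.2.0 model, a query runs entirely in DuckDB only if every operator
# in its plan is supported there; otherwise the whole query falls back
# to Postgres.

# Operators we assume DuckDB can execute (illustrative, not the real list).
DUCKDB_SUPPORTED = {"seq_scan", "hash_join", "agg", "sort", "limit"}

def choose_engine(plan_operators: set[str]) -> str:
    """Return which engine runs the whole query under all-or-nothing routing."""
    if plan_operators <= DUCKDB_SUPPORTED:
        return "duckdb"    # fast path: everything pushed down
    return "postgres"      # any unsupported operator forces full fallback

print(choose_engine({"seq_scan", "agg"}))              # duckdb
print(choose_engine({"seq_scan", "gist_index_scan"}))  # postgres
```

Fine-grained planning would instead partition the plan, keeping the unsupported operators in Postgres while still pushing the rest down to DuckDB.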
Small INSERTs (#89)
Small INSERTs into columnstore tables are currently painfully inefficient: each query creates a new Parquet file, and each transaction generates a new Delta Lake log record.
To address this, a rowstore table (Postgres heap table) will be used to store small INSERTs. Small INSERTs will go into the rowstore table, while large INSERTs will be written directly to Parquet files. ColumnstoreScan will scan the union of the rowstore table and Parquet files. When a transaction ends, if the rowstore table is too large, it will be flushed. Lakehouse writes will also be decoupled from the transaction lifecycle, with new records/snapshots written only upon request.
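The write path above can be sketched roughly as follows. This is a simplified stand-in, not pg_mooncake's implementation: the class name, the thresholds, and the in-memory lists standing in for the heap table and Parquet files are all assumptions made for illustration.

```python
# Hypothetical sketch of the small-INSERT buffering scheme: small batches
# land in a rowstore buffer, large batches go straight to Parquet, scans
# read the union of both, and the buffer is flushed at commit once it
# grows past a threshold.

SMALL_INSERT_THRESHOLD = 1000  # rows; illustrative value
FLUSH_THRESHOLD = 5000         # rows; illustrative value

class ColumnstoreTable:
    def __init__(self):
        self.rowstore = []       # stand-in for the Postgres heap buffer
        self.parquet_files = []  # stand-in for Parquet files in object storage

    def insert(self, rows):
        if len(rows) < SMALL_INSERT_THRESHOLD:
            self.rowstore.extend(rows)             # small INSERT: buffer it
        else:
            self.parquet_files.append(list(rows))  # large INSERT: write Parquet

    def scan(self):
        # ColumnstoreScan reads the union of Parquet files and the rowstore.
        for f in self.parquet_files:
            yield from f
        yield from self.rowstore

    def on_commit(self):
        # At transaction end, flush the rowstore if it has grown too large.
        if len(self.rowstore) >= FLUSH_THRESHOLD:
            self.parquet_files.append(self.rowstore)
            self.rowstore = []
```

Decoupling lakehouse writes from the transaction lifecycle then means `on_commit` no longer has to emit a Delta Lake log record per transaction; new records/snapshots are written only on flush or on request.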
Logical replication (#90)
Getting pg_mooncake onto managed Postgres providers like AWS RDS, Google Cloud SQL, and Azure Database for PostgreSQL will take time.
In the meantime, enabling logical replication into columnstore tables will allow pg_mooncake to be deployed as a logical replica of the primary Postgres instance, keeping it in sync.
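The replica's job is essentially to apply decoded change streams from the primary to columnstore tables. A minimal sketch of that apply step, with message shapes and function names that are assumptions rather than pg_mooncake's actual protocol:

```python
# Hypothetical sketch of a logical replica applying decoded WAL changes
# from the primary into a columnstore table. The in-memory list stands in
# for the table; the dict message shape is an assumption for illustration.

def apply_changes(table_rows: list, changes: list) -> None:
    """Apply one transaction's decoded changes to an in-memory stand-in table."""
    for change in changes:
        if change["op"] == "insert":
            table_rows.append(change["row"])
        elif change["op"] == "delete":
            table_rows.remove(change["row"])
        elif change["op"] == "update":
            idx = table_rows.index(change["old"])
            table_rows[idx] = change["new"]

rows = []
apply_changes(rows, [
    {"op": "insert", "row": (1, "a")},
    {"op": "insert", "row": (2, "b")},
    {"op": "update", "old": (1, "a"), "new": (1, "aa")},
])
print(rows)  # [(1, 'aa'), (2, 'b')]
```

In practice the changes would be batched per transaction and funneled through the same small-INSERT write path described above, so the replica stays in sync without generating a Parquet file per change.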