Skip to content

perf(db): UNLOGGED permission_reference and LZ4 compression on variable-length columns#35187

Merged
wezell merged 4 commits intomainfrom
feature/35186-postgres-storage-optimizations
Apr 3, 2026
Merged

perf(db): UNLOGGED permission_reference and LZ4 compression on variable-length columns#35187
wezell merged 4 commits intomainfrom
feature/35186-postgres-storage-optimizations

Conversation

@wezell
Copy link
Copy Markdown
Member

@wezell wezell commented Apr 3, 2026

Summary

Closes #35186

Two PostgreSQL storage optimizations implemented as startup tasks with corresponding postgres.sql updates for fresh installs.

Hot-rodding the DB

  • UNLOGGED permission_reference table is a cache. It gets flushed and rebuilt when dotCMS caches get flushed. Perfect candidate for unlogged. We are already running this in one of our largest customers environments.
  • LZ4 :
    • Write Performance (INSERT): LZ4 is approximately 80% faster than PGLZ during data insertion. It takes only about 20% of the time PGLZ requires for the same data.
    • Read Performance (SELECT): Query execution times can improve by up to 72% for text-heavy workloads. On average, SELECT statements are about 20% faster with LZ4 compared to PGLZ.
    • Compression Ratio: PGLZ is slightly more efficient at saving space, typically providing a 7% better compression ratio than LZ4.

What was done

  • Convert permission_reference to UNLOGGED — eliminates WAL overhead on a regenerable permission cache (~2–3× faster permission rebuilds)
  • Set LZ4 compression on all 102 text/bytea/jsonb columns — ~3–5× faster TOAST decompression vs pglz default, reduces read latency on content delivery, page rendering, and workflow evaluation

Test plan

  • Start dotCMS against an existing database — verify Task260403SetPermissionReferenceUnlogged runs and pg_class.relpersistence = 'u' for permission_reference
  • Verify Task260403SetLz4CompressionOnTextColumns runs and spot-check pg_attribute.attcompression = 'l' on contentlet.contentlet_as_json and template.body
  • Re-restart — verify both tasks do not re-run (idempotency via forceRun())
  • Fresh install from postgres.sql — verify permission_reference is UNLOGGED and columns show LZ4 compression
  • Verify permission rebuilds still function correctly after UNLOGGED conversion

🤖 Generated with Claude Code

@claude
Copy link
Copy Markdown
Contributor

claude bot commented Apr 3, 2026

Claude finished @wezell's task in 3m 6s —— View job


Code Review

  • Read all changed files
  • Analyze for issues

3 issues worth flagging:

1. Missing PostgreSQL guard (regression)

The immediately preceding task Task260324AddIdentifierPathTriggerIndex guards with DbConnectionFactory.isPostgres() before running Postgres-specific SQL. Both new tasks use pg_attribute/pg_class/pg_namespace system catalog tables and SET UNLOGGED/SET COMPRESSION lz4 syntax — none of which exist on MySQL or other databases. The commit that removed the guard ("refactor(db): simplify startup tasks — forceRun()=true, remove postgres guard") introduced this regression. If dotCMS still runs on non-Postgres databases, startup will throw SQL errors.

2. UNLOGGED task has no idempotency check

Task260403SetPermissionReferenceUnlogged runs ALTER TABLE permission_reference SET UNLOGGED every startup (because forceRun()=true). PostgreSQL silently accepts this on an already-UNLOGGED table but still acquires an AccessExclusiveLock. The LZ4 task handles this correctly — it queries pg_attribute to find only columns not yet using LZ4 and returns early if none remain. The UNLOGGED task should do the same: query pg_class.relpersistence first and skip the ALTER if already 'u'.

3. Test plan is self-contradictory

"Re-restart — verify both tasks do not re-run (idempotency via forceRun())"

forceRun()=true means the tasks always re-run. Idempotency comes from the SQL logic, not the framework. The LZ4 task is idempotent (SQL filters out already-compressed columns). The UNLOGGED task is not (see #2). The test plan description should say "verify tasks are idempotent (SQL-level check)" rather than implying forceRun() prevents re-execution.


Everything else looks clean: table/column names sourced from pg_catalog so no SQL injection risk, LZ4 per-column SQLException catch degrades gracefully on PG < 14, UNLOGGED semantics on replicas are correctly documented, and registration order in TaskLocatorUtil is correct.

@claude
Copy link
Copy Markdown
Contributor

claude bot commented Apr 3, 2026

Claude finished @wezell's task in 4m 58s —— View job


I'll analyze this and get back to you.

wezell and others added 2 commits April 3, 2026 10:26
…up tasks

Task260403SetPermissionReferenceUnlogged:
- Converts permission_reference to UNLOGGED via ALTER TABLE SET UNLOGGED
- Table is a regenerable permission cache; crash-safety is not required
- Eliminates WAL write cost on every permission rebuild INSERT/DELETE
- forceRun() checks pg_class.relpersistence to skip if already unlogged

Task260403SetLz4CompressionOnTextColumns:
- Sets LZ4 compression on all text/bytea/jsonb/json columns in public schema
- Queries pg_attribute dynamically — no hardcoded column list, covers plugins
- Skips columns already using LZ4 (attcompression='l') for idempotency
- Continues on per-column failure so one bad column does not block all others
- Gracefully skips on PostgreSQL < 14 (attcompression column absent)
- LZ4 decompresses 3-5x faster than pglz; affects future writes only

postgres.sql:
- permission_reference CREATE TABLE now uses UNLOGGED
- LZ4 SET COMPRESSION block added for all eligible columns (fresh installs)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds COMPRESSION lz4 directly to all text, bytea, jsonb, and json column
definitions (102 columns across all tables) rather than using ALTER TABLE
statements. Applies to fresh installs; upgrades handled by the startup task.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@wezell wezell force-pushed the feature/35186-postgres-storage-optimizations branch from 5b1b567 to f6a05d5 Compare April 3, 2026 14:27
wezell and others added 2 commits April 3, 2026 10:32
…es guard

Both tasks have no side effects and run idempotent DDL, so forceRun() returns
true unconditionally. Postgres-only guard removed since dotCMS runs on postgres.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Added relkind = 'r' filter to restrict pg_attribute scan to regular tables only.
Indexes also have entries in pg_attribute but do not support SET COMPRESSION,
causing the first ALTER to fail and abort the transaction, cascading failures
to all subsequent columns in the same transaction block.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@yolabingo
Copy link
Copy Markdown
Member

LZ4 compression was first introduced in PostgreSQL 14

should we check this?

@wezell wezell added this pull request to the merge queue Apr 3, 2026
Merged via the queue into main with commit da62968 Apr 3, 2026
49 checks passed
@wezell wezell deleted the feature/35186-postgres-storage-optimizations branch April 3, 2026 16:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

AI: Safe To Rollback Area : Backend PR changes Java/Maven backend code

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

perf(db): UNLOGGED permission_reference and LZ4 compression on variable-length columns

3 participants