[feature-request] Add a Native UUID Type #16619

@ankitsultana

Creating this issue to gauge community interest.

UUIDs are very common in Pinot, and I am sure this is not limited to us at Uber. At present, users can only use String columns to store UUIDs. This means that:

  • Storage: Each UUID is stored as 36 bytes (its canonical text form) in uncompressed file formats.
  • Scans: Each UUID is scanned as 36 bytes, and during conversion to String another 36 bytes are allocated for the String's internal byte buffer. While the buffer used for the scan is usually reused, the String's internal buffer is not. Moreover, copying the larger byte count costs additional CPU cycles.
  • In-Memory Representation: After scanning, UUIDs are passed around as 36-byte values. This adds memory and CPU overhead to many operations, such as data shuffles and key comparisons in open-addressed hash tables.

A UUID is ultimately a 128-bit value, i.e. two 64-bit longs, and can be represented using only 16 bytes. There are some usability benefits too, but regardless, we wanted to share this as something we are exploring.
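
To make the size difference concrete, here is a minimal sketch using only the JDK (`java.util.UUID` and `java.nio.ByteBuffer`). The class name `UuidBytes` and the helpers `toBytes`/`fromBytes` are hypothetical and are not existing Pinot APIs; the big-endian byte order is also just an assumption for illustration, not a proposed on-disk format.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

public class UuidBytes {
    // Pack the two 64-bit halves of a UUID into 16 bytes.
    // ByteBuffer is big-endian by default (an assumption of this sketch).
    public static byte[] toBytes(UUID uuid) {
        return ByteBuffer.allocate(16)
            .putLong(uuid.getMostSignificantBits())
            .putLong(uuid.getLeastSignificantBits())
            .array();
    }

    // Rebuild the UUID from its 16-byte form.
    public static UUID fromBytes(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        long mostSigBits = buf.getLong();
        long leastSigBits = buf.getLong();
        return new UUID(mostSigBits, leastSigBits);
    }

    public static void main(String[] args) {
        UUID uuid = UUID.fromString("123e4567-e89b-12d3-a456-426614174000");

        // Canonical text form: 32 hex digits plus 4 hyphens.
        byte[] asText = uuid.toString().getBytes(StandardCharsets.UTF_8);
        byte[] packed = toBytes(uuid);

        System.out.println(asText.length);                  // 36
        System.out.println(packed.length);                  // 16
        System.out.println(fromBytes(packed).equals(uuid)); // true
    }
}
```

The packed form is less than half the size of the text form, and the round trip is lossless, which is the property a native type would rely on.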

Last year we released the UUID Hash Function for Upsert Primary Keys, which has been quite useful at Uber in increasing the per-server primary key capacity: #12538

Labels: feature (New functionality), performance (Related to performance optimization), stale (No activity for an extended period)
